Adadelta: An Adaptive Learning Rate Method

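Adadelta builds on AdaGrad but removes the global learning-rate hyperparameter. It also keeps learning after a parameter has received large gradient updates: where AdaGrad's accumulated sum of squared gradients drives the per-parameter step size toward zero, Adadelta uses a decaying average of squared gradients, so the step size can recover.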
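
As a rough illustration of the idea, the update can be sketched in plain Python. This is a minimal sketch, not a reference implementation: the function names `adadelta_step` and `run_toy_example`, the toy objective, and the values `rho=0.95` and `eps=1e-6` are assumptions chosen for the example.

```python
import math


def adadelta_step(params, grads, avg_sq_grad, avg_sq_delta, rho=0.95, eps=1e-6):
    """Apply one Adadelta update in place to lists of floats.

    rho and eps are illustrative values (assumptions), not tuned settings.
    """
    for i, g in enumerate(grads):
        # Decaying average of squared gradients (AdaGrad keeps a growing sum).
        avg_sq_grad[i] = rho * avg_sq_grad[i] + (1.0 - rho) * g * g
        # Step = -(RMS of past updates / RMS of gradients) * gradient.
        # No global learning rate appears anywhere in this expression.
        delta = -math.sqrt(avg_sq_delta[i] + eps) / math.sqrt(avg_sq_grad[i] + eps) * g
        # Decaying average of squared updates, used by the next step.
        avg_sq_delta[i] = rho * avg_sq_delta[i] + (1.0 - rho) * delta * delta
        params[i] += delta
    return params


def toy_loss(params):
    # Hypothetical toy objective: f(x, y) = x^2 + 10 * y^2
    return params[0] ** 2 + 10.0 * params[1] ** 2


def run_toy_example(steps=50):
    """Run a few Adadelta steps on the toy objective and return the parameters."""
    params = [3.0, -2.0]
    avg_sq_grad = [0.0, 0.0]
    avg_sq_delta = [0.0, 0.0]
    for _ in range(steps):
        grads = [2.0 * params[0], 20.0 * params[1]]
        adadelta_step(params, grads, avg_sq_grad, avg_sq_delta)
    return params


if __name__ == "__main__":
    final = run_toy_example()
    print("params:", final, "loss:", toy_loss(final))
```

The two running averages are the only state kept per parameter. Because both decay, a burst of large gradients raises the denominator only temporarily, which is what lets the step size recover where AdaGrad's would stay small.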

[Figure: optimization comparison]