Adam: A Method For Stochastic Optimization

An alternative to AdaGrad and RMSProp, Adam maintains exponentially weighted running estimates of the first and second moments of the gradient, applies a bias correction to each (since both estimates start at zero), and uses them to scale a per-parameter step size.
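A minimal NumPy sketch of the update loop, using the paper's default hyperparameters (alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8); the `grad_fn` callable and the quadratic example at the end are illustrative assumptions, not part of the original note.

```python
import numpy as np

def adam(grad_fn, theta, steps=1000, alpha=1e-3,
         beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimal Adam loop; grad_fn(theta) returns the gradient at theta."""
    m = np.zeros_like(theta)  # 1st-moment (mean) estimate
    v = np.zeros_like(theta)  # 2nd-moment (uncentered variance) estimate
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g      # exp-weighted mean of gradients
        v = beta2 * v + (1 - beta2) * g * g  # exp-weighted mean of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for zero init
        v_hat = v / (1 - beta2 ** t)
        theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Illustrative use: minimize f(x) = x^2, whose gradient is 2x
print(adam(lambda x: 2 * x, np.array([5.0])))
```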

See it compared with other optimizers.