or: On The Importance Of Initialization And Momentum In Deep Learning

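Introduces Nesterov’s Accelerated Gradient (NAG) to deep learning, recasting it as a momentum method that evaluates the gradient at a look-ahead point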
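
As a rough illustration, NAG in its momentum form updates a velocity v with the gradient taken at the look-ahead point theta + mu * v, then moves theta by the new velocity. The sketch below assumes plain Python lists for the parameters; the function name `nag_minimize`, the quadratic test problem, and the hyperparameter values are hypothetical choices for this example, not taken from the paper.

```python
def nag_minimize(grad, theta0, lr=0.1, mu=0.9, steps=200):
    """Minimal sketch of Nesterov's Accelerated Gradient (momentum form).

    grad   : function mapping a parameter list to its gradient list
    theta0 : initial parameters
    lr     : learning rate (epsilon)
    mu     : momentum coefficient
    """
    theta = list(theta0)
    velocity = [0.0] * len(theta)
    for _ in range(steps):
        # Look ahead: evaluate the gradient where momentum is about to carry us.
        lookahead = [t + mu * v for t, v in zip(theta, velocity)]
        g = grad(lookahead)
        # v <- mu * v - lr * grad(theta + mu * v)
        velocity = [mu * v - lr * gi for v, gi in zip(velocity, g)]
        # theta <- theta + v
        theta = [t + v for t, v in zip(theta, velocity)]
    return theta


# Hypothetical test problem: f(x, y) = x^2 + 10 * y^2, minimized at (0, 0).
def quad_grad(p):
    x, y = p
    return [2.0 * x, 20.0 * y]


result = nag_minimize(quad_grad, [5.0, -3.0], lr=0.02, mu=0.9, steps=300)
```

Classical momentum differs only in where the gradient is evaluated: it would call `grad(theta)` instead of `grad(lookahead)`.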

See the optimizers comparison