An Overview Of Gradient Descent Optimization Algorithms

A good overview of common gradient descent optimization methods, including SGD, SGD with momentum (SGD-M), Adam, and Nadam.
Also available at https://ruder.io/optimizing-gradient-descent/
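
As a quick reminder of how three of these methods differ, here is a minimal sketch of their update rules on a single scalar parameter. It is not taken from the overview itself: the function names, the toy objective f(x) = x**2, and the hyperparameter values are assumptions chosen only for illustration.

```python
import math

def sgd_step(theta, grad, lr=0.1):
    """Plain SGD: step against the gradient."""
    return theta - lr * grad

def momentum_step(theta, grad, velocity, lr=0.1, gamma=0.9):
    """SGD with momentum: keep a decaying sum of past gradient steps."""
    velocity = gamma * velocity + lr * grad
    return theta - velocity, velocity

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected first and second moment estimates (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

def grad_f(x):
    # Gradient of the toy objective f(x) = x**2 (assumed for illustration)
    return 2.0 * x

def run_all(steps=500, x0=5.0):
    """Run all three optimizers from the same start and return the final x values."""
    x_sgd = x_mom = x_adam = x0
    vel = m = v = 0.0
    for t in range(1, steps + 1):
        x_sgd = sgd_step(x_sgd, grad_f(x_sgd))
        x_mom, vel = momentum_step(x_mom, grad_f(x_mom), vel)
        x_adam, m, v = adam_step(x_adam, grad_f(x_adam), m, v, t)
    return x_sgd, x_mom, x_adam
```

Calling run_all() returns the final value of x for each method; all three should end up close to the minimum at 0 on this toy problem. Note that Adam's very first step has magnitude of roughly lr regardless of the gradient's scale, because the bias-corrected moments cancel.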

See also my brief comparison.