Demon: Improved Neural Network Training with Momentum Decay

A momentum decay schedule that greatly improves the convergence rate of the SGD with momentum (SGD-M) and Adam optimizers across a variety of network architectures.

Figure: comparison
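
To make the idea of a momentum decay schedule concrete, here is a minimal sketch in Python. The text above does not spell out the decay rule, so the formula is an assumption based on the rule as it is commonly stated for Demon: beta_t = beta_init * (1 - t/T) / ((1 - beta_init) + beta_init * (1 - t/T)), where t is the current step and T is the total number of steps. The function name `demon_momentum` and its parameters are hypothetical and chosen only for illustration; check the rule against the paper before relying on it.

```python
def demon_momentum(step, total_steps, beta_init=0.9):
    """Return the decayed momentum value for a given training step.

    Assumed rule (not taken from the text above):
        beta_t = beta_init * z / ((1 - beta_init) + beta_init * z)
    with z = 1 - step / total_steps, the fraction of training remaining.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the step so the schedule stays well defined outside [0, total_steps].
    clamped = min(max(step, 0), total_steps)
    remaining = 1.0 - clamped / total_steps
    return beta_init * remaining / ((1.0 - beta_init) + beta_init * remaining)


if __name__ == "__main__":
    total = 10
    for t in range(total + 1):
        print(t, round(demon_momentum(t, total, beta_init=0.9), 4))
```

Under this assumed rule, the momentum starts at `beta_init` and falls to zero at the final step. In a framework such as PyTorch, one way to apply it would be to overwrite the optimizer's momentum hyperparameter before each step (the `momentum` entry of each parameter group for SGD, or the first element of `betas` for Adam); that wiring is left out here to keep the sketch dependency-free.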