or: Sharpness-Aware Minimization For Efficiently Improving Generalization

The paper introduces a new gradient-based optimizer, SAM. Each iteration requires one extra gradient computation (roughly doubling the cost per step), but the resulting models generalize better than those trained with plain SGD. Rather than only minimizing the loss, SAM seeks parameters whose entire neighborhood has low loss, i.e. it steers toward flatter regions of the loss landscape.
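The two-step update can be sketched in NumPy on a toy quadratic loss. This is a minimal illustration, not the paper's implementation: `loss`, `grad`, and the hyperparameters `rho` and `lr` are placeholder choices. Each step first ascends by `rho` along the normalized gradient to an approximate worst-case nearby point, then descends using the gradient evaluated there:

```python
import numpy as np

def loss(w):
    # Toy quadratic loss; stands in for the training loss.
    return 0.5 * np.sum(w ** 2)

def grad(w):
    # Gradient of the toy loss above.
    return w

def sam_step(w, rho=0.05, lr=0.1):
    g = grad(w)
    norm = np.linalg.norm(g) + 1e-12      # avoid division by zero
    eps = rho * g / norm                  # ascent step to a worst-case nearby point
    g_sam = grad(w + eps)                 # extra gradient at the perturbed weights
    return w - lr * g_sam                 # descend using the sharpness-aware gradient

w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w)
print(np.linalg.norm(w))  # small: w has moved close to the minimum at 0
```

The extra gradient evaluation at `w + eps` is where the per-iteration overhead comes from: every SAM step is one forward/backward pass more expensive than the corresponding SGD step.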