Layer Normalization

BatchNorm takes all the inputs to a single neuron over a batch and shifts and scales them to zero mean and unit variance; a learned per-neuron scale and shift is then applied, so the network can still choose whatever mean and variance it wants.
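A minimal NumPy sketch of the normalization step, assuming a batch of activations laid out as (batch, neurons); the running statistics used at inference time are omitted and the variable names are illustrative:

```python
import numpy as np

x = np.random.randn(32, 64)           # batch of 32 examples, 64 neurons
mean = x.mean(axis=0, keepdims=True)  # per-neuron mean over the batch
var = x.var(axis=0, keepdims=True)    # per-neuron variance over the batch
x_hat = (x - mean) / np.sqrt(var + 1e-5)  # normalize each neuron across the batch

gamma, beta = np.ones(64), np.zeros(64)   # learned per-neuron scale and shift
y = gamma * x_hat + beta
```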

LayerNorm instead takes all the inputs to the neurons in a layer for a single example, and shifts and scales them to zero mean and unit variance, again followed by a learned scale and shift. Because it does not depend on batch statistics, it works well for recurrent networks and transformers, but tends to perform poorly on CNNs.
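The same sketch with the axes swapped shows the difference: the statistics are computed per example across the layer's neurons, so the result for one example never depends on the rest of the batch (again, names and shapes are illustrative):

```python
import numpy as np

x = np.random.randn(32, 64)           # batch of 32 examples, 64 neurons
mean = x.mean(axis=1, keepdims=True)  # per-example mean over the layer
var = x.var(axis=1, keepdims=True)    # per-example variance over the layer
x_hat = (x - mean) / np.sqrt(var + 1e-5)  # normalize each example across its neurons

gamma, beta = np.ones(64), np.zeros(64)   # learned per-neuron scale and shift
y = gamma * x_hat + beta
```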