LayerNorm
Layer Normalization
BatchNorm normalizes each neuron's activations across a batch: for every neuron, it shifts and scales the inputs gathered over the batch to a fixed mean and variance (zero mean and unit variance, followed by a learned scale and shift).
LayerNorm instead normalizes across the neurons within a layer: for each individual example, it shifts and scales the activations of all neurons in that layer to a fixed mean and variance, so it never depends on batch statistics. This makes it work well for recurrent networks and transformers, but it tends to perform poorly on CNNs, where BatchNorm remains the usual choice. The sketch below illustrates the difference.
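A minimal NumPy sketch of the contrast, assuming a 2-D batch of shape (batch, features); the function names, toy inputs, and `eps` constant are illustrative, and the BatchNorm version shows training-mode statistics only (real implementations also track running averages for inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Per-neuron statistics over the batch axis (axis 0):
    # each column (one neuron) is normalized using the whole batch.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def layer_norm(x, gamma, beta, eps=1e-5):
    # Per-example statistics over the feature axis (axis -1):
    # each row (one example) is normalized independently of the batch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0]])   # batch of 2 examples, 4 neurons
gamma, beta = np.ones(4), np.zeros(4)  # learned scale/shift; identity here

print(batch_norm(x, gamma, beta))  # columns normalized across the batch
print(layer_norm(x, gamma, beta))  # rows normalized within each example
```

The only difference between the two functions is the axis the statistics are computed over, which is why LayerNorm's behavior is unchanged by batch size while BatchNorm's is not.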