Stochastic Depth
or: Deep Networks with Stochastic Depth
When training a deep residual network such as a ResNet or a Transformer, randomly bypass some residual blocks for each mini-batch, so that only their skip connections are used. This speeds up convergence and allows deeper networks to be trained.
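The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names `survival_probs` and `stochastic_depth_forward` are hypothetical, activations are plain lists of floats, and each "layer" is any function that computes the residual branch. The linearly decaying keep probability and the rescaling of the branch at inference time follow the recipe described in the Deep Networks with Stochastic Depth paper; the value `p_last=0.5` is an assumed default for illustration.

```python
import random


def survival_probs(num_layers, p_last=0.5):
    """Keep probability per layer, decaying linearly from near 1.0 down to p_last.

    Early layers are almost always kept; the last layer is kept with
    probability p_last (an assumed default, chosen for illustration).
    """
    return [1.0 - (l / num_layers) * (1.0 - p_last)
            for l in range(1, num_layers + 1)]


def stochastic_depth_forward(x, layers, probs, training, rng=random):
    """Run a stack of residual blocks with stochastic depth.

    x       -- list of floats (the activations)
    layers  -- list of functions; each maps a list of floats to the
               residual branch output of the same length
    probs   -- keep probability for each layer
    """
    for branch, p in zip(layers, probs):
        if training:
            # One coin flip per layer per call, i.e. per mini-batch.
            if rng.random() < p:
                x = [xi + bi for xi, bi in zip(x, branch(x))]
            # Otherwise the block is bypassed: x passes through unchanged.
        else:
            # At inference every block is active, with its branch scaled
            # by the probability that it was kept during training.
            x = [xi + p * bi for xi, bi in zip(x, branch(x))]
    return x


if __name__ == "__main__":
    def toy_branch(v):
        # Hypothetical residual branch: halves every activation.
        return [0.5 * vi for vi in v]

    layers = [toy_branch] * 4
    probs = survival_probs(len(layers))
    print("keep probabilities:", probs)
    print("training pass:", stochastic_depth_forward([1.0], layers, probs, training=True))
    print("inference pass:", stochastic_depth_forward([1.0], layers, probs, training=False))
```

Because a bypassed block skips its branch entirely, each training step costs less on average than a full forward and backward pass, while the network used at inference keeps its full depth.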
from a laptop in Sunnyvale