Self-training with Noisy Student improves ImageNet classification

An EfficientNet teacher model is first trained on ImageNet. The teacher then assigns pseudo-labels to unlabeled images, keeping those it predicts with high confidence, and a larger student network is trained on the pseudo-labeled data. Noise, in the form of data augmentation, is applied to the examples fed to the student.
The student then becomes the teacher for another, larger student, which is trained with more aggressive noising. This distillation process is repeated several times.
The final model reached state-of-the-art accuracy on ImageNet at the time of publication.
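
The teacher-student loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `train` is a hypothetical training routine supplied by the caller, `capacities` and `noise_levels` are stand-ins for model size and noise strength, and the default `min_confidence` is an arbitrary illustrative value. Training the student on labeled and pseudo-labeled data together, and training the first teacher without noise, are assumptions of this sketch.

```python
from typing import Callable, List, Sequence, Tuple

Example = Sequence[float]
Labeled = Tuple[Example, int]
Model = Callable[[Example], Sequence[float]]  # input -> class probabilities
# Hypothetical training routine: (data, capacity, noise) -> trained model.
TrainFn = Callable[[List[Labeled], int, float], Model]


def pseudo_label(teacher: Model, unlabeled: List[Example],
                 min_confidence: float) -> List[Labeled]:
    """Label unlabeled inputs with the teacher, keeping confident ones only."""
    kept = []
    for x in unlabeled:
        probs = teacher(x)
        label = max(range(len(probs)), key=lambda i: probs[i])
        if probs[label] >= min_confidence:
            kept.append((x, label))
    return kept


def noisy_student(train: TrainFn, labeled: List[Labeled],
                  unlabeled: List[Example], capacities: Sequence[int],
                  noise_levels: Sequence[float],
                  min_confidence: float = 0.5) -> Model:
    """Run one teacher-student round per entry in noise_levels."""
    if len(capacities) != len(noise_levels) + 1:
        raise ValueError("need one capacity per model, one noise level per student")
    # Initial teacher: trained on labeled data only, without noise (assumption).
    teacher = train(labeled, capacities[0], 0.0)
    for capacity, noise in zip(capacities[1:], noise_levels):
        pseudo = pseudo_label(teacher, unlabeled, min_confidence)
        # Assumption: the student sees labeled and pseudo-labeled data together.
        student = train(labeled + pseudo, capacity, noise)
        teacher = student  # the student becomes the next teacher
    return teacher
```

Passing increasing values in `capacities` and `noise_levels` mirrors the "larger student, more aggressive noising" schedule. The pseudo-labels are recomputed in every round, so each new student learns from the most recent teacher.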