or: Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs

Was shrinking convolution kernel sizes down to 3x3 a mistake? What happens if you keep modern training and augmentation methods but use huge 31x31 kernels? Just make the network much shallower (and keep the big kernels depthwise, so they stay cheap) and the FLOPs work out about the same; see the sketch below.
Gives approximately SOTA results, competitive with giant vision transformers(!).
ImageNet is pretty much saturated, but this is a big improvement as a backbone for other downstream tasks.
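A minimal sketch of the idea in PyTorch, assuming a RepLKNet-style block: a 31x31 depthwise conv (so FLOPs scale with kernel area per channel rather than with channels squared) plus a parallel small-kernel branch that the paper merges into the big kernel at inference via structural re-parameterization. Channel count, kernel sizes, and the exact block layout here are illustrative, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class LargeKernelBlock(nn.Module):
    """Illustrative large-kernel depthwise block (not the exact RepLKNet block)."""
    def __init__(self, channels: int, kernel_size: int = 31):
        super().__init__()
        # Depthwise: each channel gets its own k x k filter, so cost is
        # H*W*C*k^2 instead of H*W*C_in*C_out*k^2 for a dense conv.
        self.dw_large = nn.Conv2d(channels, channels, kernel_size,
                                  padding=kernel_size // 2,
                                  groups=channels, bias=False)
        # Parallel small kernel; the paper folds this into the large kernel
        # after training (zero-pad 5x5 to 31x31 and add the weights).
        self.dw_small = nn.Conv2d(channels, channels, 5, padding=2,
                                  groups=channels, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x):
        # Sum the two branches, normalize, activate, add a residual connection.
        return x + self.act(self.bn(self.dw_large(x) + self.dw_small(x)))

x = torch.randn(1, 64, 56, 56)
y = LargeKernelBlock(64)(x)
print(y.shape)  # torch.Size([1, 64, 56, 56]) -- spatial size preserved
```

The point of the sketch: a single 31x31 depthwise layer sees roughly the receptive field of a deep stack of 3x3 layers, which is why the network can be made much shallower at comparable FLOPs.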