Spatial Gating Unit
or: Pay Attention to MLPs
The authors introduce a Spatial Gating Unit (SGU), which gates its activations using a learned spatial (cross-token) projection whose weights are shared across all channels.
Roughly analogous to a depthwise convolution whose kernel spans the whole sequence and is shared across all channels.
An MLP shaped like a Transformer, but using the SGU for input-dependent spatial mixing, is competitive on both ImageNet classification and BERT-style language modeling.
see a simple SGU implementation
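A minimal NumPy sketch of the SGU forward pass, assuming the formulation from the paper: split the input channel-wise into two halves, layer-normalize one half, mix it across tokens with a single shared spatial projection, and use the result to gate the other half elementwise. The function and variable names here are illustrative, not from the paper's code.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the channel (last) dimension.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def spatial_gating_unit(z, W, b):
    """SGU forward pass (illustrative sketch).

    z: (n_tokens, d) input, split channel-wise into two halves.
    W: (n_tokens, n_tokens) spatial projection, shared by all channels.
    b: (n_tokens, 1) bias for the spatial projection.
    """
    z1, z2 = np.split(z, 2, axis=-1)   # two (n_tokens, d/2) halves
    gate = W @ layer_norm(z2) + b      # spatial mixing across tokens
    return z1 * gate                   # elementwise gating

# The paper initializes W near zero and b to one, so the SGU starts
# out close to an identity map on the first half of the channels.
n_tokens, d = 4, 6
z = np.random.randn(n_tokens, d)
W = np.zeros((n_tokens, n_tokens))
b = np.ones((n_tokens, 1))
out = spatial_gating_unit(z, W, b)     # shape (n_tokens, d/2)
```

With the near-identity initialization shown, training can gradually learn spatial interactions in W rather than having random token mixing from the start.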