LambdaNetworks: Modeling Long-Range Interactions Without Attention

A fast, low-parameter alternative to linear attention that can take advantage of FFT convolution.
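To make the summary above concrete, here is a minimal NumPy sketch of a single-head lambda layer over a 1D sequence. It is an illustration under stated assumptions, not the reference implementation: the context is taken to be the input itself, the relative position embeddings are assumed to depend only on the offset between query and context positions, the normalization layers on queries and values are omitted, and all function and parameter names (`lambda_layer_1d`, `position_lambdas_fft`, `rel_emb`, and so on) are hypothetical. Under those assumptions the position lambdas reduce to a linear convolution along the sequence axis, which is where an FFT can be used.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def position_lambdas_direct(vals, rel_emb):
    # Reference computation, O(n^2) in sequence length:
    # lam[i, k, v] = sum_j rel_emb[i - j + n - 1, k] * vals[j, v]
    n = vals.shape[0]
    offsets = np.arange(n)[:, None] - np.arange(n)[None, :] + n - 1  # (n, n)
    return np.einsum("ijk,jv->ikv", rel_emb[offsets], vals)


def position_lambdas_fft(vals, rel_emb):
    # Same quantity computed as a linear convolution via FFT.
    # vals:    (n, v) values
    # rel_emb: (2n - 1, k) embeddings for offsets -(n-1)..(n-1) (assumed layout)
    n = vals.shape[0]
    size = 3 * n - 2  # full length of convolving (2n - 1) with n samples
    fe = np.fft.rfft(rel_emb, n=size, axis=0)  # (freq, k)
    fv = np.fft.rfft(vals, n=size, axis=0)     # (freq, v)
    full = np.fft.irfft(fe[:, :, None] * fv[:, None, :], n=size, axis=0)
    # Output position i corresponds to convolution index i + n - 1.
    return full[n - 1 : 2 * n - 1]             # (n, k, v)


def lambda_layer_1d(x, w_q, w_k, w_v, rel_emb):
    # x: (n, d); w_q, w_k: (d, k); w_v: (d, v); returns (n, v).
    q = x @ w_q                          # queries, (n, k)
    keys = softmax(x @ w_k, axis=0)      # keys normalized over positions
    vals = x @ w_v                       # values, (n, v)
    content_lambda = keys.T @ vals       # (k, v), shared by every query
    pos_lambdas = position_lambdas_fft(vals, rel_emb)  # (n, k, v)
    # Each query is applied to its lambda; no n-by-n attention map is formed.
    return q @ content_lambda + np.einsum("ik,ikv->iv", q, pos_lambdas)
```

The content lambda is a single k-by-v matrix summarizing the whole context, so the content path costs time linear in sequence length. The direct position-lambda routine is included only as a check that the FFT path computes the same values.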