or: Efficiently Modeling Long Sequences With Structured State Spaces

Introduces the S4 model for long-range sequence modeling.
Achieves state of the art on the Long Range Arena benchmark, substantially outperforming Transformer baselines.
At each timestep (a minimal sketch follows this list):

  • Update the state vector via a learned linear function of the current state and the current input;
  • Produce an output as a learned linear function of the updated state and the current input.
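
A minimal sketch of that recurrence, assuming a discretized scalar-input linear SSM with placeholder random parameters (not the structured HiPPO initialization or the efficient convolutional computation the paper actually uses):

```python
import numpy as np

# Sketch of the discrete-time linear state space recurrence underlying S4.
# Parameter names (A, B, C, D) follow the standard SSM formulation; the random
# values are illustrative placeholders only.

rng = np.random.default_rng(0)

N, L = 4, 16                            # state size, sequence length
A = rng.standard_normal((N, N)) * 0.1   # state transition matrix
B = rng.standard_normal((N, 1))         # input matrix
C = rng.standard_normal((1, N))         # output matrix
D = rng.standard_normal((1, 1))         # skip connection

u = rng.standard_normal(L)  # scalar input sequence
x = np.zeros((N, 1))        # state vector
y = np.zeros(L)             # outputs

for k in range(L):
    # Update the state from the previous state and the current input.
    x = A @ x + B * u[k]
    # Produce the output from the current state and the current input.
    y[k] = (C @ x + D * u[k]).item()

print(y)
```

In training, S4 avoids stepping through this loop: it parameterizes A with special structure and computes the equivalent sequence-to-sequence map as a convolution.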