Funnel Transformer
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
A transformer model that shrinks the sequence length between blocks of transformer layers by average-pooling the hidden states along the sequence dimension, so later layers process fewer positions. It might be useful for summarizing long sequences.
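To make the pooling step concrete, here is a minimal sketch in plain Python, not the paper's implementation. The function name avg_pool_sequence, the stride of 2, and the handling of a shorter final window are assumptions made for illustration. It only shows how averaging neighbouring hidden vectors halves the sequence length.

```python
def avg_pool_sequence(hidden, stride=2):
    """Average each run of `stride` consecutive vectors into one vector.

    `hidden` is a list of equal-length vectors, one per sequence position.
    (Hypothetical helper for illustration only.)
    """
    pooled = []
    for start in range(0, len(hidden), stride):
        window = hidden[start:start + stride]  # final window may be shorter
        dim = len(window[0])
        # Element-wise mean over the vectors in this window
        pooled.append([sum(vec[i] for vec in window) / len(window) for i in range(dim)])
    return pooled


# Toy input: 8 positions, 2 features per position
hidden = [[float(2 * t), float(2 * t + 1)] for t in range(8)]
pooled = avg_pool_sequence(hidden)
print(len(hidden), "->", len(pooled))
```

With a stride of 2, each pooling step halves the number of positions the following layers have to attend over, which is where the compute saving comes from.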
It ekes out a marginal improvement on benchmarks.