MEGA
MEGA: Moving Average Equipped Gated Attention
A mix of GRU-style gating, a damped exponential moving average (EMA), the Gated Attention Unit (GAU), and ideas from state-space models (S4/GSS).
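A minimal sketch of two of these ingredients, based on our reading of the paper: a multi-dimensional damped EMA over one input channel (each input is expanded to h hidden dims by beta, each dim keeps its own decaying state, and eta projects back to a scalar) and a GRU-style update gate that blends a candidate output with the residual input. The function names `damped_ema` and `gated_mix` are hypothetical, alpha and delta are assumed to already lie in (0, 1), and everything else in Mega (attention, normalization, projections) is omitted.

```python
import math
from typing import List


def damped_ema(xs: List[float], alpha: List[float], delta: List[float],
               beta: List[float], eta: List[float]) -> List[float]:
    """Hypothetical helper: multi-dimensional damped EMA over one scalar channel.

    Assumes alpha[j] and delta[j] are in (0, 1); beta and eta are arbitrary.
    """
    h = len(alpha)
    state = [0.0] * h  # initial EMA state is zero
    ys = []
    for x in xs:
        for j in range(h):
            u = beta[j] * x  # expand the input into hidden dim j
            # damped update: h_t = alpha * u_t + (1 - alpha * delta) * h_{t-1}
            state[j] = alpha[j] * u + (1.0 - alpha[j] * delta[j]) * state[j]
        ys.append(sum(e * s for e, s in zip(eta, state)))  # project: eta^T h_t
    return ys


def gated_mix(candidate: float, residual: float, gate_logit: float) -> float:
    """Hypothetical helper: GRU-style update gate phi * candidate + (1 - phi) * residual."""
    phi = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid gate in (0, 1)
    return phi * candidate + (1.0 - phi) * residual


if __name__ == "__main__":
    # With delta = 1 this reduces to a plain EMA: an impulse halves each step.
    print(damped_ema([1.0, 0.0, 0.0], [0.5], [1.0], [1.0], [1.0]))
    print(gated_mix(1.0, 0.0, 0.0))  # gate logit 0 gives an even blend
```

The damping factor delta lets each hidden dim decay more slowly than a plain EMA with the same alpha, which is how the EMA carries position-aware local context into the otherwise position-agnostic attention.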
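Uses single-head gated attention. Full attention is quadratic in sequence length; the Mega-chunk variant splits the sequence into fixed-length chunks to get linear time and memory.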
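The linear-complexity variant restricts single-head attention to fixed-size, non-overlapping chunks, so the number of query-key scores grows as n times the chunk size rather than n squared. A minimal sketch, assuming plain scaled dot-product softmax on precomputed Q/K/V; Mega's own query/key construction, gates, and positional bias are left out, and `attend`, `chunked_attend`, and `score_count` are hypothetical names.

```python
import math
from typing import List

Vec = List[float]


def attend(q: List[Vec], k: List[Vec], v: List[Vec]) -> List[Vec]:
    """Single-head softmax attention: every query attends to every key."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * v[j][c] for j in range(len(v))) / z
                    for c in range(len(v[0]))])
    return out


def chunked_attend(q: List[Vec], k: List[Vec], v: List[Vec], chunk: int) -> List[Vec]:
    """Attention restricted to fixed-size chunks; no attention across chunks."""
    out: List[Vec] = []
    for start in range(0, len(q), chunk):
        end = start + chunk
        out.extend(attend(q[start:end], k[start:end], v[start:end]))
    return out


def score_count(n: int, chunk: int) -> int:
    """Query-key dot products computed: n*n when chunk >= n, about n*chunk otherwise."""
    return sum(min(chunk, n - s) ** 2 for s in range(0, n, chunk))


if __name__ == "__main__":
    print(score_count(1024, 1024))  # full attention
    print(score_count(1024, 128))   # chunked: 8 chunks of 128 x 128
```

Because chunks do not attend to each other, information crosses chunk boundaries only through the EMA, which runs over the whole sequence.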
State of the art on long-range sequence benchmarks (Long Range Arena); competitive on text tasks such as machine translation and language modeling.
from a laptop in Sunnyvale