LSTM
Peephole LSTM
Full ‘Vanilla’ LSTM, used by Alex Graves for sequence learning.
The long-term cell state c is mediated by the input gate i, output gate o, and forget gate f to produce the short-term hidden state h.
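Roughly, a single step could be sketched like this (a NumPy sketch with my own parameter names; the `wc*` vectors are the diagonal peephole connections that let each gate look directly at the cell state):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def peephole_lstm_step(x, h_prev, c_prev, p):
    """One step of a peephole (Graves-style) LSTM.

    p is a dict of parameters; the names (Wx*, Wh*, wc*, b*) are illustrative.
    """
    # Input and forget gates peek at the previous cell state.
    i = sigmoid(p["Wxi"] @ x + p["Whi"] @ h_prev + p["wci"] * c_prev + p["bi"])
    f = sigmoid(p["Wxf"] @ x + p["Whf"] @ h_prev + p["wcf"] * c_prev + p["bf"])
    # Candidate update and new long-term cell state.
    c_tilde = np.tanh(p["Wxc"] @ x + p["Whc"] @ h_prev + p["bc"])
    c = f * c_prev + i * c_tilde
    # Output gate peeks at the *new* cell state, then gates the hidden state.
    o = sigmoid(p["Wxo"] @ x + p["Who"] @ h_prev + p["wco"] * c + p["bo"])
    h = o * np.tanh(c)
    return h, c
```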
Basic LSTM
The gates don’t take the cell state into account when updating.
*The Jozefowicz et al. version uses a tanh nonlinearity on the output gate, making it capable of inverting the cell state. But they find little benefit to including an output gate at all.
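For comparison with the peephole version above, a basic LSTM step might be sketched like this (NumPy, my parameter names; note there is no c_prev inside the gate activations):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def basic_lstm_step(x, h_prev, c_prev, p):
    """One step of a basic (no-peephole) LSTM: gates see only x and h_prev."""
    i = sigmoid(p["Wxi"] @ x + p["Whi"] @ h_prev + p["bi"])
    f = sigmoid(p["Wxf"] @ x + p["Whf"] @ h_prev + p["bf"])
    o = sigmoid(p["Wxo"] @ x + p["Who"] @ h_prev + p["bo"])
    # Cell update is unchanged from the peephole variant.
    c = f * c_prev + i * np.tanh(p["Wxc"] @ x + p["Whc"] @ h_prev + p["bc"])
    h = o * np.tanh(c)
    return h, c
```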
LSTM-o
Jozefowicz and friends found the output gate to provide little benefit, so this variant does without it.
Tied Gates LSTM
“Out with the old” implies “in with the new”.
The input gate should be active whenever the forget gate is inactive.
Minimal LSTM
Tied input and forget gates, and no output gate.
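Putting the two ideas together (tied gates plus no output gate), one plausible sketch is below; exactly how the hidden state is read out of the cell without an output gate varies by write-up, so tanh(c) here is an assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def minimal_lstm_step(x, h_prev, c_prev, p):
    """Tied-gate LSTM with no output gate (parameter names are illustrative)."""
    f = sigmoid(p["Wxf"] @ x + p["Whf"] @ h_prev + p["bf"])
    c_tilde = np.tanh(p["Wxc"] @ x + p["Whc"] @ h_prev + p["bc"])
    c = f * c_prev + (1.0 - f) * c_tilde  # tied gates: forget the old <=> admit the new
    h = np.tanh(c)                        # no output gate (assumed readout)
    return h, c
```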
GRU
Fewer parameters and fewer tanh ops than LSTMs, with competitive performance.
Uses an update gate z and a reset gate r.
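One step might be sketched like so (NumPy, my parameter names; the update-gate convention h = z·h_prev + (1−z)·h̃ is sometimes written with z flipped):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, p):
    """One GRU step: r rescales the old state inside the candidate,
    z interpolates between the old state and the candidate."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])
    h = z * h_prev + (1.0 - z) * h_tilde  # some write-ups swap the roles of z and 1-z
    return h
```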
MUT1
Architecture search yielded this mutant recurrent cell.
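As best I recall the published equations, MUT1's update gate depends only on the input, and the raw input (through a tanh) feeds the candidate directly, which implicitly assumes the input and hidden sizes match; a sketch under those assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mut1_step(x, h_prev, p):
    """MUT1-style step (parameter names are mine; assumes dim(x) == dim(h))."""
    z = sigmoid(p["Wxz"] @ x + p["bz"])                         # update gate sees only x
    r = sigmoid(p["Wxr"] @ x + p["Whr"] @ h_prev + p["br"])     # reset gate
    h_tilde = np.tanh(p["Whh"] @ (r * h_prev) + np.tanh(x) + p["bh"])
    h = z * h_tilde + (1.0 - z) * h_prev
    return h
```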
MGU
Minimal Gated Unit.
Like a GRU with the z and r gates merged into a single forget gate.
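A sketch under that reading, with the single gate f doing double duty as both reset and update (NumPy, my parameter names):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgu_step(x, h_prev, p):
    """Minimal Gated Unit: one forget gate plays both GRU roles."""
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h_prev + p["bf"])
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (f * h_prev) + p["bh"])
    h = (1.0 - f) * h_prev + f * h_tilde
    return h
```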
SCRN
Structurally-Constrained Recurrent Network
Like a simple recurrent network, but with added slow hidden units that keep an exponential moving average over their inputs, using a fixed memory parameter α which might be set to 0.95. The output is read from both the fast hidden state and the slow context state.
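A sketch of one step under that reading (NumPy; the matrix names B, P, A, R, U, V follow the usual SCRN write-up, but treat them as illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scrn_step(x, h_prev, s_prev, p, alpha=0.95):
    """SCRN step: the slow context s is an exponential moving average of
    projected inputs; the fast hidden state h and the output y both read from it."""
    s = alpha * s_prev + (1.0 - alpha) * (p["B"] @ x)        # slow context units
    h = sigmoid(p["P"] @ s + p["A"] @ x + p["R"] @ h_prev)   # fast hidden units
    y = p["U"] @ h + p["V"] @ s                              # output (pre-softmax)
    return h, s, y
```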
SRN
Simple Recurrent Network
An RNN with poor ability to retain information over many timesteps.
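For reference, the whole cell is a single squashed affine map (NumPy sketch, parameter names are mine):

```python
import numpy as np

def srn_step(x, h_prev, p):
    """Elman-style simple recurrent network step. Gradients through many such
    steps tend to vanish or explode, hence the poor long-range memory."""
    h = np.tanh(p["W"] @ x + p["U"] @ h_prev + p["b"])
    return h
```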
Multiplicative Recurrent Units
MI-RNN-s
Simple version
MI-RNN-g
General version
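Sketches of both flavours, assuming the usual multiplicative-integration formulation; alpha, beta1, and beta2 are learned per-unit scaling vectors (the names are mine):

```python
import numpy as np

def mi_rnn_s_step(x, h_prev, p):
    """Simple MI: the usual Wx + Uh sum is replaced by an elementwise product."""
    return np.tanh((p["W"] @ x) * (p["U"] @ h_prev) + p["b"])

def mi_rnn_g_step(x, h_prev, p):
    """General MI: the multiplicative term plus both additive terms,
    each with its own learned scaling vector."""
    wx = p["W"] @ x
    uh = p["U"] @ h_prev
    return np.tanh(p["alpha"] * wx * uh + p["beta1"] * uh + p["beta2"] * wx + p["b"])
```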
MI-GRU
MI-LSTM
(too much TeX for me, use your imagination…)
mLSTM
SRU
Simple Recurrent Unit
A parallelizable architecture designed for speed.
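A sketch of where the speed comes from, under my reading of the SRU: every matrix multiply depends only on the inputs, so it can be computed for all timesteps up front, leaving only a cheap elementwise recurrence (parameter names, and the assumption that input and hidden sizes match for the highway skip, are mine):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_layer(xs, c0, p):
    """Run an SRU over a whole sequence xs of shape (T, d)."""
    # Batched projections for the whole sequence (the parallel part).
    xt = xs @ p["W"].T
    f = sigmoid(xs @ p["Wf"].T + p["bf"])
    r = sigmoid(xs @ p["Wr"].T + p["br"])
    # Lightweight elementwise recurrence (the only sequential part).
    hs, c = [], c0
    for t in range(len(xs)):
        c = f[t] * c + (1.0 - f[t]) * xt[t]            # internal state
        h = r[t] * np.tanh(c) + (1.0 - r[t]) * xs[t]   # highway skip to the input
        hs.append(h)
    return np.stack(hs), c
```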