Multiplicative Interactions And Where To Find Them

A discussion of multiplicative interactions, such as gating and Hadamard (elementwise) products.
The authors claim that a Hadamard product is generally an improvement over concatenation for combining vectors.
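
To make the contrast concrete, here is a minimal NumPy sketch of three ways to combine two vectors: concatenation followed by a linear map, a Hadamard product of two projections, and a sigmoid gate. This is not the paper's code. The names (`combine_concat`, `combine_hadamard`, `combine_gated`), the dimension `d = 4`, and the random weights are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # illustrative dimension, not taken from the paper

# Hypothetical random weights standing in for learned parameters.
W_cat = rng.normal(size=(d, 2 * d))
W_x = rng.normal(size=(d, d))
W_z = rng.normal(size=(d, d))


def combine_concat(x, z):
    """Concatenate, then apply one linear map: additive in x and z."""
    return W_cat @ np.concatenate([x, z])


def combine_hadamard(x, z):
    """Project each input, then multiply elementwise: bilinear in x and z."""
    return (W_x @ x) * (W_z @ z)


def combine_gated(x, z):
    """Gating: z produces a sigmoid mask in (0, 1) that scales the projection of x."""
    gate = 1.0 / (1.0 + np.exp(-(W_z @ z)))
    return gate * (W_x @ x)


x = rng.normal(size=d)
z = rng.normal(size=d)
zeros = np.zeros(d)

# Concatenation + linear splits into a sum of separate contributions,
# so x and z never interact within this single layer.
concat_is_additive = np.allclose(
    combine_concat(x, z),
    combine_concat(x, zeros) + combine_concat(zeros, z),
)

# The Hadamard combination scales jointly with both inputs.
hadamard_is_bilinear = np.allclose(
    combine_hadamard(2 * x, 3 * z),
    6 * combine_hadamard(x, z),
)

print("concat additive:", concat_is_additive)
print("hadamard bilinear:", hadamard_is_bilinear)
```

In this sketch, a single linear layer on a concatenation gives each input its own additive term, whereas the Hadamard and gated versions let one input rescale the other's contribution per coordinate. A nonlinearity stacked after the concatenation can approximate such interactions, so the sketch only shows what one layer expresses directly.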
They also introduce an LSTM variant with a fixed-size context vector; the variant outperforms other RNNs but gets blown away by TransformerXL on WikiText perplexity.