Squeeze-and-Excitation Networks

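Squeeze-and-Excitation (SE) blocks take a spatial block of activations with multiple channels, e.g. 3x3x64.
Squeeze: average each channel over its spatial positions, => 1x1x64
then apply a dense layer with ReLU, => 1x1x(64/r), where r is the reduction ratio (16 in the original paper, so 1x1x4 here)
then a dense layer with sigmoid activation, => 1x1x64.
The original block of activations is then multiplied channel-wise by these per-channel weights, each of which lies in (0, 1), giving a 3x3x64 output.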
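
The steps above can be sketched in plain Python, using nested lists for clarity rather than speed. This is only an illustrative sketch: the function name `se_block`, the weight names `w1`, `b1`, `w2`, `b2`, and the random initialisation are all assumptions made for this example, not part of any library or of the original paper's code.

```python
import math
import random


def se_block(x, w1, b1, w2, b2):
    """Apply a squeeze-and-excitation block to x, an H x W x C nested list.

    Hypothetical parameter layout (an assumption of this sketch):
      w1: C x C_r weights, b1: C_r biases  (dense + ReLU)
      w2: C_r x C weights, b2: C biases    (dense + sigmoid)
    Returns the rescaled block and the per-channel weights.
    """
    h, w, c = len(x), len(x[0]), len(x[0][0])
    c_r = len(b1)

    # Squeeze: average each channel over all H*W positions => C values.
    z = [sum(x[i][j][k] for i in range(h) for j in range(w)) / (h * w)
         for k in range(c)]

    # Dense + ReLU: C => C_r values.
    hidden = [max(0.0, sum(z[k] * w1[k][m] for k in range(c)) + b1[m])
              for m in range(c_r)]

    # Dense + sigmoid: C_r => C values, each in (0, 1).
    scale = []
    for k in range(c):
        pre = sum(hidden[m] * w2[m][k] for m in range(c_r)) + b2[k]
        scale.append(1.0 / (1.0 + math.exp(-pre)))

    # Multiply every spatial position of channel k by scale[k].
    out = [[[x[i][j][k] * scale[k] for k in range(c)]
            for j in range(w)]
           for i in range(h)]
    return out, scale


# Usage on a 3x3x64 block with reduction ratio r = 16 (random, untrained weights).
rng = random.Random(0)
H, W, C, R = 3, 3, 64, 16
x = [[[rng.gauss(0, 1) for _ in range(C)] for _ in range(W)] for _ in range(H)]
w1 = [[rng.gauss(0, 0.1) for _ in range(C // R)] for _ in range(C)]
b1 = [0.0] * (C // R)
w2 = [[rng.gauss(0, 0.1) for _ in range(C)] for _ in range(C // R)]
b2 = [0.0] * C

out, scale = se_block(x, w1, b1, w2, b2)
print(len(out), len(out[0]), len(out[0][0]), len(scale))  # 3 3 64 64
```

The output has the same shape as the input; only the relative magnitude of each channel changes. In a real network the two dense layers would be trained along with everything else.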

The block serves as a data-dependent channel-weighting function.

It seems pretty effective, especially for small networks.