L0-Normed Sparse Networks
or: Learning Sparse Neural Networks Through L0 Regularization
An alternative scheme for encouraging model sparsity in place of weight decay. Weight decay biases all params downward, whereas this setup smoothly drives some params to exactly zero while leaving most nonzero params alone. The reason: an L0 penalty charges for the *count* of nonzero params rather than their magnitudes, so surviving weights are never shrunk; the catch is that the count is non-differentiable, which the paper handles with stochastic gates.
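Concretely, the paper attaches a "hard concrete" gate to each weight (or group of weights): a stochastic value in [0, 1] that can land on exactly 0 or 1, with a differentiable expected number of open gates. Below is a minimal PyTorch sketch of that gating scheme as I read it, using the stretch parameters from the paper; the class name `L0Gate` and the surrounding wiring are my own illustration, not the authors' code.

```python
import math
import torch
import torch.nn as nn

class L0Gate(nn.Module):
    """Hard-concrete stochastic gates (Louizos, Welling & Kingma, 2018)."""

    def __init__(self, n, beta=2/3, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n))  # learned gate logits
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        if self.training:
            # reparameterized sample from the binary concrete distribution
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid(
                (u.log() - (1 - u).log() + self.log_alpha) / self.beta
            )
        else:
            s = torch.sigmoid(self.log_alpha)  # deterministic at eval time
        # Stretch to (gamma, zeta), then clip to [0, 1]: probability mass
        # piles up at exactly 0 and exactly 1, so gates can be truly off.
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def l0_penalty(self):
        # Differentiable expected number of nonzero gates.
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()
```

In use, a layer's weights get multiplied by the gate mask and `lam * gate.l0_penalty()` is added to the task loss, where `lam` trades accuracy against sparsity. The penalty itself never touches the magnitudes of the weights that stay on, which is exactly the contrast with weight decay described above.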