or: Lossless Data Compression with Neural Networks

Fabrice Bellard’s neural text compressor:

  • Preprocess words and stems into vocab tokens.
  • Train an LSTM-RNN or autoregressive transformer to predict the next word token.
  • Drive an arithmetic coder with the model’s predicted probability distribution.
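
The coding step can be sketched with a toy exact-arithmetic coder. This is a simplification, not NNCP’s actual coder: it uses Python `Fraction`s instead of the fixed-point range arithmetic a real implementation needs, and the per-step probability vectors here stand in for the network’s softmax output.

```python
from fractions import Fraction

def encode(symbols, probs):
    # probs[i] is the model's predicted distribution before symbol i.
    # Narrow [low, high) to the sub-interval of each symbol in turn.
    low, high = Fraction(0), Fraction(1)
    for sym, p in zip(symbols, probs):
        width = high - low
        cum = sum(p[:sym])                      # cumulative prob below sym
        low, high = low + width * cum, low + width * (cum + p[sym])
    return (low + high) / 2                     # any point inside the interval

def decode(code, probs):
    # Mirror the encoder: find which symbol's sub-interval contains the code.
    low, high = Fraction(0), Fraction(1)
    out = []
    for p in probs:
        width = high - low
        cum = Fraction(0)
        for sym, ps in enumerate(p):
            hi = low + width * (cum + ps)
            if code < hi:
                out.append(sym)
                low, high = low + width * cum, hi
                break
            cum += ps
    return out

# Skewed model: likely symbols get wide intervals, hence short codes.
probs = [[Fraction(1, 3), Fraction(2, 3)]] * 4
msg = [1, 0, 1, 1]
assert decode(encode(msg, probs), probs) == msg
```

The better the model’s predictions, the wider the chosen sub-intervals stay, and the fewer bits are needed to pin down a point in the final interval.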

Near-SOTA for enwik8/enwik9 text compression.
Weirdly, NNCP learns its weights from scratch during compression, in a single pass: the decoder performs the exact same training updates on the symbols it decodes, so the two models stay in sync and no weights need to be stored in the output.
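
The single-pass trick rests on one invariant: encoder and decoder start from the same initial model and apply identical, deterministic updates, so their predictions never diverge. A minimal sketch of that invariant, with an adaptive count model standing in for the neural net (the class and its methods are illustrative, not NNCP’s API):

```python
from fractions import Fraction

class AdaptiveModel:
    """Order-0 adaptive counts as a stand-in for the online-trained network."""
    def __init__(self, vocab_size):
        self.counts = [1] * vocab_size          # Laplace-smoothed start

    def predict(self):
        # Current probability vector; this is what feeds the arithmetic coder.
        total = sum(self.counts)
        return [Fraction(c, total) for c in self.counts]

    def update(self, sym):
        # Deterministic "training step", applied identically on both sides.
        self.counts[sym] += 1

encoder_model, decoder_model = AdaptiveModel(3), AdaptiveModel(3)
for sym in [0, 2, 2, 1, 2]:
    # Encoder predicts before seeing sym; decoder predicts before decoding it.
    assert encoder_model.predict() == decoder_model.predict()
    encoder_model.update(sym)                   # encoder learns from input
    decoder_model.update(sym)                   # decoder learns from its output
```

Swapping the count model for an LSTM or transformer changes the quality of `predict()` but not the scheme, as long as the training updates are bit-for-bit reproducible on both sides.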