FastText.zip: Compressing Text Classification Models

This paper experiments with compression methods for fastText embeddings. It notes that Product Quantization (PQ) is effective, and gives a lucid overview of the process: split each vector into sub-vectors of lower dimension, learn a representative set of centroids within each low-dimensional sub-space, and compress a vector by replacing each of its sub-vectors with the index of the nearest centroid.
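For concreteness, here is a minimal sketch of PQ using numpy and scikit-learn's KMeans. The function names and parameters (n_subvectors, n_centroids) are illustrative assumptions, not the paper's implementation:

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_train(vectors, n_subvectors=4, n_centroids=256):
    """Learn one codebook of centroids per low-dimensional sub-space."""
    sub_dim = vectors.shape[1] // n_subvectors
    codebooks = []
    for i in range(n_subvectors):
        sub = vectors[:, i * sub_dim:(i + 1) * sub_dim]
        km = KMeans(n_clusters=n_centroids, n_init=1).fit(sub)
        codebooks.append(km.cluster_centers_)
    return codebooks

def pq_encode(vectors, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    sub_dim = codebooks[0].shape[1]
    codes = []
    for i, cb in enumerate(codebooks):
        sub = vectors[:, i * sub_dim:(i + 1) * sub_dim]
        # squared distance from every sub-vector to every centroid
        dists = ((sub[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        codes.append(dists.argmin(axis=1))
    return np.stack(codes, axis=1).astype(np.uint8)  # 1 byte per sub-vector

def pq_decode(codes, codebooks):
    """Approximate the original vectors by concatenating chosen centroids."""
    return np.hstack([cb[codes[:, i]] for i, cb in enumerate(codebooks)])
```

With 256 centroids per sub-space each code fits in a single byte, so, for example, a 300-dimensional float32 vector (1200 bytes) stored as 4 codes shrinks to 4 bytes plus the shared codebooks.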

The paper notes that the norm (length) of the embeddings varies by a factor of 1000, and that the embeddings with larger norms are the more salient ones for discriminating between texts.

Since plain PQ reconstructs such widely varying norms poorly, the norm can be encoded explicitly with one extra byte per vector, leaving PQ to quantize only the direction (the unit-normalized vector).
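A sketch of one-byte norm coding, assuming a log-scale uniform quantizer; the log spacing is my assumption, motivated by the ~1000x spread of norms, and may differ from the paper's exact quantizer:

```python
import numpy as np

def encode_norms(norms, n_levels=256):
    """Quantize each norm to one byte on an assumed log scale."""
    lo, hi = np.log(norms.min()), np.log(norms.max())
    span = max(hi - lo, 1e-12)  # guard against all-identical norms
    codes = np.round((np.log(norms) - lo) / span * (n_levels - 1))
    return codes.astype(np.uint8), (lo, span)

def decode_norms(codes, params, n_levels=256):
    """Recover approximate norms from the one-byte codes."""
    lo, span = params
    return np.exp(codes.astype(np.float64) / (n_levels - 1) * span + lo)

# Directions would be PQ-encoded separately on normalized vectors:
# units = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
```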

Another saving comes from dropping the plain-text dictionary: instead of storing word strings and their indices, words are hashed directly to rows of the embedding table, at the cost of occasional collisions.
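A sketch of hashing-only lookup in the spirit of fastText's FNV-style hashing; the constants below are standard 32-bit FNV-1a, and the modulo reduction to buckets is illustrative:

```python
def word_bucket(word: str, n_buckets: int) -> int:
    """Map a word straight to an embedding row via FNV-1a,
    so no word strings or index tables need to be stored."""
    h = 0x811C9DC5                                   # FNV-1a offset basis
    for byte in word.encode("utf-8"):
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF   # FNV prime, 32-bit wrap
    return h % n_buckets

row = word_bucket("quantization", n_buckets=2_000_000)
```

The trade-off is that distinct words landing in the same bucket share an embedding row, which can slightly hurt accuracy but removes the dictionary strings entirely.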