or: Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning

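The authors adapt a diffusion model to generate categorical or discrete numeric data. Each numeric value is encoded as a series of bits, ordered from most significant to least significant, and the diffusion model predicts those bits as real numbers ("analog bits") that are thresholded back to 0 or 1.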
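
As a rough illustration of the bit encoding, the sketch below converts integers to bits (most significant first) and back. The function names, the 8-bit width, and the rescaling of bits to -1/+1 are assumptions made for this example, not details taken from the note above.

```python
import numpy as np

def int_to_analog_bits(x, num_bits=8):
    """Encode integers as real-valued bits in {-1.0, +1.0}, MSB first."""
    x = np.asarray(x, dtype=np.int64)
    shifts = np.arange(num_bits - 1, -1, -1)   # e.g. 7, 6, ..., 0
    bits = (x[..., None] >> shifts) & 1        # 0/1 bits, shape (..., num_bits)
    return bits.astype(np.float32) * 2.0 - 1.0 # assumed rescaling to -1/+1

def analog_bits_to_int(analog_bits):
    """Threshold real-valued bits at 0 and rebuild the integers."""
    bits = (np.asarray(analog_bits) > 0).astype(np.int64)
    num_bits = bits.shape[-1]
    weights = 1 << np.arange(num_bits - 1, -1, -1)  # 128, 64, ..., 1 for 8 bits
    return (bits * weights).sum(axis=-1)

pixels = np.array([0, 5, 200, 255])
analog = int_to_analog_bits(pixels)
# Stand-in for an imperfect model output: values are off, signs are preserved.
perturbed = 0.6 * analog + 0.1
decoded = analog_bits_to_int(perturbed)
```

Because decoding only looks at the sign of each value, the model's real-valued output does not need to land exactly on -1 or +1 to recover the original integer.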

They also replace the transformer decoder in an image captioning model with a diffusion model over discrete tokens. By predicting a series of bits, they can represent each word token with an output of dimension $\log_2 K$ instead of a $K$-sized one-hot vector, where $K$ is the vocabulary size.
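
To make the size difference concrete, here is a small sketch of how many bits a token needs and how a token id maps to bits and back. The helper names and the vocabulary size of 32,000 are hypothetical, chosen only for illustration; when $K$ is not a power of two the bit count is rounded up.

```python
def bits_per_token(vocab_size):
    """Number of bits needed to index vocab_size tokens: ceil(log2(K))."""
    if vocab_size < 2:
        raise ValueError("vocab_size must be at least 2")
    return (vocab_size - 1).bit_length()

def token_to_bits(token_id, num_bits):
    """Token id -> list of 0/1 bits, most significant bit first."""
    return [(token_id >> shift) & 1 for shift in range(num_bits - 1, -1, -1)]

def bits_to_token(bits):
    """List of 0/1 bits (MSB first) -> token id."""
    token_id = 0
    for bit in bits:
        token_id = (token_id << 1) | bit
    return token_id

vocab_size = 32000                    # hypothetical vocabulary size
num_bits = bits_per_token(vocab_size) # 15 outputs per token instead of 32000
example_bits = token_to_bits(12345, num_bits)
round_trip = bits_to_token(example_bits)
```

One consequence of rounding up is that some bit patterns decode to ids at or above the vocabulary size, so a real system would need some rule for handling them; this sketch does not address that.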