AudioLM
AudioLM: a Language Modeling Approach to Audio Generation
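The authors use a pretrained neural audio codec to translate between audio waveforms and discrete, high-fidelity acoustic tokens. They train a language model on these acoustic tokens together with semantic tokens, which are coarser than the acoustic tokens and capture longer-range structure such as the linguistic content of speech.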
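The token flow described above can be illustrated with a toy sketch. This is not the authors' code: the uniform quantizer stands in for the pretrained codec, and every name here (`encode_acoustic`, `decode_acoustic`, `build_lm_sequence`, `FRAME`, `LEVELS`, `SEMANTIC_OFFSET`) is hypothetical. Placing semantic tokens before acoustic tokens in one shared id space is an assumption made for illustration.

```python
import math

FRAME = 4                  # samples per acoustic token (toy value)
LEVELS = 16                # toy acoustic codebook size
SEMANTIC_OFFSET = LEVELS   # assumption: semantic ids sit above acoustic ids in a shared vocabulary


def encode_acoustic(wave):
    """Toy stand-in for a codec encoder: one token per frame of samples in [-1, 1]."""
    tokens = []
    for i in range(0, len(wave) - FRAME + 1, FRAME):
        mean = sum(wave[i:i + FRAME]) / FRAME
        # Uniformly quantize the frame mean to the nearest of LEVELS bins.
        idx = int((mean + 1.0) / 2.0 * (LEVELS - 1) + 0.5)
        tokens.append(min(max(idx, 0), LEVELS - 1))
    return tokens


def decode_acoustic(tokens):
    """Toy stand-in for a codec decoder: expand each token back to FRAME samples."""
    wave = []
    for t in tokens:
        value = t / (LEVELS - 1) * 2.0 - 1.0
        wave.extend([value] * FRAME)
    return wave


def build_lm_sequence(semantic_tokens, acoustic_tokens):
    """Concatenate both token streams into one id sequence for a language model."""
    return [SEMANTIC_OFFSET + s for s in semantic_tokens] + list(acoustic_tokens)


# A short sine wave, plus made-up semantic tokens for illustration only.
wave = [math.sin(2 * math.pi * n / 32) for n in range(32)]
acoustic = encode_acoustic(wave)
sequence = build_lm_sequence([3, 3, 7], acoustic)
reconstruction = decode_acoustic(acoustic)
```

A language model trained on sequences like `sequence` would predict the next id, and the predicted acoustic ids would be passed through the decoder to obtain a waveform. A real codec is learned rather than a fixed quantizer, so this sketch only shows the shape of the data.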