or: ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models

A T5 variant is trained on character-level text instead of byte pair encoded/tokenized text.
It works well.