Charformer: Fast Character Transformers Via Gradient-Based Subword Tokenization

A BERT-style language model handles tokens generated by a character-to-tokens transformer encoder.