BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

An encoder-only Transformer: like GPT it is pretrained on unlabeled text, but its self-attention is bidirectional (each token attends to both left and right context) rather than causal, trained via masked language modeling instead of left-to-right prediction.
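The bidirectional-vs-causal distinction can be sketched with the attention masks each architecture uses (a minimal NumPy illustration, not from the paper):

```python
import numpy as np

seq_len = 5

# BERT-style (bidirectional): every token may attend to every position,
# including positions to its right.
bert_mask = np.ones((seq_len, seq_len), dtype=bool)

# GPT-style (causal): token i may only attend to positions j <= i,
# i.e. the lower triangle of the attention matrix.
gpt_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(int(bert_mask.sum()))  # 25 — all pairs visible
print(int(gpt_mask.sum()))   # 15 — lower triangle only
```

Because future tokens are visible, BERT cannot be trained with plain next-token prediction; masked language modeling hides a subset of tokens and predicts them from both sides.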