Low-Resource mBERT
or: Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation
mBERT training on low-resource languages can be bootstrapped with a dictionary: the dictionary is used to create mixed pseudo-translations in the target language, and the network is trained to back-translate them into the original sentences in the reference language. The dictionaries themselves can be synthesized or expanded from limited parallel text such as Bible translations.
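The data-creation step can be illustrated with a minimal sketch. It assumes the pseudo-translation is a plain word-for-word dictionary lookup, and that words missing from the dictionary stay in the reference language (one reading of "mixed"). The function names, the `replace_prob` parameter, and the toy English-to-German lexicon are hypothetical and only for illustration; they are not taken from the paper.

```python
import random


def pseudo_translate(sentence, lexicon, rng, replace_prob=1.0):
    """Replace each word with a dictionary translation when one exists.

    Words that are not in the lexicon are kept unchanged, so the result
    is a mix of target-language and reference-language tokens.
    """
    out = []
    for token in sentence.split():
        candidates = lexicon.get(token.lower())
        if candidates and rng.random() < replace_prob:
            # Pick one of the listed translations at random (assumption:
            # no sense disambiguation is attempted).
            out.append(rng.choice(candidates))
        else:
            out.append(token)
    return " ".join(out)


def make_training_pairs(sentences, lexicon, seed=0):
    """Build (pseudo-translation, original) pairs for back-translation training."""
    rng = random.Random(seed)
    return [(pseudo_translate(s, lexicon, rng), s) for s in sentences]


# Toy lexicon, purely illustrative.
toy_lexicon = {
    "the": ["der", "die", "das"],
    "dog": ["Hund"],
    "sleeps": ["schläft"],
}

pairs = make_training_pairs(["the dog sleeps soundly"], toy_lexicon)
for source, target in pairs:
    print(source, "->", target)
```

Each pair has the pseudo-translation as model input and the original sentence as the target. Here "soundly" has no dictionary entry, so it stays in English in the input.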