or: Word Alignment by Fine-tuning Embeddings on Parallel Corpora

A method for aligning tokens between parallel texts in two languages: start with pretrained monolingual word vectors, then fine-tune a model on several objectives (masked-word prediction, translation and back-translation consistency, etc.).

Note: this is about word-to-word alignment of bilingual text, not about aligning word-vector spaces into a shared bilingual embedding space!
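
A minimal sketch of how alignments are commonly read off token embeddings once such a model exists: embed both sentences, compute a cosine-similarity matrix, and keep token pairs that are each other's mutual argmax. This is an illustration under that assumption, not the paper's exact extraction procedure; the function names and the random stand-in embeddings are mine.

```python
import numpy as np


def cosine_sim_matrix(src_vecs: np.ndarray, tgt_vecs: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between source and target token vectors."""
    src = src_vecs / np.linalg.norm(src_vecs, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    return src @ tgt.T


def mutual_argmax_alignments(sim: np.ndarray) -> list[tuple[int, int]]:
    """Keep (i, j) only if j is the best target for i AND i is the best source for j."""
    best_tgt = sim.argmax(axis=1)  # for each source token, its most similar target
    best_src = sim.argmax(axis=0)  # for each target token, its most similar source
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in embeddings: 5 source tokens, 6 target tokens, 32 dims.
    # In practice these would come from the fine-tuned model described above.
    src_vecs = rng.normal(size=(5, 32))
    tgt_vecs = rng.normal(size=(6, 32))
    sim = cosine_sim_matrix(src_vecs, tgt_vecs)
    print(mutual_argmax_alignments(sim))  # list of (source index, target index) pairs
```

The mutual-argmax filter is a standard, high-precision way to turn a similarity matrix into discrete alignments; softer thresholding schemes trade some precision for recall.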