FaceFormer: Speech-Driven 3D Facial Animation with Transformers

From lucidrains:

it looks like you can train a continuous transformer autoregressively with a simple MSE loss
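
To make the quoted idea concrete, here is a minimal sketch of the objective: a causal self-attention layer reads continuous frames and regresses the next frame, scored with mean squared error. This is not FaceFormer's architecture or lucidrains' code. The names (`causal_attention`, `next_frame_mse`, `params`), the single attention head, the layer sizes, and the toy sine-wave data are all assumptions for illustration. Only the forward pass and the loss are shown; a real setup would hand the loss to an autodiff framework for the gradient step.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; the scale is an arbitrary choice."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def causal_attention(frames, Wq, Wk, Wv):
    """Single-head self-attention where position t only attends to positions <= t."""
    q = [matvec(Wq, f) for f in frames]
    k = [matvec(Wk, f) for f in frames]
    v = [matvec(Wv, f) for f in frames]
    d = len(q[0])
    out = []
    for t in range(len(frames)):
        # Scores over the visible prefix only: this is the causal mask.
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * v[s][i] for s, w in enumerate(weights))
                    for i in range(d)])
    return out


def next_frame_mse(sequence, params):
    """Teacher-forced next-frame regression: frames 0..T-2 in, frames 1..T-1 as targets."""
    inputs, targets = sequence[:-1], sequence[1:]
    hidden = causal_attention(inputs, params["Wq"], params["Wk"], params["Wv"])
    preds = [matvec(params["Wo"], h) for h in hidden]
    # Plain MSE over every coordinate of every predicted frame; no tokenization.
    sq = [(p - y) ** 2 for pv, tv in zip(preds, targets) for p, y in zip(pv, tv)]
    return sum(sq) / len(sq), preds


# Toy data (assumption): 6 frames of 4 continuous values each.
rng = random.Random(0)
dim, d_model, T = 4, 8, 6
sequence = [[math.sin(0.5 * t + i) for i in range(dim)] for t in range(T)]
params = {
    "Wq": rand_matrix(d_model, dim, rng),
    "Wk": rand_matrix(d_model, dim, rng),
    "Wv": rand_matrix(d_model, dim, rng),
    "Wo": rand_matrix(dim, d_model, rng),
}
loss, preds = next_frame_mse(sequence, params)
print(f"next-frame MSE: {loss:.4f}")
```

The point of the sketch is that the inputs and targets are the same continuous sequence shifted by one step, so no discrete vocabulary or softmax over tokens is needed. At inference time such a model would be rolled out by feeding each predicted frame back in as the next input.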