or: Neural Machine Translation in Linear Time

ByteNet is an encoder-decoder model based on dilated convolutions. The encoder spans the whole input sequence, and the decoder is causal and autoregressive.