Efficient-VDVAE: Less Is More

A very deep VAE for images that performs at or near state-of-the-art (SOTA) levels on many image benchmarks. Using narrow layer widths reduces memory and compute, but the model is still very compute-intensive. Smoothing gradients across batches allows more stable training with small batch sizes.
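The note above does not say how gradients are smoothed across batches. One common way to do this in general is a bias-corrected exponential moving average (EMA) of the per-batch gradients, which is also the first-moment estimate used by Adam-style optimizers. The sketch below illustrates that idea on scalar gradients; the function name `ema_smooth` and the `beta` parameter are hypothetical, and this is an assumption about what such smoothing could look like, not necessarily the method used in Efficient-VDVAE.

```python
def ema_smooth(grads, beta=0.9):
    """Return bias-corrected exponential moving averages of per-batch gradients.

    Illustrative assumption only: an EMA is one generic way to smooth
    gradients across batches, not necessarily the paper's technique.
    """
    smoothed = []
    avg = 0.0
    for step, g in enumerate(grads, start=1):
        # Blend the running average with the newest batch gradient.
        avg = beta * avg + (1.0 - beta) * g
        # Correct the bias toward zero caused by initializing avg at 0.
        smoothed.append(avg / (1.0 - beta ** step))
    return smoothed


# Noisy gradients, as might be seen with a small batch size.
noisy = [2.0, 0.0, 2.0, 0.0, 2.0, 0.0, 2.0, 0.0]
smooth = ema_smooth(noisy)
print(max(noisy) - min(noisy), max(smooth[1:]) - min(smooth[1:]))
```

A larger `beta` averages over more batches, which lowers the step-to-step variance at the cost of reacting more slowly to real changes in the gradient.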