This explains why VAEs with complex conditional models are hard to train: early in training, the latent-dependent model converges to an unconditional distribution, and it never recovers any dependence on the latent variable. How much the latent variable contributes to the loss can be described by its information content under bits-back coding.
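To make the "information content" concrete, here is a minimal sketch. It assumes a diagonal Gaussian approximate posterior and a standard normal prior, neither of which is specified above. Under that assumption, the KL term of the ELBO is the number of extra bits spent on the latent code, and a collapsed posterior spends exactly zero. The function name `kl_to_std_normal_bits` is hypothetical, chosen only for this illustration.

```python
import math


def kl_to_std_normal_bits(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), converted from nats to bits.

    Assumes a diagonal Gaussian posterior and a standard normal prior.
    """
    kl_nats = 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )
    return kl_nats / math.log(2.0)


# Collapsed posterior: q(z|x) equals the prior, so z carries no information.
collapsed_bits = kl_to_std_normal_bits([0.0, 0.0], [0.0, 0.0])

# Informative posterior: shifted mean and reduced variance cost some bits.
informative_bits = kl_to_std_normal_bits([1.5, -0.5], [-2.0, -1.0])

print(f"collapsed:   {collapsed_bits:.3f} bits")
print(f"informative: {informative_bits:.3f} bits")
```

Tracking this quantity per batch during training is one simple way to notice the collapse described above: if it drops to near zero and stays there, the conditional model has stopped using the latent variable.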

See also ruishu’s Gaussian mixture VAE post.