UL2R
or: Transcending Scaling Laws with 0.1% Extra Compute
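Briefly continuing to train a large causal language model with UL2's mixture-of-denoisers objective, which mixes infilling tasks that condition on bidirectional context with ordinary left-to-right prediction, improves generalization.
In this case, the authors continued training PaLM to create U-PaLM.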
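
To make the bidirectional tasks concrete, here is a minimal sketch of two denoising-style training examples: span corruption (infilling) and a prefix-LM split. This is an illustration under stated assumptions, not the authors' pipeline. The function names, the hand-picked spans, and the T5-style `<extra_id_N>` sentinel format are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def span_corrupt(
    tokens: Sequence[str], spans: Sequence[Tuple[int, int]]
) -> Tuple[List[str], List[str]]:
    """Replace each non-overlapping [start, end) span with a sentinel.

    Returns (inputs, targets): the inputs keep the surrounding context on
    both sides of each gap, and the targets list the removed spans, each
    introduced by its sentinel.
    """
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"  # assumed sentinel naming, borrowed from T5
        inputs.extend(tokens[cursor:start])  # context before the gap
        inputs.append(sentinel)
        targets.append(sentinel)
        targets.extend(tokens[start:end])  # the tokens the model must recover
        cursor = end
    inputs.extend(tokens[cursor:])  # context after the last gap
    return inputs, targets


def prefix_lm(tokens: Sequence[str], split: int) -> Tuple[List[str], List[str]]:
    """Split into a prefix (which may be attended bidirectionally) and a continuation."""
    return list(tokens[:split]), list(tokens[split:])


if __name__ == "__main__":
    toks = "the quick brown fox jumps over the lazy dog".split()
    corrupted, recovered = span_corrupt(toks, [(1, 3), (6, 8)])
    print(" ".join(corrupted))
    print(" ".join(recovered))
    print(prefix_lm(toks, 4))
```

In the span-corruption example the model sees context on both sides of each gap, which a purely left-to-right objective never provides. A real setup would sample span positions and lengths at random and operate on token IDs rather than words.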

from a laptop in Sunnyvale