UL2R
or: Transcending Scaling Laws with 0.1% Extra Compute
Briefly retraining a huge causal model on bidirectional tasks improves generalization.
In this case, the authors retrained PaLM to create U-PaLM.
from a laptop in Sunnyvale