On Linear Identifiability of Learned Representations

As language models get larger, their learned representations become increasingly alignable, via a linear transformation, with the embeddings of other models. Small models align poorly not because their representations capture genuinely different linguistic structure, but because they tend to be trained on small, low-quality slices of Wikipedia.
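
A minimal sketch of how one might quantify this kind of linear alignment between two models' embeddings. Everything here is an illustrative assumption rather than a specific published protocol: the array names `emb_a` and `emb_b`, the use of ordinary least squares for the linear map, and held-out R² as the alignment score.

```python
import numpy as np

def linear_alignment_score(emb_a: np.ndarray, emb_b: np.ndarray,
                           train_frac: float = 0.8) -> float:
    """Fit a linear map from model A's embeddings to model B's on a
    training split and return held-out R^2 as an alignment score.

    emb_a: (n_items, d_a) representations from model A
    emb_b: (n_items, d_b) representations from model B, same items, same order
    (Hypothetical setup for illustration.)
    """
    n = emb_a.shape[0]
    rng = np.random.default_rng(0)
    idx = rng.permutation(n)
    cut = int(train_frac * n)
    train, test = idx[:cut], idx[cut:]

    # Least-squares W minimizing ||emb_a[train] @ W - emb_b[train]||^2
    W, *_ = np.linalg.lstsq(emb_a[train], emb_b[train], rcond=None)

    # Held-out R^2: how much of model B's embedding variance the linear map explains
    pred = emb_a[test] @ W
    ss_res = np.sum((emb_b[test] - pred) ** 2)
    ss_tot = np.sum((emb_b[test] - emb_b[test].mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot
```

A score near 1 would suggest the two models' representations agree up to a linear transformation; lower scores indicate weaker linear identifiability, as claimed above for small models.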