Improving Distributional Similarity
or: Improving Distributional Similarity with Lessons Learned from Word Embeddings
A response to the popularity of word2vec embeddings. The authors argue that the 'neural' nature of word2vec is not critical to its success: word2vec, GloVe, SVD, and PPMI-based embeddings all perform roughly equally well at capturing word similarity.
More important than the model itself are particular hyperparameter choices made during training. They experiment with many settings, including row and column L2 normalization, using both word and context vectors for GloVe, context window size, and context distribution smoothing.
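
To make two of these hyperparameters concrete, here is a minimal sketch of a PPMI matrix with context distribution smoothing, followed by row L2 normalization. It is an illustration, not the authors' code: the function names `ppmi` and `l2_normalize_rows` and the toy count matrix are made up for this example, and the default alpha=0.75 is assumed here because it is the exponent word2vec uses for its negative sampling distribution.

```python
import math


def ppmi(counts, alpha=0.75):
    """Positive PMI from a word-by-context co-occurrence count matrix.

    counts[w][c] is how often word w appeared with context c.
    alpha is the context distribution smoothing exponent
    (alpha=1.0 means no smoothing). Names here are illustrative.
    """
    total = sum(sum(row) for row in counts)
    word_totals = [sum(row) for row in counts]
    n_contexts = len(counts[0])
    context_totals = [sum(row[c] for row in counts) for c in range(n_contexts)]

    # Smoothing: raise context counts to alpha, then renormalize.
    # This raises the probability of rare contexts and so lowers their PMI.
    smoothed = [t ** alpha for t in context_totals]
    smoothed_sum = sum(smoothed)

    result = []
    for w, row in enumerate(counts):
        out = []
        for c, n in enumerate(row):
            if n == 0:
                out.append(0.0)  # unseen pairs get 0 rather than -infinity
                continue
            p_wc = n / total
            p_w = word_totals[w] / total
            p_c = smoothed[c] / smoothed_sum
            # Clip negative PMI values to zero (the "positive" in PPMI).
            out.append(max(0.0, math.log(p_wc / (p_w * p_c))))
        result.append(out)
    return result


def l2_normalize_rows(matrix):
    """Scale each word vector (row) to unit L2 length; zero rows are kept."""
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / norm for x in row] if norm > 0 else list(row))
    return normalized


# Toy counts (made up): 3 words by 3 contexts.
counts = [
    [10, 0, 2],
    [3, 5, 0],
    [0, 1, 8],
]
vectors = l2_normalize_rows(ppmi(counts, alpha=0.75))
```

With alpha=1.0 this reduces to plain PPMI, so the effect of smoothing can be checked by comparing the two outputs on the same counts.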