word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method

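What’s going on with that negative sampling loss? It’s a binary classification objective: it scores the likelihood that an observed word-context pair came from the real corpus, and that each randomly sampled "negative" pair came from a noise distribution instead.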
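To make that reading concrete, here is a minimal sketch of the loss for a single observed pair and its sampled negatives. It assumes the usual logistic form, where the probability that a pair is real is the sigmoid of the dot product of the word and context vectors. The function names (`log_sigmoid`, `dot`, `negative_sampling_loss`) and the plain-list vectors are illustrative choices, not taken from any particular implementation.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    # Dot product of two equal-length vectors given as lists.
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(word_vec, context_vec, negative_vecs):
    # Observed pair: we want P(real | w, c) = sigmoid(w . c) to be high.
    loss = -log_sigmoid(dot(word_vec, context_vec))
    # Sampled pairs: we want P(noise | w, c') = sigmoid(-w . c') to be high.
    for neg_vec in negative_vecs:
        loss -= log_sigmoid(-dot(word_vec, neg_vec))
    return loss


if __name__ == "__main__":
    word = [1.0, 0.0]
    # Word agrees with its true context and disagrees with the negative.
    print(negative_sampling_loss(word, [2.0, 0.0], [[-2.0, 0.0]]))
    # The reverse case, which should give a larger loss.
    print(negative_sampling_loss(word, [-2.0, 0.0], [[2.0, 0.0]]))
```

The loss is small when the word vector points the same way as its true context and away from the sampled contexts, and it grows when those roles are reversed. With all-zero vectors every pair is a coin flip, so each term contributes log 2.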