Notes from Notes on Noise Contrastive Estimation and Negative Sampling
one sample:
where \(y_i^0\) are true labeled words , and \(y_i^1,\cdots,y_i^{k}\) are noise samples word index, which is generated by unigram distribution \(q(w)\) of the dataset.
the probability of true data:
the noise sample probability:
the cost function of this sample:
the overall cost function of the dataset:
Related Paper
[Noise-Contrastive Estimation of Unnormalized Statistical Models with Applications to Natural Image Statistics]
[Word2vec Parameter Learning Explained]
[Efficient Estimation of Word Representation in Vector Space]
[Distributed Representations of Words and Phrases and their Compositionality]
[Notes on Noise Contrastive Estimation and Negative Sampling]