Principles of fastText

Reposted article. 2017-07-22 10:06. machine learning

The model's optimization objective is as follows:
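The formula is not preserved in this copy. Judging from the symbols explained in the next paragraph, it is presumably the negative log-likelihood used in the fastText classification paper (Joulin et al., "Bag of Tricks for Efficient Text Classification"), where $N$ is assumed to be the number of training examples:

$$-\frac{1}{N}\sum_{n=1}^{N} y_n \log\left(f(BAx_n)\right)$$

Minimizing this quantity is the same as maximizing the likelihood of the correct labels.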

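Here, $\langle x_n, y_n \rangle$ is one training example, $y_n$ is the training target, and $x_n$ is the normalized bag of features. The matrix parameter A is a word-based look-up table, that is, A holds the word embedding vectors. Mathematically, the product $Ax_n$ looks up the embedding vector of each word and then sums or averages them, yielding the hidden vector. The matrix parameter B is the parameter of the function f, which performs multi-class classification (a softmax in the fastText paper), so $f(BAx_n)$ is a linear multi-class classifier. The optimization objective is to make the likelihood of this multi-class problem as large as possible.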

Expressed as a graphical model, the optimization objective looks as follows:
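The structure can also be read directly off $f(BAx_n)$: input words are mapped to embeddings, averaged into a hidden vector, and passed through a linear classifier. The following is a minimal pure-Python sketch of that forward pass, assuming f is a softmax and that averaging (rather than summing) is used. The function names and the toy values of `A` and `B` are illustrative assumptions, not taken from the fastText code base.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hidden_vector(word_ids, A):
    # A is stored here as one embedding row per word id (the look-up table).
    # Averaging the selected rows corresponds to the product A x_n,
    # where x_n is the normalized bag of features.
    dim = len(A[0])
    hidden = [0.0] * dim
    for w in word_ids:
        for d in range(dim):
            hidden[d] += A[w][d]
    return [h / len(word_ids) for h in hidden]

def forward(word_ids, A, B):
    # B holds one weight row per class; the result is f(B A x_n).
    hidden = hidden_vector(word_ids, A)
    scores = [sum(b * h for b, h in zip(row, hidden)) for row in B]
    return softmax(scores)

def example_loss(word_ids, label, A, B):
    # Negative log-likelihood of the correct class for one example.
    return -math.log(forward(word_ids, A, B)[label])

# Toy values (assumptions): 3 words, embedding size 2, 2 classes.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
probs = forward([0, 2], A, B)
loss = example_loss([0, 2], 0, A, B)
```

Training would adjust both `A` and `B` to reduce the average of `example_loss` over all examples.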

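Comparison with Word2Vec:

Similarities:

The graphical model structures are very similar: both use embedding vectors to obtain a hidden vector representation of words.
Both use many similar optimization techniques, for example hierarchical softmax to speed up scoring during training and prediction.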
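To illustrate why hierarchical softmax speeds up scoring, here is a minimal sketch that assumes a Huffman tree built over label frequencies, with one binary (sigmoid) decision per internal node. The label names, counts, and node vectors are made up for illustration, and the functions are hypothetical helpers, not the fastText or word2vec API.

```python
import heapq
import math

def build_huffman_paths(label_counts):
    # Returns {label: [(inner_node_id, branch), ...]} from root to leaf,
    # where branch is 0 for the left child and 1 for the right child.
    heap = [(count, order, ("leaf", label))
            for order, (label, count) in enumerate(sorted(label_counts.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)  # unique second key, so nodes are never compared
    inner_id = 0
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        node = ("inner", inner_id, left, right)
        heapq.heappush(heap, (c1 + c2, tiebreak, node))
        inner_id += 1
        tiebreak += 1
    paths = {}

    def walk(node, path):
        if node[0] == "leaf":
            paths[node[1]] = path
        else:
            _, node_id, left, right = node
            walk(left, path + [(node_id, 0)])
            walk(right, path + [(node_id, 1)])

    walk(heap[0][2], [])
    return paths

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def label_probability(path, hidden, node_vectors):
    # Product of binary decisions along the path; cost is the path length,
    # not the total number of labels.
    prob = 1.0
    for node_id, branch in path:
        score = sum(w * h for w, h in zip(node_vectors[node_id], hidden))
        p_right = sigmoid(score)
        prob *= p_right if branch == 1 else 1.0 - p_right
    return prob

# Toy data (assumptions): four labels with skewed frequencies.
label_counts = {"sports": 50, "politics": 30, "tech": 15, "cooking": 5}
paths = build_huffman_paths(label_counts)
node_vectors = {0: [0.1, -0.2], 1: [0.3, 0.4], 2: [-0.5, 0.2]}
hidden = [1.0, 2.0]
probs = {label: label_probability(path, hidden, node_vectors)
         for label, path in paths.items()}
```

Frequent labels end up close to the root, so scoring them touches fewer nodes, and the probabilities over all leaves still sum to 1 without computing a full softmax.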

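Differences:

word2vec is an unsupervised algorithm, whereas fastText, used as the classifier described here, is a supervised algorithm. word2vec's learning target is the skipped word, i.e. a word from the surrounding context, while fastText's learning target is the human-annotated class label.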
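The contrast in training targets can be sketched as follows, assuming the skip-gram variant of word2vec with a symmetric context window. The sentence, the window size, and the label `"animals"` are made-up illustrations.

```python
def skipgram_pairs(tokens, window=2):
    # Unsupervised targets: each word predicts its neighbours,
    # so the training pairs come from the raw text itself.
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = ["the", "cat", "sat"]

# word2vec (skip-gram): (center word, context word) pairs.
unsupervised_examples = skipgram_pairs(tokens, window=1)

# fastText classifier: the whole text paired with a human-assigned label.
supervised_example = (tokens, "animals")
```

No annotation is needed for the first kind of example, while the second requires a label for every text.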

word2vec treats each word in the corpus as an atomic entity and generates one vector per word. fastText, which is essentially an extension of the word2vec model, treats each word as composed of character n-grams: each word yields several character n-grams, each n-gram has its own vector, and the vector for a word is the sum of the vectors of its n-grams. The range of n-gram lengths must be specified. For example, with the smallest n-gram size (minn) set to 3 and the largest (maxn) set to 6, the vector for the word “apple” is the sum of the vectors of the n-grams “<ap”, “<app”, “<appl”, “<apple”, “app”, “appl”, “apple”, “apple>”, “ppl”, “pple”, “pple>”, “ple”, “ple>”, “le>”, where “<” and “>” mark the word boundaries. This difference manifests as follows.

Better word embeddings for rare words: even if a word is rare, its character n-grams are still shared with other words, so its embedding can still be good.
Out-of-vocabulary words: a vector can be constructed from a word's character n-grams even if the word never appears in the training corpus.
From a practical standpoint, the choice of hyperparameters for generating fastText embeddings becomes key: since training operates at the character n-gram level, generating fastText embeddings takes longer than with word2vec, and the hyperparameters controlling the minimum and maximum n-gram sizes have a direct bearing on this time.
Using character embeddings (individual characters as opposed to n-grams) for downstream tasks has recently been shown to boost performance on those tasks compared to using word embeddings like word2vec or GloVe.
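The character n-gram decomposition and the out-of-vocabulary case can be sketched as below. This is a simplified illustration under stated assumptions: n-gram vectors live in a plain dictionary (`toy_ngram_vectors`, a made-up table), whereas the real library hashes n-grams into a fixed number of buckets and also keeps a vector for the whole word. The function names are hypothetical.

```python
def char_ngrams(word, minn=3, maxn=6):
    # Pad with boundary markers, then take every substring whose
    # length lies between minn and maxn (inclusive).
    padded = "<" + word + ">"
    grams = []
    for start in range(len(padded)):
        for n in range(minn, maxn + 1):
            end = start + n
            if end > len(padded):
                break
            grams.append(padded[start:end])
    return grams

def word_vector(word, ngram_vectors, dim, minn=3, maxn=6):
    # Sum the vectors of all known n-grams of the word. This also works
    # for a word never seen in training, as long as some of its n-grams were.
    vec = [0.0] * dim
    for gram in char_ngrams(word, minn, maxn):
        if gram in ngram_vectors:
            for d in range(dim):
                vec[d] += ngram_vectors[gram][d]
    return vec

# Toy n-gram table (assumption): as if learned from a corpus containing "apple".
toy_ngram_vectors = {
    "app": [1.0, 0.0],
    "ple": [0.0, 1.0],
    "le>": [0.5, 0.5],
}

apple_grams = char_ngrams("apple")
apple_vec = word_vector("apple", toy_ngram_vectors, dim=2)
# "apples" is treated as unseen, yet it shares "app" and "ple" with "apple".
apples_vec = word_vector("apples", toy_ngram_vectors, dim=2)
```

Widening the gap between `minn` and `maxn` increases the number of n-grams per word, which is one reason these two hyperparameters affect training time.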

https://heleifz.github.io/14732610572844.html

https://arxiv.org/pdf/1607.04606v1.pdf

http://www.jianshu.com/p/b7ede4e842f1

https://www.quora.com/What-is-the-main-difference-between-word2vec-and-fastText
