Loading pretrained word vectors with the Embedding module's from_pretrained

Reposted article, 2020-12-09 16:40

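What the Embedding module does: it converts word indices into the corresponding word vectors. We need to set two parameters: the size of the vocabulary and the dimension of the word embeddings.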

num_embeddings (int): size of the dictionary of embeddings
embedding_dim (int): the size of each embedding vector

>>> import torch
>>> import torch.nn as nn
>>> # an Embedding module containing 10 tensors of size 3
>>> # i.e. the vocabulary has 10 words and each word vector has 3 dimensions
>>> embedding = nn.Embedding(10, 3)
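
To make the two parameters concrete, here is a minimal sketch that inspects the weight matrix and looks up a small batch of indices. The index values and variable names are chosen only for illustration.

```python
import torch
import torch.nn as nn

# 10 words in the vocabulary, each mapped to a 3-dimensional vector
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)

# The layer holds one learnable row per vocabulary index
weight_shape = tuple(embedding.weight.shape)

# Input is a LongTensor of indices; here a batch of 2 sequences of length 4
indices = torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]], dtype=torch.long)
vectors = embedding(indices)

# The output appends the embedding dimension to the input shape
output_shape = tuple(vectors.shape)
print(weight_shape, output_shape)
```

The weight matrix has shape (num_embeddings, embedding_dim), and every index in the input is replaced by its row of that matrix.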

Note: the input is not the words themselves but their indices, so we keep a dictionary that maps each word to its index.

vocab = ["hello", "world"]  # hypothetical example vocabulary (a list of unique words)
word_to_idx = {word: i for i, word in enumerate(vocab)}

Then wrap the index in a tensor:

hello_to_tensor = torch.tensor([word_to_idx["hello"]], dtype=torch.long)

which can then be passed to the embedding defined above:

hello_embed = embedding(hello_to_tensor)
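
Putting the steps above together, a self-contained sketch might look like the following. The toy vocabulary and the embedding dimension of 5 are assumptions made for illustration, and the vectors are randomly initialised.

```python
import torch
import torch.nn as nn

# Hypothetical toy vocabulary, used only for illustration
vocab = ["hello", "world", "embedding"]
word_to_idx = {word: i for i, word in enumerate(vocab)}

# One row per word; the values are randomly initialised
embedding = nn.Embedding(len(vocab), 5)

# Look up a single word: wrap its index in a LongTensor first
hello_to_tensor = torch.tensor([word_to_idx["hello"]], dtype=torch.long)
hello_embed = embedding(hello_to_tensor)

# Look up a whole sentence in one call
sentence = ["hello", "world"]
sentence_ids = torch.tensor([word_to_idx[w] for w in sentence], dtype=torch.long)
sentence_embed = embedding(sentence_ids)

print(hello_embed.shape, sentence_embed.shape)
```

Looking up a word simply returns the row of the weight matrix at that word's index, so the same word always maps to the same vector.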

Loading pretrained word vectors with from_pretrained

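In a concrete NLP task, we usually pass word indices through an Embedding layer to obtain word vectors and then feed those vectors to a downstream step such as classification. Instead of learning the vectors from scratch, we can initialize the layer with pretrained word vectors, for example word2vec vectors trained with gensim, which often gives better performance. One thing to note: when gensim-trained vectors are used as the initial weights, the freeze parameter of from_pretrained controls whether the vectors keep changing with the downstream task. It defaults to True, which keeps the word vectors frozen.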
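
As a rough sketch of how pretrained vectors keyed by word might be arranged into a weight matrix, consider the following. The `pretrained` dictionary and its values are hypothetical stand-ins for vectors exported from a trained model; no gensim calls are used here.

```python
import torch
import torch.nn as nn

# Hypothetical pretrained vectors keyed by word (stand-in for a trained model)
pretrained = {
    "hello": [0.1, 0.2, 0.3],
    "world": [0.4, 0.5, 0.6],
}

# Fix an order for the words; row i of the matrix belongs to vocab[i]
vocab = list(pretrained)
word_to_idx = {word: i for i, word in enumerate(vocab)}
weight = torch.tensor([pretrained[word] for word in vocab], dtype=torch.float)

# Frozen by default; pass freeze=False to fine-tune the vectors
embedding = nn.Embedding.from_pretrained(weight)

world_vec = embedding(torch.tensor([word_to_idx["world"]], dtype=torch.long))
print(world_vec)
```

The important point is that the row order of the weight matrix must match the word-to-index mapping used to encode the input.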

>>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])
>>> embedding = nn.Embedding.from_pretrained(weight)
>>> # Get embeddings for index 1
>>> input = torch.LongTensor([1])
>>> embedding(input)

 tensor([[ 4.0000,  5.1000,  6.3000]])
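
The effect of the freeze parameter can be checked with a small sketch that reuses the same weight matrix. The variable names `frozen` and `tunable` are illustrative only.

```python
import torch
import torch.nn as nn

weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]])

# Default: freeze=True, so the vectors are not updated during training
frozen = nn.Embedding.from_pretrained(weight)
frozen_trainable = frozen.weight.requires_grad

# freeze=False lets the vectors be fine-tuned with the downstream task
tunable = nn.Embedding.from_pretrained(weight, freeze=False)
tunable_trainable = tunable.weight.requires_grad

# A backward pass only produces non-zero gradients for the rows looked up
loss = tunable(torch.LongTensor([1])).sum()
loss.backward()
grad = tunable.weight.grad

print(frozen_trainable, tunable_trainable)
print(grad)
```

With the default setting the weights have requires_grad set to False, so an optimizer leaves them unchanged. With freeze=False, only the rows that were actually looked up (row 1 here) receive a non-zero gradient.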
