PyTorch Study Notes 06: Understanding torch.nn.Embedding, the Word Embedding Layer

This article is a repost. 2020-07-25 18:12. Tags: PyTorch, Natural Language Processing

1. The concept of word embedding

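First, what is an embedding? A word embedding, put simply, turns a piece of text into a sequence of numbers, because numbers are a form of representation that a computer can process far more easily than raw text. Building word embeddings is therefore like compiling a dictionary for the computer, which it can then use to recognize words indirectly. A word embedding vector can also be understood as the vector representation of a word inside a neural network.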
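The "dictionary" idea can be sketched in a few lines of plain Python. This is only an illustration and not part of the original notes: the sample sentence and the helper names `build_vocab` and `encode` are hypothetical. It covers just the first step, turning words into integer ids; mapping each id to a vector is the job of the embedding layer.

```python
# Hypothetical helpers illustrating the "dictionary" idea: words -> integer ids.
def build_vocab(tokens):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Replace every token by its id from the vocabulary."""
    return [vocab[tok] for tok in tokens]


tokens = "how are you ? how are they ?".split()  # made-up sample text
vocab = build_vocab(tokens)
ids = encode(tokens, vocab)

print(vocab)  # {'how': 0, 'are': 1, 'you': 2, '?': 3, 'they': 4}
print(ids)    # [0, 1, 2, 3, 0, 1, 4, 3]
```

The ids produced this way are what an embedding layer later takes as input.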

2. Embedding in PyTorch

The definition from the official documentation:

A simple lookup table that stores embeddings of a fixed dictionary and size.
This module is often used to store word embeddings and retrieve them using indices.
The input to the module is a list of indices, and the output is the corresponding word embeddings.

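In other words, it is a simple lookup table that stores the embedding vectors of a fixed-size dictionary: given an index, the embedding layer returns the embedding vector stored for that index. After training, these vectors can reflect the semantic relationships between the symbols that the indices stand for. The module is typically used to store word embeddings and retrieve them by index.

The input to the module is a list of indices, and the output is the corresponding word embeddings.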
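To make the "lookup table" description concrete, the following minimal sketch compares the layer's output with plain row indexing of its `weight` matrix. The sizes (10 entries, 3 dimensions) and the chosen indices are arbitrary values picked for illustration, and the comparison assumes `max_norm` is left unset.

```python
import torch
from torch import nn

torch.manual_seed(0)  # only makes the random initial weights repeatable

embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)
indices = torch.tensor([1, 4, 4, 9])  # integer indices into the table

looked_up = embedding(indices)        # forward pass of the layer
indexed = embedding.weight[indices]   # plain row indexing of the weight matrix

print(embedding.weight.shape)            # torch.Size([10, 3])
print(torch.equal(looked_up, indexed))   # True
```

The same index (here 4) always returns the same row, which is why the layer behaves like a dictionary keyed by integers.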

Parameter description from the official documentation:

def __init__(self, num_embeddings, embedding_dim, padding_idx=None,
                 max_norm=None, norm_type=2., scale_grad_by_freq=False,
                 sparse=False, _weight=None)

Args:
        num_embeddings (int): size of the dictionary of embeddings
        embedding_dim (int): the size of each embedding vector
        padding_idx (int, optional): If given, pads the output with the embedding vector at :attr:`padding_idx`
                                         (initialized to zeros) whenever it encounters the index.
        max_norm (float, optional): If given, each embedding vector with norm larger than :attr:`max_norm`
                                    is renormalized to have norm :attr:`max_norm`.
        norm_type (float, optional): The p of the p-norm to compute for the :attr:`max_norm` option. Default ``2``.
        scale_grad_by_freq (boolean, optional): If given, this will scale gradients by the inverse of frequency of
                                                the words in the mini-batch. Default ``False``.
        sparse (bool, optional): If ``True``, gradient w.r.t. :attr:`weight` matrix will be a sparse tensor.
                                 See Notes for more details regarding sparse gradients.

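Notes on the parameters:

num_embeddings (python:int) – The size of the dictionary, i.e. how many entries it holds. If 5000 distinct words occur in total, pass 5000; the valid indices are then 0 to 4999.
embedding_dim (python:int) – The dimension of each embedding vector, i.e. how many numbers are used to represent one symbol.
padding_idx (python:int, optional) – The index used for padding. For example, if the input length is fixed at 100 but sentences vary in length, shorter sentences are filled up with one agreed index, and this parameter tells the layer which index that is. The embedding vector at padding_idx is initialized to zeros and receives no gradient updates, so padding positions do not contribute to training. (It is the vector that starts at zero; the parameter itself defaults to None.)
max_norm (python:float, optional) – The maximum norm. An embedding vector whose norm exceeds this bound is renormalized so that its norm equals max_norm.
norm_type (python:float, optional) – Which p-norm to compute when comparing against max_norm. Defaults to the 2-norm.
scale_grad_by_freq (boolean, optional) – If True, gradients are scaled by the inverse of each word's frequency in the mini-batch. Defaults to False.
sparse (bool, optional) – If True, the gradient with respect to the weight matrix is a sparse tensor.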
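The behaviour of `padding_idx` and `max_norm` described in the list above can be checked with a small experiment. This is a sketch under assumed values: the table size, the choice of index 0 for padding, the toy batch, and the limit of 1.0 are all made up for illustration.

```python
import torch
from torch import nn

# padding_idx: in this sketch, row 0 is reserved for padding.
emb = nn.Embedding(num_embeddings=5, embedding_dim=4, padding_idx=0)
print(emb.weight[0])  # the padding row starts as all zeros

batch = torch.tensor([[1, 2, 0, 0],    # a short sentence padded with index 0
                      [3, 4, 2, 1]])
emb(batch).sum().backward()
print(emb.weight.grad[0])  # the padding row receives a zero gradient
print(emb.weight.grad[1])  # index 1 occurs twice, so each entry is 2

# max_norm: rows that are looked up are rescaled if their norm exceeds the limit.
clipped = nn.Embedding(num_embeddings=5, embedding_dim=4, max_norm=1.0)
out = clipped(torch.tensor([0, 1, 2]))
print(out.norm(dim=1))  # every norm is at most (about) 1.0
```

Because the padding row never receives a gradient, an optimizer step leaves it at zero, so padded positions keep contributing a zero vector.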

Input: LongTensor of shape (N, W), where N is the mini-batch size and W is the number of indices looked up per sample
Output: tensor of shape (N, W, embedding_dim)
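The shape rule can be verified directly. The sketch below uses arbitrary sizes chosen for illustration; it also feeds in a 1-D index tensor to show that the output shape is simply the input shape with `embedding_dim` appended, which is how the layer behaves for inputs that are not two-dimensional as well.

```python
import torch
from torch import nn

embedding_dim = 4
embedding = nn.Embedding(num_embeddings=5, embedding_dim=embedding_dim)

# 2-D input: N = 2 samples, W = 3 indices per sample.
batch = torch.randint(low=0, high=5, size=(2, 3))
print(batch.dtype)                      # torch.int64, i.e. a LongTensor
print(tuple(embedding(batch).shape))    # (2, 3, 4)

# 1-D input: a single sequence of 3 indices.
single = torch.tensor([0, 2, 4])
print(tuple(embedding(single).shape))   # (3, 4)
```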

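A call such as nn.Embedding(num_embeddings, embedding_dim) creates a word embedding layer: num_embeddings is the total number of words, and embedding_dim is the dimension of the vector you want to represent each word with.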

Example:

import torch
from torch import nn

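embedding = nn.Embedding(5, 4)  # assume the dictionary holds only 5 words, each with a 4-dimensional vector
word = [[1, 2, 3],
        [2, 3, 4]]  # each number stands for one word, e.g. {'!': 0, 'how': 1, 'are': 2, 'you': 3, 'ok': 4}
                    # the numbers must lie in the range 0 to 4, because only 5 words were defined above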
embed = embedding(torch.LongTensor(word))
print(embed) 
print(embed.size())

Output:

tensor([[[-0.4093, -1.0110,  0.6731,  0.0790],
         [-0.6557, -0.9846, -0.1647,  2.2633],
         [-0.5706, -1.1936, -0.2704,  0.0708]],

        [[-0.6557, -0.9846, -0.1647,  2.2633],
         [-0.5706, -1.1936, -0.2704,  0.0708],
         [ 0.2242, -0.5989,  0.4237,  2.2405]]], grad_fn=<EmbeddingBackward>)
torch.Size([2, 3, 4])

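The output embed has shape [2, 3, 4]: for an input of shape [2, 3], every word index has been mapped to a 4-dimensional vector. Rows with the same index (2 and 3 each occur twice) map to identical vectors. The exact values vary between runs because the embedding weights are randomly initialized.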
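As a follow-up sketch, the word-to-index mapping mentioned in the example's comment can be used to encode an actual sentence, and an index outside 0 to 4 can be shown to fail. The variable names `word_to_idx` and `out_of_range_failed` are introduced here for illustration only. The exception type is assumed to be IndexError or RuntimeError, since it may depend on the PyTorch version; the check is meant for CPU tensors.

```python
import torch
from torch import nn

# Mapping taken from the comment in the example; the variable name is hypothetical.
word_to_idx = {'!': 0, 'how': 1, 'are': 2, 'you': 3, 'ok': 4}
embedding = nn.Embedding(len(word_to_idx), 4)

sentence = ['how', 'are', 'you']
indices = torch.tensor([word_to_idx[w] for w in sentence])
vectors = embedding(indices)
print(indices.tolist())  # [1, 2, 3]
print(vectors.shape)     # torch.Size([3, 4])

# Index 5 is outside the table (valid indices are 0 to 4), so the lookup fails.
try:
    embedding(torch.tensor([5]))
    out_of_range_failed = False
except (IndexError, RuntimeError) as err:
    out_of_range_failed = True
    print("lookup failed:", type(err).__name__)
```

This is why num_embeddings has to be at least one larger than the biggest index that will ever be fed to the layer.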
