This article uses an LSTM to determine the part of speech (POS) of every word in a sentence. If we look at a word in isolation, say the word "book", without looking at the words that come before it, we cannot reliably tell whether "book" is a verb or a noun in that sentence; but if we can remember the words that appeared before "book", we can judge its part of speech with much more confidence. An LSTM network is able to remember those preceding words. For a detailed introduction to LSTMs, see references [1] and [2] at the end of this article.
The code below mainly comes from reference [3] at the end of this article; I have modified the original code and added comments to make it simpler and easier to follow. One of the keys to understanding the program is understanding torch.nn.Embedding, so a short example of it is given first.
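Roughly speaking, nn.Embedding(num_embeddings, embedding_dim) is a learnable lookup table: it maps every integer word index in [0, num_embeddings) to a dense vector of length embedding_dim, and those vectors are trained together with the rest of the network. A minimal sketch (the vocabulary size 5 and dimension 3 below are only illustrative, not taken from the tagger):

import torch
from torch import nn

# a lookup table with 5 rows (vocabulary size) and 3 columns (embedding dimension);
# the weights are randomly initialized and learned during training
embedding = nn.Embedding(5, 3)

# a LongTensor of word indices, e.g. the indices of a 4-word sentence
word_indices = torch.LongTensor([0, 2, 4, 1])

vectors = embedding(word_indices)
print(vectors.shape)   # torch.Size([4, 3]): one 3-dimensional vector per word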
'''
This program tags words with their part of speech: given a sentence as input,
it outputs the POS tag of every word in that sentence.
'''
import torch
import torch.nn.functional as F
from torch import nn, optim

training_data = [("The dog ate the apple".split(),
                  ["DET", "NN", "V", "DET", "NN"]),
                 ("Everybody read that book".split(),
                  ["NN", "V", "DET", "NN"])]

word_to_idx = {}
tag_to_idx = {}
for context, tags in training_data:
    for word in context:
        if word not in word_to_idx:
            word_to_idx[word] = len(word_to_idx)
    for label in tags:
        if label not in tag_to_idx:
            tag_to_idx[label] = len(tag_to_idx)
idx_to_tag = {tag_to_idx[tag]: tag for tag in tag_to_idx}


class LSTMTagger(nn.Module):
    def __init__(self, n_word, n_dim, n_hidden, n_tag):
        super(LSTMTagger, self).__init__()
        self.word_embedding = nn.Embedding(n_word, n_dim)
        self.lstm = nn.LSTM(n_dim, n_hidden, batch_first=True)
        # By default nn.LSTM expects input of shape (seq_len, batch, input_size),
        # which differs from the batch-first convention we use with CNNs, so we pass
        # batch_first=True to make the input shape (batch, seq_len, input_size).
        # In this program the sequence length is the number of words in a sentence.
        # batch_first=True changes the layout of the output in the same way.
        self.linear1 = nn.Linear(n_hidden, n_tag)

    def forward(self, x):
        # x is a word_list: a 1-D tensor of word indices with size (len(x),)
        x = self.word_embedding(x)
        # after the embedding, x has size (len(x), n_dim)
        x = x.unsqueeze(0)
        # after unsqueeze, x has size (1, len(x), n_dim); in the LSTM call below,
        # the 1 is treated as the batch size and len(x) as the sequence length
        x, _ = self.lstm(x)
        # LSTM hidden-state output: x has size (1, len(x), n_hidden).
        # Because the LSTM was built with batch_first=True, the 1 is in the first
        # dimension; with batch_first=False, len(x) would come first instead.
        x = x.squeeze(0)
        # after squeeze, x has size (len(x), n_hidden); the linear layer below
        # treats len(x) as the batch size
        x = self.linear1(x)
        # after the linear layer, x has size (len(x), n_tag)
        y = F.log_softmax(x, dim=1)
        # softmax over dimension 1 followed by log; y has size (len(x), n_tag)
        return y


model = LSTMTagger(len(word_to_idx), 100, 128, len(tag_to_idx))
if torch.cuda.is_available():
    model = model.cuda()

criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2)

for epoch in range(200):
    running_loss = 0
    for data in training_data:
        sentence, tags = data
        word_list = [word_to_idx[word] for word in sentence]   # list of word indices
        word_list = torch.LongTensor(word_list)
        tag_list = [tag_to_idx[tag] for tag in tags]            # list of tag indices
        tag_list = torch.LongTensor(tag_list)
        if torch.cuda.is_available():
            word_list = word_list.cuda()
            tag_list = tag_list.cuda()
        # forward
        out = model(word_list)
        loss = criterion(out, tag_list)
        running_loss += loss.item()   # .item() works on both CPU and GPU tensors
        # backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('Epoch: {:<3d} | Loss: {:6.4f}'.format(epoch, running_loss / len(training_data)))

# test the model
test_sentence = "Everybody ate the apple"
print('\nThe test sentence is:\n', test_sentence)
test_sentence = test_sentence.split()
test_list = [word_to_idx[word] for word in test_sentence]
test_list = torch.LongTensor(test_list)
if torch.cuda.is_available():
    test_list = test_list.cuda()
out = model(test_list)
_, predict_idx = torch.max(out, 1)
# dim=1 takes the maximum over each row; predict_idx holds the predicted tag
# indices and is a tensor of size (len(test_sentence),)
predict_tag = [idx_to_tag[idx] for idx in predict_idx.tolist()]
print('The predict tags are:', predict_tag)
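If the shape bookkeeping in forward() is not obvious, the following standalone sketch, with made-up layer sizes that are unrelated to the tagger above, simply prints the shapes produced by an embedding, a batch_first LSTM, and a linear layer in sequence:

import torch
from torch import nn
import torch.nn.functional as F

# illustrative sizes only: 10-word vocabulary, 6-dim embeddings, 8 hidden units, 3 tags
n_word, n_dim, n_hidden, n_tag = 10, 6, 8, 3
embedding = nn.Embedding(n_word, n_dim)
lstm = nn.LSTM(n_dim, n_hidden, batch_first=True)
linear = nn.Linear(n_hidden, n_tag)

x = torch.LongTensor([1, 5, 3, 7, 2])        # a 5-word "sentence" of word indices
e = embedding(x)                              # (5, n_dim)
h, _ = lstm(e.unsqueeze(0))                   # (1, 5, n_hidden) because batch_first=True
scores = linear(h.squeeze(0))                 # (5, n_tag)
log_probs = F.log_softmax(scores, dim=1)      # (5, n_tag), each row is a log-probability vector

print(e.shape, h.shape, scores.shape, log_probs.shape)

A side note on the loss: feeding the log_softmax output to nn.NLLLoss, as the tagger does, is equivalent to feeding the raw linear scores directly to nn.CrossEntropyLoss.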
References: