Implementing BiLSTM+CRF in PyTorch for NER (Named Entity Recognition)

This article is reposted from the original (see the link at the end). 2019-07-14 17:22

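Before writing this post, I looked at the PyTorch BiLSTM+CRF implementations available online, and they were all the same version (a translation of the PyTorch tutorial). The translations were of very poor quality, and some even claimed the task was part-of-speech tagging. Are B, I, O part-of-speech tags? That is simply misleading readers. So I decided to write my own translation of the PyTorch named entity recognition implementation, adding my own understanding. That is enough grumbling.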

BiLSTM
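My previous post covered implementing an LSTM in PyTorch (link). This time it is a BiLSTM; the network structure is shown below.

[Figure: BiLSTM network structure]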

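In a unidirectional LSTM, the current element of the sequence can only see the elements before it, not the ones after it. A bidirectional LSTM overcomes this limitation: each element can see both the preceding and the following elements. Put more formally, a unidirectional LSTM cannot encode information flowing from back to front.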

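Note that the BiLSTM output $O$ is $[O_{left}, O_{right}]$: the outputs of the forward and backward directions, each with as many entries as there are hidden units, are concatenated, so each time step yields 2 × hidden_size features. (In PyTorch, `nn.LSTM(..., bidirectional=True)` returns an output of shape `(seq_len, batch, 2 * hidden_size)` by default.)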
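To make the "sees both directions" point concrete, here is a framework-free toy sketch. A hypothetical scalar tanh cell stands in for the LSTM cell (the real LSTM gates are omitted), and the weights `w` and `u` are made up; the only point is the layout: one left-to-right pass, one right-to-left pass, and their states concatenated per position as $[O_{left}, O_{right}]$.

```python
import math
from typing import List, Sequence

def rnn_pass(xs: Sequence[float], w: float = 0.5, u: float = 0.8) -> List[float]:
    """Toy scalar recurrent cell h_t = tanh(w * x_t + u * h_{t-1}), run left to right."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w * x + u * h)
        states.append(h)
    return states

def bidirectional(xs: Sequence[float]) -> List[List[float]]:
    """Per position, concatenate [h_left, h_right] like a BiLSTM's output O."""
    left = rnn_pass(xs)                          # state t has seen x_0 .. x_t
    right = rnn_pass(list(reversed(xs)))[::-1]   # state t has seen x_t .. x_{n-1}
    return [[lft, rgt] for lft, rgt in zip(left, right)]

out = bidirectional([1.0, -2.0, 0.5])
changed = bidirectional([1.0, -2.0, 3.0])        # only the last word differs
print(out[0][0] == changed[0][0], out[0][1] == changed[0][1])  # True False
```

Changing the last input leaves the left-to-right state at the first word untouched but changes its right-to-left state, which is exactly the extra context a unidirectional LSTM lacks.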

CRF
A CRF is a discriminative model. Its formula is given below, where $y$ is the tag sequence and $x$ is the word sequence; that is, given the word sequence, we look for the most likely tag sequence:

$$P(y|x) = \frac{\exp(\text{Score}(x, y))}{\sum_{y'} \exp(\text{Score}(x, y'))}$$

$\text{Score}(x,y)$ is the score of the word sequence $x$ producing the tag sequence $y$; the higher the score, the higher the probability of that tag sequence.
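As a quick numeric illustration of this formula, the framework-free sketch below enumerates every candidate tag sequence $y'$ for a two-word sentence and normalizes $\exp(\text{Score})$ by brute force. The tag set and the `toy_score` function are made-up assumptions, chosen only to show that the probabilities sum to 1 and that the highest-scoring sequence is the most probable one.

```python
import itertools
import math

TAGS = ["B", "I", "O"]  # hypothetical tag set

def toy_score(words, tags):
    """Hypothetical Score(x, y): +2 for tagging a capitalized word B, +1 for O otherwise."""
    total = 0.0
    for word, tag in zip(words, tags):
        if word[0].isupper() and tag == "B":
            total += 2.0
        elif not word[0].isupper() and tag == "O":
            total += 1.0
    return total

def crf_probabilities(words):
    """P(y|x) = exp(Score(x, y)) / sum over y' of exp(Score(x, y')), by enumeration."""
    candidates = list(itertools.product(TAGS, repeat=len(words)))
    scores = [toy_score(words, y) for y in candidates]
    z = sum(math.exp(s) for s in scores)  # the partition function Z(x)
    return {y: math.exp(s) / z for y, s in zip(candidates, scores)}

probs = crf_probabilities(["Paris", "is"])
best = max(probs, key=probs.get)
print(best)  # ('B', 'O')
```

Enumeration needs $|\text{tags}|^n$ score evaluations for an $n$-word sentence, which is why the forward algorithm is used instead.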

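In the PyTorch tutorial (link), the $\text{Score}(x,y)$ defined for entity recognition consists of two feature functions: a transition feature function and a state (emission) feature function: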
$$\text{Score}(x,y) = \sum_i \log \psi_\text{EMIT}(y_i \rightarrow x_i) + \log \psi_\text{TRANS}(y_{i-1} \rightarrow y_i)$$
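The formula translates almost term by term into code. The sketch below is framework-free and assumes the same conventions as the tutorial code in this post: `emissions[i][tag]` plays the role of `feats` (the BiLSTM's emission score for word `i`), `transitions[next][prev]` is the score of moving from `prev` to `next`, and the sequence is padded with hypothetical START and STOP tags, which the formula leaves implicit. It plays the role of the `_score_sentence` method that `neg_log_likelihood` calls later (its body is not shown in this post); `sentence_score` and all numbers are made up for illustration.

```python
from typing import Sequence

START, STOP = 3, 4  # hypothetical indices: 0=B, 1=I, 2=O, plus START and STOP

def sentence_score(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[int]) -> float:
    """Score(x, y) = sum_i emit(y_i at word i) + trans(y_{i-1} -> y_i), with START/STOP."""
    score = 0.0
    prev = START
    for i, tag in enumerate(tags):
        score += transitions[tag][prev] + emissions[i][tag]
        prev = tag
    return score + transitions[STOP][prev]  # final transition into STOP

# 5x5 transition matrix: rows = next tag, columns = previous tag
transitions = [[0.0] * 5 for _ in range(5)]
transitions[1][2] = -5.0  # discourage O -> I, an invalid BIO transition
transitions[1][0] = 1.0   # encourage B -> I
emissions = [[2.0, 0.0, 0.5, 0.0, 0.0],   # word 0 looks like B
             [0.0, 1.5, 1.0, 0.0, 0.0]]   # word 1 looks like I
print(sentence_score(emissions, transitions, [0, 1]))  # 4.5
```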

The code uses the forward algorithm and the Viterbi algorithm; I explain each in detail alongside the code.

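The log_sum_exp function computes $\log\sum_{i=1}^{n} e^{x_i}$, which the forward algorithm needs. To avoid overflow it first subtracts the maximum $m = \max_i x_i$, using the identity $\log\sum_{i} e^{x_i} = m + \log\sum_{i} e^{x_i - m}$.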

```python
import torch

def log_sum_exp(vec):
    # Stable log(sum(exp(vec))) for a (1, n) tensor; argmax is the tutorial's helper
    max_score = vec[0, argmax(vec)]
    max_score_broadcast = max_score.view(1, -1).expand(1, vec.size()[1])
    return max_score + \
        torch.log(torch.sum(torch.exp(vec - max_score_broadcast)))
```
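To see why log_sum_exp shifts by the maximum, here is a plain-Python sketch (no PyTorch, for illustration only) comparing it with the naive formula on large scores. In PyTorch itself, `torch.logsumexp(input, dim)` performs the same stable computation.

```python
import math
from typing import Sequence

def log_sum_exp(values: Sequence[float]) -> float:
    """Stable log(sum(exp(v))): after shifting by the max m, the largest term is exp(0) = 1."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def naive_log_sum_exp(values: Sequence[float]) -> float:
    """The formula as written; math.exp raises OverflowError for large inputs."""
    return math.log(sum(math.exp(v) for v in values))

print(log_sum_exp([1000.0, 1000.0]))  # 1000 + log(2), about 1000.693
try:
    naive_log_sum_exp([1000.0, 1000.0])
except OverflowError:
    print("naive version overflowed")
```

Underflow is the mirror problem: if every entry is -10000 (the placeholder used in the code for impossible states), the naive sum is 0.0 and `math.log` raises `ValueError`, while the shifted version returns -10000 + log(n).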
The forward algorithm computes $\alpha$, i.e. $Z(x)$, which is $\sum_{y'} \exp(\text{Score}(x, y'))$. If this is unfamiliar, see the forward algorithm for CRFs in Li Hang's book.

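Unlike Li Hang's book, however, the code keeps $\alpha$ in log space throughout: first for computational convenience (products of potentials become sums, and large values do not overflow), and second for the maximum likelihood estimation later on.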
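For reference, here is the log-space recursion that `_forward_alg` implements, written with the tutorial's conventions: $T_{j,i}$ stands for `self.transitions[j][i]`, the score of moving from tag $i$ to tag $j$; $E_t(j)$ stands for `feat[j]` at word $t$; and $-10000$ plays the role of $\log 0$.

$$\alpha_0(j) = \begin{cases} 0, & j = \text{START} \\ -10000, & \text{otherwise} \end{cases}$$

$$\alpha_t(j) = \log \sum_i \exp\big(\alpha_{t-1}(i) + T_{j,i} + E_t(j)\big), \qquad t = 1, \dots, n$$

$$\log Z(x) = \log \sum_j \exp\big(\alpha_n(j) + T_{\text{STOP},j}\big)$$

Replacing $\log\sum_i \exp$ with $\max_i$ (and remembering the maximizing $i$) turns the same recursion into the Viterbi algorithm.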

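This code is not optimized, as the tutorial's author also points out: iterating over feats does not actually require two nested loops, since matrix (broadcast) operations are enough. The author spelled out the steps to make them easier to follow.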

```python
def _forward_alg(self, feats):
    # Method of the tutorial's BiLSTM_CRF class; START_TAG, STOP_TAG and
    # log_sum_exp are defined elsewhere in the tutorial
    init_alphas = torch.full((1, self.tagset_size), -10000.)

    # Initially the START position is 0 and every other position is -10000
    init_alphas[0][self.tag_to_ix[START_TAG]] = 0.

    # Assign to a variable so that backpropagation can flow through it later
    forward_var = init_alphas

    for feat in feats:
        alphas_t = []
        for next_tag in range(self.tagset_size):
            # Score from the state (emission) feature function, the same for every previous tag
            emit_score = feat[next_tag].view(1, -1).expand(1, self.tagset_size)

            # Score from the transition feature function: entry i is i -> next_tag
            trans_score = self.transitions[next_tag].view(1, -1)

            # Score of moving from each tag of the previous word to next_tag,
            # so next_tag_var has tagset_size entries
            next_tag_var = forward_var + trans_score + emit_score

            # log_sum_exp over next_tag_var gives the log-space alpha for next_tag
            alphas_t.append(log_sum_exp(next_tag_var).view(1))

        forward_var = torch.cat(alphas_t).view(1, -1)
    # Finally, transition into STOP_TAG
    terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
    alpha = log_sum_exp(terminal_var)
    return alpha
```
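The double loop can indeed be collapsed: for each step, every (next, previous) pair is independent. Assuming the tutorial's shapes (`forward_var` is `(1, tagset_size)` and `self.transitions` is `(tagset_size, tagset_size)` indexed `[next][prev]`), one possible replacement for the inner loop, a suggestion rather than the tutorial's code, is `forward_var = torch.logsumexp(forward_var + self.transitions + feat.view(-1, 1), dim=1).view(1, -1)`. Below is a framework-free sketch of the same log-space forward algorithm, cross-checked against brute-force enumeration of all tag paths; the tag layout and the scores are made-up assumptions.

```python
import itertools
import math
from typing import List, Sequence

START, STOP, NUM_TAGS = 3, 4, 5  # hypothetical layout: tags 0-2 plus START/STOP
NEG_INF = -10000.0               # the tutorial's stand-in for log(0)

def log_sum_exp(values: Sequence[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_alg(feats: Sequence[Sequence[float]],
                transitions: List[List[float]]) -> float:
    """log Z(x) by the log-space forward algorithm (transitions[next][prev])."""
    alphas = [NEG_INF] * NUM_TAGS
    alphas[START] = 0.0
    for feat in feats:
        # alpha_t(next) = log sum_prev exp(alpha_{t-1}(prev) + trans[next][prev] + feat[next])
        alphas = [log_sum_exp([alphas[prev] + transitions[nxt][prev] + feat[nxt]
                               for prev in range(NUM_TAGS)])
                  for nxt in range(NUM_TAGS)]
    return log_sum_exp([alphas[t] + transitions[STOP][t] for t in range(NUM_TAGS)])

def brute_force_log_z(feats: Sequence[Sequence[float]],
                      transitions: List[List[float]]) -> float:
    """log of the sum over every tag path of exp(Score(x, path)), for cross-checking."""
    scores = []
    for path in itertools.product(range(NUM_TAGS), repeat=len(feats)):
        prev, s = START, 0.0
        for feat, tag in zip(feats, path):
            s += transitions[tag][prev] + feat[tag]
            prev = tag
        scores.append(s + transitions[STOP][prev])
    return log_sum_exp(scores)

# Made-up scores; the START row and STOP column are blocked as in the tutorial
transitions = [[0.1 * ((3 * r + c) % 5) for c in range(NUM_TAGS)] for r in range(NUM_TAGS)]
for t in range(NUM_TAGS):
    transitions[START][t] = NEG_INF
    transitions[t][STOP] = NEG_INF
feats = [[0.5, -0.2, 1.0, 0.0, 0.0],
         [1.2, 0.3, -0.4, 0.0, 0.0],
         [0.0, 0.9, 0.2, 0.0, 0.0]]
print(abs(forward_alg(feats, transitions) - brute_force_log_z(feats, transitions)) < 1e-9)  # True
```

The forward algorithm touches each (step, next, previous) triple once, so its cost grows linearly with sentence length, while the brute-force check grows as `NUM_TAGS ** len(feats)`.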
The Viterbi algorithm is quite standard; you can refer to the CRF prediction algorithm in Li Hang's book.

```python
def _viterbi_decode(self, feats):
    backpointers = []
    # Initialization: START is 0, every other tag is -10000
    init_vvars = torch.full((1, self.tagset_size), -10000.)
    init_vvars[0][self.tag_to_ix[START_TAG]] = 0

    forward_var = init_vvars
    for feat in feats:
        # Backpointers for this step, used to reconstruct the best path
        bptrs_t = []
        # Viterbi variables (best path scores) for this step
        viterbivars_t = []

        for next_tag in range(self.tagset_size):
            # The emission score is added after the max, since it does not
            # depend on the previous tag
            next_tag_var = forward_var + self.transitions[next_tag]
            best_tag_id = argmax(next_tag_var)
            bptrs_t.append(best_tag_id)
            viterbivars_t.append(next_tag_var[0][best_tag_id].view(1))

        forward_var = (torch.cat(viterbivars_t) + feat).view(1, -1)
        backpointers.append(bptrs_t)

    # Transition to STOP_TAG
    terminal_var = forward_var + self.transitions[self.tag_to_ix[STOP_TAG]]
    best_tag_id = argmax(terminal_var)
    path_score = terminal_var[0][best_tag_id]

    # Follow the backpointers backwards to recover the best path
    best_path = [best_tag_id]
    for bptrs_t in reversed(backpointers):
        best_tag_id = bptrs_t[best_tag_id]
        best_path.append(best_tag_id)

    # Pop off START_TAG; it is not part of the final result
    start = best_path.pop()
    assert start == self.tag_to_ix[START_TAG]
    best_path.reverse()
    return path_score, best_path
```
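A framework-free sketch of the same Viterbi decoding, cross-checked against brute force, may help. The BIO tag indices, the transition penalties and the emission scores are made-up assumptions; they are chosen so that word 1 on its own prefers O, yet the B -> I transition bonus makes I win in the best overall path.

```python
import itertools
from typing import List, Sequence, Tuple

START, STOP, NUM_TAGS = 3, 4, 5  # hypothetical layout: B, I, O plus START/STOP
TAG_B, TAG_I, TAG_O = 0, 1, 2
NEG_INF = -10000.0

def path_score(feats, transitions, path) -> float:
    """Score(x, y) including the START and STOP transitions."""
    prev, s = START, 0.0
    for feat, tag in zip(feats, path):
        s += transitions[tag][prev] + feat[tag]
        prev = tag
    return s + transitions[STOP][prev]

def viterbi_decode(feats: Sequence[Sequence[float]],
                   transitions: List[List[float]]) -> Tuple[float, List[int]]:
    """Best tag path: the forward algorithm with log-sum-exp replaced by max."""
    vvars = [NEG_INF] * NUM_TAGS
    vvars[START] = 0.0
    backpointers = []
    for feat in feats:
        bptrs, new_vvars = [], []
        for nxt in range(NUM_TAGS):
            # Best previous tag for nxt; feat[nxt] is the same for every candidate
            best_prev = max(range(NUM_TAGS), key=lambda p: vvars[p] + transitions[nxt][p])
            bptrs.append(best_prev)
            new_vvars.append(vvars[best_prev] + transitions[nxt][best_prev] + feat[nxt])
        vvars = new_vvars
        backpointers.append(bptrs)
    terminal = [vvars[t] + transitions[STOP][t] for t in range(NUM_TAGS)]
    best = max(range(NUM_TAGS), key=lambda t: terminal[t])
    score, path = terminal[best], [best]
    for bptrs in reversed(backpointers):  # walk the backpointers from the last word
        path.append(bptrs[path[-1]])
    assert path.pop() == START            # the first backpointer leads back to START
    path.reverse()
    return score, path

transitions = [[0.0] * NUM_TAGS for _ in range(NUM_TAGS)]
transitions[TAG_I][TAG_O] = -2.0   # O -> I is invalid in BIO
transitions[TAG_I][START] = -2.0   # a sentence should not start with I
transitions[TAG_I][TAG_B] = 0.5    # B -> I is typical
for t in range(NUM_TAGS):
    transitions[START][t] = NEG_INF  # nothing transitions into START
    transitions[t][STOP] = NEG_INF   # nothing transitions out of STOP
feats = [[1.0, 0.0, 0.8, 0.0, 0.0],
         [0.2, 1.0, 1.1, 0.0, 0.0],  # on its own, word 1 prefers O over I
         [0.0, 0.3, 1.0, 0.0, 0.0]]
score, path = viterbi_decode(feats, transitions)
brute = max(itertools.product(range(NUM_TAGS), repeat=len(feats)),
            key=lambda p: path_score(feats, transitions, p))
print(path, list(brute) == path)  # [0, 1, 2] True
```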
The function I most want to discuss is this one:

```python
def neg_log_likelihood(self, sentence, tags):
    feats = self._get_lstm_features(sentence)       # BiLSTM emission scores
    forward_score = self._forward_alg(feats)        # log Z(x)
    gold_score = self._score_sentence(feats, tags)  # Score(x, y) of the gold tags
    return forward_score - gold_score               # -log P(y|x)
```
We know that forward_score is $\log Z(x)$, i.e. $\log\sum_{y'} \exp(\text{Score}(x, y'))$, and gold_score is $\log \exp(\text{Score}(x, y)) = \text{Score}(x, y)$, the score of the gold tag sequence $y$.

Our goal is to maximize

$$P(y|x) = \frac{\exp(\text{Score}(x, y))}{\sum_{y'} \exp(\text{Score}(x, y'))}$$

Taking the logarithm of both sides gives

$$\log P(y|x) = \log \exp(\text{Score}(x, y)) - \log\sum_{y'} \exp(\text{Score}(x, y'))$$

$$\log P(y|x) = \text{gold\_score} - \text{forward\_score}$$

So we need to maximize $\text{gold\_score} - \text{forward\_score}$, which is the same as minimizing $\text{forward\_score} - \text{gold\_score}$.

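This is the fundamental reason $\text{forward\_score} - \text{gold\_score}$ can serve as the loss: it is exactly the negative log-likelihood $-\log P(y|x)$, hence the function name neg_log_likelihood.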
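To check numerically that forward_score − gold_score equals $-\log P(y|x)$, here is a framework-free sketch with made-up scores. It computes $\log Z(x)$ by brute-force enumeration instead of the forward algorithm, an assumption made only to keep the example short; the tag layout and the hypothetical `score` function mirror the conventions of the tutorial code above.

```python
import itertools
import math

START, STOP, NUM_TAGS = 3, 4, 5  # hypothetical layout: tags 0-2 plus START/STOP
NEG_INF = -10000.0

def score(feats, transitions, path) -> float:
    """Score(x, y) with START/STOP transitions (plays the role of _score_sentence)."""
    prev, s = START, 0.0
    for feat, tag in zip(feats, path):
        s += transitions[tag][prev] + feat[tag]
        prev = tag
    return s + transitions[STOP][prev]

def log_z(feats, transitions) -> float:
    """forward_score = log Z(x), here by brute-force enumeration of all tag paths."""
    scores = [score(feats, transitions, p)
              for p in itertools.product(range(NUM_TAGS), repeat=len(feats))]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

def neg_log_likelihood(feats, transitions, gold) -> float:
    """forward_score - gold_score, i.e. -log P(gold | x)."""
    return log_z(feats, transitions) - score(feats, transitions, gold)

transitions = [[0.0] * NUM_TAGS for _ in range(NUM_TAGS)]
for t in range(NUM_TAGS):
    transitions[START][t] = NEG_INF
    transitions[t][STOP] = NEG_INF
feats = [[1.0, 0.0, 0.8, 0.0, 0.0],
         [0.2, 1.0, 1.1, 0.0, 0.0]]
gold = [0, 1]
loss = neg_log_likelihood(feats, transitions, gold)
prob = math.exp(score(feats, transitions, gold) - log_z(feats, transitions))
print(loss > 0, abs(loss + math.log(prob)) < 1e-9)  # True True
```

The loss is always positive because $Z(x)$ contains the gold path's own term plus every other path, and it shrinks as the gold path's share of $Z(x)$ grows, which is what gradient descent on this loss pushes toward.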
---------------------
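Author: zycxnanwang
Source: CSDN
Original: https://blog.csdn.net/zycxnanwang/article/details/90385259
Copyright notice: This is an original article by the blogger; please include a link to the post when reposting.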
