Original article: https://blog.csdn.net/guolindonggld/article/details/56966200
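1. Introduction
BLEU (Bilingual Evaluation Understudy) is an evaluation metric that most readers will already know; introductions are easy to find with a quick Baidu or Google search. The original paper is "BLEU: a Method for Automatic Evaluation of Machine Translation", from IBM.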
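This article uses one example to show in detail how BLEU is computed, and walks through the source code of NLTK's `nltk.align.bleu_score` module (in later NLTK releases the BLEU code lives in `nltk.translate.bleu_score`).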
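First, the formula:

$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left(\sum_{n=1}^{N} w_n \log P_n\right)$$

where

$$\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r/c} & \text{if } c \le r \end{cases}$$

Here $P_n$ is the modified n-gram precision, $w_n$ is the weight of the n-gram order $n$, $c$ is the length of the candidate translation, and $r$ is the length of the reference translation closest to it.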
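Note that the BLEU value here is for a single translation (one sample).
NLTK's `nltk.align.bleu_score` module implements this formula with three functions: two private functions compute $P_n$ and BP, and a third function combines them into the BLEU value.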
    # Compute the BLEU value
    def bleu(candidate, references, weights): ...

    # (1) Private function: modified n-gram precision
    def _modified_precision(candidate, references, n): ...

    # (2) Private function: brevity penalty (BP)
    def _brevity_penalty(candidate, references): ...
Example:
Candidate translation (Predicted):
It is a guide to action which ensures that the military always obeys the commands of the party
Reference translations (Gold Standard):
1: It is a guide to action that ensures that the military will forever heed Party commands
2: It is the guiding principle which guarantees the military forces always being under the command of the Party
3: It is the practical guide for the army always to heed the directions of the party
2. Computing the Modified n-gram Precision ($P_n$)
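    from collections import Counter
    from nltk.util import ngrams

    def _modified_precision(candidate, references, n):
        # Count every n-gram in the candidate translation.
        counts = Counter(ngrams(candidate, n))
        if not counts:
            return 0

        # For each candidate n-gram, keep its highest count in any single reference.
        max_counts = {}
        for reference in references:
            reference_counts = Counter(ngrams(reference, n))
            for ngram in counts:
                max_counts[ngram] = max(max_counts.get(ngram, 0), reference_counts[ngram])

        # Clip each candidate count by that maximum.
        clipped_counts = dict((ngram, min(count, max_counts[ngram])) for ngram, count in counts.items())

        # True division: Python 3, or `from __future__ import division` on Python 2.
        return sum(clipped_counts.values()) / sum(counts.values())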
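Here we take $N = 4$, that is, we compute the precisions from 1-gram up to 4-gram.
Modified 1-gram precision:
First count how many times each word occurs in the candidate translation, then how many times it occurs in each reference translation. Max is the largest of the three reference counts, and Min is the smaller of the candidate count and Max.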
Word | Candidate | Reference 1 | Reference 2 | Reference 3 | Max | Min |
---|---|---|---|---|---|---|
the | 3 | 1 | 4 | 4 | 4 | 3 |
obeys | 1 | 0 | 0 | 0 | 0 | 0 |
a | 1 | 1 | 0 | 0 | 1 | 1 |
which | 1 | 0 | 1 | 0 | 1 | 1 |
ensures | 1 | 1 | 0 | 0 | 1 | 1 |
guide | 1 | 1 | 0 | 1 | 1 | 1 |
always | 1 | 0 | 1 | 1 | 1 | 1 |
is | 1 | 1 | 1 | 1 | 1 | 1 |
of | 1 | 0 | 1 | 1 | 1 | 1 |
to | 1 | 1 | 0 | 1 | 1 | 1 |
commands | 1 | 1 | 0 | 0 | 1 | 1 |
that | 1 | 2 | 0 | 0 | 2 | 1 |
It | 1 | 1 | 1 | 1 | 1 | 1 |
action | 1 | 1 | 0 | 0 | 1 | 1 |
party | 1 | 0 | 0 | 1 | 1 | 1 |
military | 1 | 1 | 1 | 0 | 1 | 1 |
Then add up the Min values of all words, add up the candidate counts of all words, and divide the first sum by the second: $P_1 = \frac{3 + 0 + 14 \times 1}{3 + 15 \times 1} = \frac{17}{18} \approx 0.944$.
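The 1-gram table can be reproduced with `collections.Counter`. The following is a minimal sketch, assuming whitespace tokenization and case-sensitive matching (so `Party` and `party` are different tokens, as in the table); the variable names are illustrative and not part of NLTK.

```python
from collections import Counter
from fractions import Fraction

candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    "It is a guide to action that ensures that the military will forever heed Party commands".split(),
    "It is the guiding principle which guarantees the military forces always being under the command of the Party".split(),
    "It is the practical guide for the army always to heed the directions of the party".split(),
]

cand_counts = Counter(candidate)
ref_counts = [Counter(ref) for ref in references]

# One table row per distinct candidate word: (candidate count, Max, Min).
rows = {}
for word, count in cand_counts.items():
    max_ref = max(rc[word] for rc in ref_counts)        # Max column
    rows[word] = (count, max_ref, min(count, max_ref))  # Min column is the clipped count

# P1 = sum of the Min column / sum of the candidate column, kept as an exact fraction.
p1 = Fraction(sum(row[2] for row in rows.values()), sum(cand_counts.values()))
print(rows["the"], rows["that"], rows["obeys"])
print(p1, float(p1))
```

Keeping the result as a `Fraction` makes it easy to see that the ratio is exactly 17/18 before any rounding.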
Similarly:
Modified 2-gram precision:
2-gram | Candidate | Reference 1 | Reference 2 | Reference 3 | Max | Min |
---|---|---|---|---|---|---|
ensures that | 1 | 1 | 0 | 0 | 1 | 1 |
guide to | 1 | 1 | 0 | 0 | 1 | 1 |
which ensures | 1 | 0 | 0 | 0 | 0 | 0 |
obeys the | 1 | 0 | 0 | 0 | 0 | 0 |
commands of | 1 | 0 | 0 | 0 | 0 | 0 |
that the | 1 | 1 | 0 | 0 | 1 | 1 |
a guide | 1 | 1 | 0 | 0 | 1 | 1 |
of the | 1 | 0 | 1 | 1 | 1 | 1 |
always obeys | 1 | 0 | 0 | 0 | 0 | 0 |
the commands | 1 | 0 | 0 | 0 | 0 | 0 |
to action | 1 | 1 | 0 | 0 | 1 | 1 |
the party | 1 | 0 | 0 | 1 | 1 | 1 |
is a | 1 | 1 | 0 | 0 | 1 | 1 |
action which | 1 | 0 | 0 | 0 | 0 | 0 |
It is | 1 | 1 | 1 | 1 | 1 | 1 |
military always | 1 | 0 | 0 | 0 | 0 | 0 |
the military | 1 | 1 | 1 | 0 | 1 | 1 |
$P_2 = \frac{10}{17} = 0.588235294$
Modified 3-gram precision:
3-gram | Candidate | Reference 1 | Reference 2 | Reference 3 | Max | Min |
---|---|---|---|---|---|---|
ensures that the | 1 | 1 | 0 | 0 | 1 | 1 |
which ensures that | 1 | 0 | 0 | 0 | 0 | 0 |
action which ensures | 1 | 0 | 0 | 0 | 0 | 0 |
a guide to | 1 | 1 | 0 | 0 | 1 | 1 |
military always obeys | 1 | 0 | 0 | 0 | 0 | 0 |
the commands of | 1 | 0 | 0 | 0 | 0 | 0 |
commands of the | 1 | 0 | 0 | 0 | 0 | 0 |
to action which | 1 | 0 | 0 | 0 | 0 | 0 |
the military always | 1 | 0 | 0 | 0 | 0 | 0 |
obeys the commands | 1 | 0 | 0 | 0 | 0 | 0 |
It is a | 1 | 1 | 0 | 0 | 1 | 1 |
of the party | 1 | 0 | 0 | 1 | 1 | 1 |
is a guide | 1 | 1 | 0 | 0 | 1 | 1 |
that the military | 1 | 1 | 0 | 0 | 1 | 1 |
always obeys the | 1 | 0 | 0 | 0 | 0 | 0 |
guide to action | 1 | 1 | 0 | 0 | 1 | 1 |
$P_3 = \frac{7}{16} = 0.4375$
Modified 4-gram precision:
4-gram | Candidate | Reference 1 | Reference 2 | Reference 3 | Max | Min |
---|---|---|---|---|---|---|
to action which ensures | 1 | 0 | 0 | 0 | 0 | 0 |
action which ensures that | 1 | 0 | 0 | 0 | 0 | 0 |
guide to action which | 1 | 0 | 0 | 0 | 0 | 0 |
obeys the commands of | 1 | 0 | 0 | 0 | 0 | 0 |
which ensures that the | 1 | 0 | 0 | 0 | 0 | 0 |
commands of the party | 1 | 0 | 0 | 0 | 0 | 0 |
ensures that the military | 1 | 1 | 0 | 0 | 1 | 1 |
a guide to action | 1 | 1 | 0 | 0 | 1 | 1 |
always obeys the commands | 1 | 0 | 0 | 0 | 0 | 0 |
that the military always | 1 | 0 | 0 | 0 | 0 | 0 |
the commands of the | 1 | 0 | 0 | 0 | 0 | 0 |
the military always obeys | 1 | 0 | 0 | 0 | 0 | 0 |
military always obeys the | 1 | 0 | 0 | 0 | 0 | 0 |
is a guide to | 1 | 1 | 0 | 0 | 1 | 1 |
It is a guide | 1 | 1 | 0 | 0 | 1 | 1 |
$P_4 = \frac{4}{15} = 0.266666667$
We then take $w_1 = w_2 = w_3 = w_4 = 0.25$, i.e. uniform weights.
Therefore (using natural logarithms):
$$\sum_{n=1}^{N} w_n \log P_n = 0.25 \log P_1 + 0.25 \log P_2 + 0.25 \log P_3 + 0.25 \log P_4 = -0.684055269517$$
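The four precisions and the weighted log sum can be checked without NLTK. This is a sketch under assumptions: `ngram_counts` and `modified_precision` are hypothetical helper names written for this illustration, following the same clipping logic as `_modified_precision`, with whitespace tokenization and natural logarithms.

```python
import math
from collections import Counter
from fractions import Fraction

def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, references, n):
    """Clipped n-gram precision, returned as an exact fraction."""
    counts = ngram_counts(candidate, n)
    if not counts:
        return Fraction(0)
    ref_counts = [ngram_counts(ref, n) for ref in references]
    # Clip each candidate count by its highest count in any single reference.
    clipped = sum(min(count, max(rc[gram] for rc in ref_counts))
                  for gram, count in counts.items())
    return Fraction(clipped, sum(counts.values()))

candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    "It is a guide to action that ensures that the military will forever heed Party commands".split(),
    "It is the guiding principle which guarantees the military forces always being under the command of the Party".split(),
    "It is the practical guide for the army always to heed the directions of the party".split(),
]

weights = [0.25, 0.25, 0.25, 0.25]
precisions = [modified_precision(candidate, references, n) for n in range(1, 5)]
log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))

print(precisions)  # exact fractions for P1..P4
print(log_sum)
```

On the example sentences this gives the fractions 17/18, 10/17, 7/16 and 4/15, and a weighted log sum that agrees with the value above.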
3. Computing the Brevity Penalty
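    import math

    def _brevity_penalty(candidate, references):
        c = len(candidate)
        ref_lens = (len(reference) for reference in references)
        # Python compares tuples element by element, e.g. (0, 1) > (1, 0) is False.
        # The key below uses this to pick the reference length closest to the
        # candidate length and, when several are equally close, the shortest one.
        # For example, with a candidate of length 10 and references of lengths
        # 9 and 11, r = 9.
        r = min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))
        print('r:', r)
        if c > r:
            return 1
        else:
            return math.exp(1 - r / c)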
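Next we compute BP (Brevity Penalty), the penalty for translations that are too short. From the BP formula, its value lies in (0, 1], and the shorter the candidate sentence, the closer BP gets to 0.
The candidate translation has length 18; the reference translations have lengths 16, 18 and 16.
So $c = 18$ and $r = 18$ (the reference length closest to the candidate length is chosen as $r$).
So $\mathrm{BP} = e^0 = 1$.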
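As a quick check of the length selection and the penalty, here is a minimal sketch that works on lengths only. The helper names `closest_ref_length` and `brevity_penalty` are illustrative (not NLTK functions); the logic follows `_brevity_penalty`, and the sketch assumes a non-empty candidate ($c > 0$).

```python
import math

def closest_ref_length(c, ref_lens):
    """Reference length closest to c; ties go to the shorter reference."""
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))

def brevity_penalty(c, r):
    """1 when the candidate is longer than r, otherwise exp(1 - r / c)."""
    return 1.0 if c > r else math.exp(1 - r / c)

# The example from this article: candidate length 18, references 16, 18, 16.
r = closest_ref_length(18, [16, 18, 16])
bp = brevity_penalty(18, r)

# Tie-break case: lengths 9 and 11 are equally close to 10, so the shorter wins.
r_tie = closest_ref_length(10, [9, 11])

# A hypothetical candidate half as long as its reference is penalized to exp(-1).
bp_short = brevity_penalty(9, 18)

print(r, bp, r_tie, bp_short)
```

The last case is not part of the example; it is only there to show the penalty dropping below 1 once the candidate is shorter than the chosen reference.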
4. Putting It Together
Finally, $\mathrm{BLEU} = 1 \cdot \exp(-0.684055269517) = 0.504566684006$.
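The final step can be sketched from the intermediate values alone. The function `bleu_from_parts` below is a hypothetical helper for illustration, not part of NLTK. It returns 0 when any precision is 0, because $\log 0$ is undefined; that guard is an assumption about how to handle the case, not something taken from the module's source.

```python
import math
from fractions import Fraction

def bleu_from_parts(precisions, weights, c, r):
    """Combine n-gram precisions and lengths into a sentence-level BLEU score."""
    if min(precisions) == 0:
        return 0.0  # assumption: treat an undefined log(0) as a score of 0
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(w * math.log(p) for w, p in zip(weights, precisions)))

# P1..P4 and the lengths c = r = 18 from the worked example.
precisions = [Fraction(17, 18), Fraction(10, 17), Fraction(7, 16), Fraction(4, 15)]
score = bleu_from_parts(precisions, [0.25] * 4, c=18, r=18)
print(score)  # about 0.5046
```

Separating the combination step from the counting makes it easy to try other weights or lengths on the same precisions.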
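BLEU takes values in [0, 1]: 0 is the worst score and 1 is the best.
As the computation shows, BLEU is essentially the "modified n-gram precision" combined with a "brevity penalty" factor.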