BLEU, a Machine Translation Evaluation Metric: The Computation in Detail

Reposted article · 2018-11-24 19:56 · Machine Learning

Original link: https://blog.csdn.net/guolindonggld/article/details/56966200

1. Introduction

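BLEU (Bilingual Evaluation Understudy) is probably a familiar metric by now, and a quick search on Baidu or Google turns up plenty of introductions. The original paper is "BLEU: a Method for Automatic Evaluation of Machine Translation", from IBM.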

This post uses a worked example to show in detail how BLEU is computed, and walks through the source code of NLTK's nltk.align.bleu_score module.

First, the formula:

$$BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} w_{n} \log P_{n}\right)$$

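where $P_{n}$ is the modified n-gram precision, $w_{n}$ is its weight, and the brevity penalty is

$$BP = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r/c} & \text{if } c \le r \end{cases}$$

with $c$ the length of the candidate translation and $r$ the effective reference length.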

Note that the BLEU value here is computed for a single translation (one sample).
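The sketch below translates the two formulas directly into Python. It computes a single-sentence BLEU score from the modified precisions $P_n$, the weights $w_n$, the candidate length $c$ and the effective reference length $r$. The function names `brevity_penalty` and `bleu_from_stats` are hypothetical. Returning 0 when any $P_n$ is 0 is an assumption made here, because $\log 0$ is undefined. The precisions in the usage line are those of the worked example in section 2, with $P_1$ written as the exact fraction 17/18.

```python
import math
from typing import Sequence


def brevity_penalty(c: int, r: int) -> float:
    """BP = 1 if c > r, otherwise exp(1 - r / c)."""
    if c > r:
        return 1.0
    return math.exp(1 - r / c)


def bleu_from_stats(precisions: Sequence[float], weights: Sequence[float],
                    c: int, r: int) -> float:
    """BLEU = BP * exp(sum_n w_n * log P_n) for one candidate sentence."""
    # log(0) is undefined; this sketch treats any zero precision as a score of 0.
    if any(p == 0 for p in precisions):
        return 0.0
    log_sum = math.fsum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(c, r) * math.exp(log_sum)


# P_1..P_4 from the worked example, uniform weights (N = 4), c = r = 18.
score = bleu_from_stats([17 / 18, 10 / 17, 7 / 16, 4 / 15], [0.25] * 4, c=18, r=18)
print(round(score, 6))  # 0.504567
```

Because the combination is a weighted sum of logarithms, any zero precision would drive the score to 0 anyway. The explicit check only avoids the `ValueError` that `math.log(0)` raises.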

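NLTK's nltk.align.bleu_score module implements this formula with three functions. Two private functions compute $P_{n}$ and BP, and a third function combines them into the BLEU score.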

```python
# Compute the BLEU score
def bleu(candidate, references, weights): ...

# (1) Private function: modified n-gram precision
def _modified_precision(candidate, references, n): ...

# (2) Private function: brevity penalty BP
def _brevity_penalty(candidate, references): ...
```

Example:

Candidate translation (Predicted):
It is a guide to action which ensures that the military always obeys the commands of the party

Reference translations (Gold Standard):
1: It is a guide to action that ensures that the military will forever heed Party commands
2: It is the guiding principle which guarantees the military forces always being under the command of the Party
3: It is the practical guide for the army always to heed the directions of the party

2. Computing the modified n-gram precision ($P_{n}$)

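```python
from collections import Counter
from nltk.util import ngrams


def _modified_precision(candidate, references, n):
    # Count every n-gram of the candidate.
    counts = Counter(ngrams(candidate, n))
    if not counts:
        return 0
    # For each candidate n-gram, its maximum count in any single reference.
    max_counts = {}
    for reference in references:
        reference_counts = Counter(ngrams(reference, n))
        for ngram in counts:
            max_counts[ngram] = max(max_counts.get(ngram, 0), reference_counts[ngram])
    # Clip each candidate count by that maximum.
    clipped_counts = {ngram: min(count, max_counts[ngram]) for ngram, count in counts.items()}
    # True division in Python 3 (Python 2 needs `from __future__ import division`).
    return sum(clipped_counts.values()) / sum(counts.values())
```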

Here $n$ ranges from 1 to 4, so we compute the 1-gram through 4-gram precisions.
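The code above imports `ngrams` from `nltk.util`. If NLTK is not installed, you can build a minimal stand-in from `zip` over shifted slices of the token list. The function below is a hypothetical replacement for illustration and is not NLTK's own implementation. It yields the same contiguous n-gram tuples for a list of tokens.

```python
from typing import Iterator, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous window of n tokens as a tuple."""
    # zip stops at the shortest slice, so L tokens give L - n + 1 n-grams.
    return zip(*(tokens[i:] for i in range(n)))


candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
print(len(candidate))                   # 18
print(len(list(ngrams(candidate, 4))))  # 15
```

An 18-token candidate therefore has 18, 17, 16 and 15 n-grams for $n = 1$ to 4. These totals are the denominators of $P_1$ to $P_4$ in the tables that follow.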

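Modified 1-gram precision:

First count how many times each word occurs in the candidate, then how many times it occurs in each reference. Max is the largest of the three reference counts. Min is the smaller of the candidate count and Max, i.e. the candidate count clipped by Max.

Word	Candidate	Ref. 1	Ref. 2	Ref. 3	Max	Min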
the	3	1	4	4	4	3
obeys	1	0	0	0	0	0
a	1	1	0	0	1	1
which	1	0	1	0	1	1
ensures	1	1	0	0	1	1
guide	1	1	0	1	1	1
always	1	0	1	1	1	1
is	1	1	1	1	1	1
of	1	0	1	1	1	1
to	1	1	0	1	1	1
commands	1	1	0	0	1	1
that	1	2	0	0	2	1
It	1	1	1	1	1	1
action	1	1	0	0	1	1
party	1	0	0	1	1	1
military	1	1	1	0	1	1

Sum the Min column and divide by the total number of candidate words (the sum of the Candidate column). This gives $P_{1} = \frac{3 + 0 + 14 \times 1}{3 + 15 \times 1} = \frac{17}{18} \approx 0.944$
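The sketch below checks the Max and Min columns and the value of $P_1$. It recomputes them with `collections.Counter` and keeps the result as an exact `Fraction`. It assumes whitespace tokenization with case preserved, which matches the table: `party` and `Party` count as different words.

```python
from collections import Counter
from fractions import Fraction

candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    ("It is a guide to action that ensures that the military "
     "will forever heed Party commands").split(),
    ("It is the guiding principle which guarantees the military forces "
     "always being under the command of the Party").split(),
    ("It is the practical guide for the army always to heed "
     "the directions of the party").split(),
]

cand_counts = Counter(candidate)
ref_counts = [Counter(ref) for ref in references]

# Max column: largest count of the word in any single reference.
max_col = {w: max(rc[w] for rc in ref_counts) for w in cand_counts}
# Min column: candidate count clipped by Max.
min_col = {w: min(cand_counts[w], max_col[w]) for w in cand_counts}

p1 = Fraction(sum(min_col.values()), sum(cand_counts.values()))
print(max_col["the"], min_col["the"])  # 4 3
print(p1)                              # 17/18
```

The clipping is what stops "the" from earning all three of its occurrences only when a reference also contains it at least three times. Here references 2 and 3 each contain it four times, so all 3 count.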

Similarly:

Modified 2-gram precision:

2-gram	Candidate	Ref. 1	Ref. 2	Ref. 3	Max	Min
ensures that	1	1	0	0	1	1
guide to	1	1	0	0	1	1
which ensures	1	0	0	0	0	0
obeys the	1	0	0	0	0	0
commands of	1	0	0	0	0	0
that the	1	1	0	0	1	1
a guide	1	1	0	0	1	1
of the	1	0	1	1	1	1
always obeys	1	0	0	0	0	0
the commands	1	0	0	0	0	0
to action	1	1	0	0	1	1
the party	1	0	0	1	1	1
is a	1	1	0	0	1	1
action which	1	0	0	0	0	0
It is	1	1	1	1	1	1
military always	1	0	0	0	0	0
the military	1	1	1	0	1	1

$P_{2} = \frac{10}{17} = 0.588235294$

Modified 3-gram precision (reference 2 shares no 3-grams with the candidate, so its column is omitted):

3-gram	Candidate	Ref. 1	Ref. 3	Max	Min
ensures that the	1	1	0	1	1
which ensures that	1	0	0	0	0
action which ensures	1	0	0	0	0
a guide to	1	1	0	1	1
military always obeys	1	0	0	0	0
the commands of	1	0	0	0	0
commands of the	1	0	0	0	0
to action which	1	0	0	0	0
the military always	1	0	0	0	0
obeys the commands	1	0	0	0	0
It is a	1	1	0	1	1
of the party	1	0	1	1	1
is a guide	1	1	0	1	1
that the military	1	1	0	1	1
always obeys the	1	0	0	0	0
guide to action	1	1	0	1	1

$P_{3} = \frac{7}{16} = 0.4375$

Modified 4-gram precision (references 2 and 3 share no 4-grams with the candidate, so only reference 1 is shown):

4-gram	Candidate	Ref. 1	Max	Min
to action which ensures	1	0	0	0
action which ensures that	1	0	0	0
guide to action which	1	0	0	0
obeys the commands of	1	0	0	0
which ensures that the	1	0	0	0
commands of the party	1	0	0	0
ensures that the military	1	1	1	1
a guide to action	1	1	1	1
always obeys the commands	1	0	0	0
that the military always	1	0	0	0
the commands of the	1	0	0	0
the military always obeys	1	0	0	0
military always obeys the	1	0	0	0
is a guide to	1	1	1	1
It is a guide	1	1	1	1

$P_{4} = \frac{4}{15} = 0.266666667$
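The 3-gram and 4-gram tables leave out some reference columns. The sketch below counts, for each reference, how many distinct candidate n-grams it contains. This shows that the omitted references contribute nothing. It also recomputes $P_3$ and $P_4$ as exact fractions. The helper name `ngram_counts` is hypothetical, and tokenization is assumed to be a case-sensitive whitespace split.

```python
from collections import Counter
from fractions import Fraction

candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    ("It is a guide to action that ensures that the military "
     "will forever heed Party commands").split(),
    ("It is the guiding principle which guarantees the military forces "
     "always being under the command of the Party").split(),
    ("It is the practical guide for the army always to heed "
     "the directions of the party").split(),
]


def ngram_counts(tokens, n):
    """Counter of contiguous n-gram tuples (whitespace tokens, case-sensitive)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


results = {}
for n in (3, 4):
    cand = ngram_counts(candidate, n)
    refs = [ngram_counts(ref, n) for ref in references]
    # Distinct candidate n-grams that occur in each individual reference.
    per_ref = [sum(1 for g in cand if rc[g] > 0) for rc in refs]
    # Clip each candidate count by its largest count in any single reference.
    clipped = sum(min(cnt, max(rc[g] for rc in refs)) for g, cnt in cand.items())
    results[n] = (per_ref, Fraction(clipped, sum(cand.values())))
    print(n, *results[n])
# 3 [6, 0, 1] 7/16
# 4 [4, 0, 0] 4/15
```

Reference 3 adds only "of the party" to the six 3-grams matched in reference 1. Reference 2's "of the Party" does not match because of the capital P.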

We then use uniform weights $w_{1} = w_{2} = w_{3} = w_{4} = 0.25$, i.e. $N = 4$.

So, with $\log$ the natural logarithm:

$\sum_{n = 1}^{N} w_{n} \log P_{n} = 0.25 \log P_{1} + 0.25 \log P_{2} + 0.25 \log P_{3} + 0.25 \log P_{4} = -0.684055269517$
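The weighted sum can be reproduced with `math.log`, which is Python's natural logarithm. The value above only comes out with base $e$. $P_1$ is taken as 17/18, the ratio of clipped to total unigram counts in the 1-gram table.

```python
import math

precisions = [17 / 18, 10 / 17, 7 / 16, 4 / 15]  # P_1 .. P_4 from the tables
weights = [0.25, 0.25, 0.25, 0.25]                # uniform w_n, N = 4

log_sum = sum(w * math.log(p) for w, p in zip(weights, precisions))
print(log_sum)  # about -0.684055269517
```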

3. Computing the brevity penalty

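```python
import math


def _brevity_penalty(candidate, references):
    c = len(candidate)
    ref_lens = (len(reference) for reference in references)
    # Python compares tuples element by element, e.g. (0, 1) > (1, 0) is False.
    # The key (distance, length) picks the reference length closest to the
    # candidate's and, when several are equally close, the shortest one.
    # Example: candidate length 10, references of length 9 and 11 -> r = 9.
    r = min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))
    print('r:', r)  # debug output
    if c > r:
        return 1
    else:
        # True division in Python 3; Python 2 would truncate r / c.
        return math.exp(1 - r / c)
```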

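Next we compute BP, the brevity penalty, which penalizes translations that are too short. From the formula, BP lies in (0, 1]. The shorter the candidate is relative to $r$, the closer BP gets to 0.

The candidate has 18 tokens, and the three references have 16, 18, and 16 tokens.
So $c = 18$, and the closest reference length is $r = 18$ (reference 2).

Since $c \le r$, $BP = e^{1 - r/c} = e^{0} = 1$.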
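The sketch below is a Python 3 version of the length selection and the penalty. It reuses the tuple-key trick from `_brevity_penalty`, and `effective_ref_length` is a hypothetical helper name. The first call reproduces this example. The other two use made-up lengths to show the tie-breaking rule and the penalty for a shorter candidate.

```python
import math
from typing import Iterable


def effective_ref_length(c: int, ref_lens: Iterable[int]) -> int:
    """Reference length closest to c; ties go to the shorter reference."""
    return min(ref_lens, key=lambda ref_len: (abs(ref_len - c), ref_len))


def brevity_penalty(c: int, r: int) -> float:
    """1 if the candidate is longer than r, otherwise exp(1 - r / c)."""
    return 1.0 if c > r else math.exp(1 - r / c)


r = effective_ref_length(18, [16, 18, 16])
print(r, brevity_penalty(18, r))          # 18 1.0
print(effective_ref_length(10, [9, 11]))  # 9 (tie broken toward the shorter one)
print(brevity_penalty(12, 18))            # exp(-0.5), about 0.607
```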

4. Putting it together

Finally, $BLEU = 1 \cdot \exp(-0.684055269517) = 0.504566684006$

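BLEU ranges over [0, 1], where 0 is the worst score and 1 the best.

The computation shows that BLEU is simply the modified n-gram precisions, combined through a weighted geometric mean and multiplied by the brevity penalty.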
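The sketch below combines all the pieces into one self-contained, stdlib-only Python 3 program. It mirrors the three functions described earlier: clipped n-gram precision, a brevity penalty based on the closest reference length, and a weighted log sum. It was written for this walkthrough and is not NLTK's actual source. In newer NLTK releases the corresponding code lives in `nltk.translate.bleu_score`, and its options may differ. Scoring a zero precision as 0 is an assumption of this sketch.

```python
import math
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Counter of contiguous n-gram tuples."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def modified_precision(candidate, references, n: int) -> float:
    """Clipped n-gram precision P_n for one candidate."""
    counts = ngram_counts(candidate, n)
    if not counts:
        return 0.0
    ref_counts = [ngram_counts(ref, n) for ref in references]
    clipped = sum(min(cnt, max(rc[g] for rc in ref_counts))
                  for g, cnt in counts.items())
    return clipped / sum(counts.values())


def brevity_penalty(candidate, references) -> float:
    """BP with r = closest reference length (ties go to the shorter one)."""
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda rl: (abs(rl - c), rl))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, weights=(0.25, 0.25, 0.25, 0.25)) -> float:
    """Sentence-level BLEU = BP * exp(sum_n w_n * log P_n)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, len(weights) + 1)]
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined; this sketch scores it as 0
    log_sum = math.fsum(w * math.log(p) for w, p in zip(weights, precisions))
    return brevity_penalty(candidate, references) * math.exp(log_sum)


candidate = ("It is a guide to action which ensures that the military "
             "always obeys the commands of the party").split()
references = [
    ("It is a guide to action that ensures that the military "
     "will forever heed Party commands").split(),
    ("It is the guiding principle which guarantees the military forces "
     "always being under the command of the Party").split(),
    ("It is the practical guide for the army always to heed "
     "the directions of the party").split(),
]
print(bleu(candidate, references))  # about 0.5045666840
```

The zero-precision check also covers an empty candidate. That case returns 0 before `brevity_penalty` would divide by $c = 0$.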
