Pointwise Mutual Information
Pointwise mutual information (PMI), or point mutual information, is a measure of association used in information theory and statistics.
The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence given their joint distribution and their individual distributions, assuming independence.
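Concretely, the standard definition (as given on the Wikipedia page cited below), in terms of the joint and marginal distributions:

```latex
\operatorname{pmi}(x;y) \equiv \log \frac{p(x,y)}{p(x)\,p(y)}
  = \log \frac{p(x \mid y)}{p(x)}
  = \log \frac{p(y \mid x)}{p(y)}
```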
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes (w.r.t. the joint distribution p(x, y)).
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
http://www.eecis.udel.edu/~trnka/CISC889-11S/lectures/philip-pmi.pdf
Information-theory approach to finding collocations:
– A measure of how much one word tells us about the other, i.e. how much information we gain.
– Can be negative or positive.
Problems with PMI
• Bad with sparse data:
– Suppose some words occur only once, but appear together.
– They get a very high PMI score.
– Consider our word clouds: a high PMI score does not necessarily indicate an important bigram.
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
PMI is derived from mutual information.
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
Finally, pmi(x;y) will increase if p(x|y) is fixed but p(x) decreases.
This is a weakness: if two words are tightly linked and always co-occur, then with p(x|y) fixed the PMI depends on the value of p(x), and the rarer x is, the larger the score. Assuming p(y|x) = 1 (perfect co-occurrence), the score depends only on how frequent the word is: a word that occurs just once gets the highest score. So PMI favors rare, low-frequency events.
• Bad with word dependence:
– Suppose two words are perfectly dependent on each other: whenever one occurs, the other occurs.
– Then I(x, y) = log(1 / P(y)).
– So the rarer the word is, the higher the PMI.
– A high PMI score doesn't necessarily mean high word dependence (it could just mean the words are rare).
– A common fix is to threshold on word frequencies.
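The sparse-data problem above is easy to demonstrate numerically. This is a minimal sketch with hypothetical counts (the corpus size and word counts are made up for illustration): a hapax pair that co-occurs once easily out-scores a frequent, genuinely associated pair.

```python
import math

def pmi(count_xy, count_x, count_y, n):
    """Base-2 pointwise mutual information estimated from raw counts."""
    p_xy = count_xy / n
    p_x = count_x / n
    p_y = count_y / n
    return math.log2(p_xy / (p_x * p_y))

n = 1_000_000  # hypothetical corpus size
# A frequent, genuinely associated pair (hypothetical counts):
common_pair = pmi(500, 2000, 1000, n)
# A hapax pair: each word occurs exactly once, and only together:
hapax_pair = pmi(1, 1, 1, n)
print(common_pair)  # log2(250), roughly 7.97
print(hapax_pair)   # log2(1_000_000), roughly 19.93
```

This is why the slides suggest thresholding on word frequencies before ranking bigrams by PMI.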
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
PMI can be viewed as the mutual information localized to a single point. Consider mutual information:
From <http://en.wikipedia.org/wiki/Mutual_information>
It can take positive or negative values, but is zero if X and Y are independent. PMI is maximized when X and Y are perfectly associated, yielding the following bounds:

−∞ ≤ pmi(x;y) ≤ min(−log p(x), −log p(y))
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
Example
| x | y | p(x, y) |
|---|---|---------|
| 0 | 0 | 0.1  |
| 0 | 1 | 0.7  |
| 1 | 0 | 0.15 |
| 1 | 1 | 0.05 |
Using this table we can marginalize to get the following additional table for the individual distributions:
|   | p(x) | p(y) |
|---|------|------|
| 0 | 0.8  | 0.25 |
| 1 | 0.2  | 0.75 |
With this example, we can compute four values for pmi(x;y). Using base-2 logarithms:
pmi(x=0; y=0) = −1
pmi(x=0; y=1) = 0.222392421
pmi(x=1; y=0) = 1.584962501
pmi(x=1; y=1) = −1.584962501
(For reference, the mutual information MI(X;Y) would then be 0.214170945.)
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
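The worked example above can be reproduced in a few lines, marginalizing the joint table and then averaging the PMI values to recover the mutual information:

```python
import math

# Joint distribution p(x, y) from the example table
p_joint = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.15, (1, 1): 0.05}

# Marginal distributions obtained by summing out the other variable
p_x = {0: 0.1 + 0.7, 1: 0.15 + 0.05}
p_y = {0: 0.1 + 0.15, 1: 0.7 + 0.05}

def pmi(x, y):
    """Base-2 PMI of the outcome pair (x, y)."""
    return math.log2(p_joint[(x, y)] / (p_x[x] * p_y[y]))

for (x, y) in p_joint:
    print(f"pmi(x={x}; y={y}) = {pmi(x, y):.9f}")

# MI is the expected PMI under the joint distribution
mi = sum(p * pmi(x, y) for (x, y), p in p_joint.items())
print(f"MI(X; Y) = {mi:.9f}")
```

The printed values match the table: −1, 0.222392421, 1.584962501, −1.584962501, and MI ≈ 0.214170945.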
Similarities to mutual information
pmi(x;y) = h(x) + h(y) − h(x, y), where h(x) is the self-information, or −log p(x).
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
Normalized PMI (NPMI)
Pointwise mutual information can be normalized between [-1,+1] resulting in -1 (in the limit) for never occurring together, 0 for independence, and +1 for complete co-occurrence.
In the case of complete co-occurrence, we have p(x, y) = p(x) = p(y); combine this with the definition.
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
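The normalization referenced above divides the PMI by the joint self-information (per the Wikipedia article):

```latex
\operatorname{npmi}(x;y) = \frac{\operatorname{pmi}(x;y)}{h(x,y)},
\qquad h(x,y) = -\log p(x,y)
```

At complete co-occurrence, p(x, y) = p(x) = p(y), so pmi(x;y) = −log p(x, y) and npmi = +1; independence gives pmi = 0 and npmi = 0; and as p(x, y) → 0 with the marginals fixed, npmi → −1.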
Chain rule for PMI
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
I didn't fully understand this one (TODO).
This is easily proven by:
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
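The chain rule in question states that pmi(x; yz) = pmi(x; y) + pmi(x; z|y). A sketch of the proof, expanding the conditional probabilities via p(a|y) = p(a, y)/p(y):

```latex
\begin{aligned}
\operatorname{pmi}(x;y) + \operatorname{pmi}(x;z \mid y)
  &= \log\frac{p(x,y)}{p(x)\,p(y)}
   + \log\frac{p(x,z \mid y)}{p(x \mid y)\,p(z \mid y)} \\
  &= \log\left(\frac{p(x,y)}{p(x)\,p(y)}
   \cdot \frac{p(x,y,z)/p(y)}{\bigl(p(x,y)/p(y)\bigr)\bigl(p(y,z)/p(y)\bigr)}\right) \\
  &= \log\frac{p(x,y,z)}{p(x)\,p(y,z)}
   = \operatorname{pmi}(x;yz)
\end{aligned}
```

Intuitively: conditioning on y and then adding back the information y carries about x recovers the information the pair (y, z) carries about x.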