Pointwise Mutual Information
Pointwise mutual information (PMI), or point mutual information, is a measure of association used in information theory and statistics.
The PMI of a pair of outcomes x and y belonging to discrete random variables X and Y quantifies the discrepancy between the probability of their coincidence given their joint distribution and their individual distributions, assuming independence.
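Concretely, the standard definition (as given on the Wikipedia page cited below), in terms of the joint and marginal distributions:

```latex
\operatorname{pmi}(x;y) \equiv \log \frac{p(x,y)}{p(x)\,p(y)}
  = \log \frac{p(x \mid y)}{p(x)}
  = \log \frac{p(y \mid x)}{p(y)}
```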
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
The mutual information (MI) of the random variables X and Y is the expected value of the PMI over all possible outcomes (w.r.t. the joint distribution p(x, y)).
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
http://www.eecis.udel.edu/~trnka/CISC889-11S/lectures/philip-pmi.pdf
Information-theory approach to finding collocations:
– A measure of how much one word tells us about the other, i.e. how much information we gain.
– Can be negative or positive.
Problems with PMI
• Bad with sparse data:
– Suppose some words occur only once, but appear together.
– They get a very high PMI score.
– Consider our word clouds: a high PMI score does not necessarily indicate an important bigram.
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
PMI is derived from mutual information.
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
Finally, pmi(x;y) will increase if p(x|y) is fixed but p(x) decreases.
This is a weakness: if two words are tightly linked and always co-occur, then with p(x|y) fixed the PMI depends on the value of p(x), and the rarer x is, the larger the score. Assuming p(y|x) = 1 (perfect co-occurrence), the score depends only on how frequent the word is: a word that occurs just once gets the highest score. So PMI favors rare, low-frequency events.
• Bad with word dependence:
– Suppose two words are perfectly dependent on each other: whenever one occurs, the other occurs.
– Then I(x, y) = log(1 / P(y)).
– So the rarer the word is, the higher the PMI.
– A high PMI score doesn't necessarily mean high word dependence (it could just mean the words are rare).
– A common fix is to threshold on word frequencies.
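The sparse-data problem above is easy to demonstrate numerically. This is a minimal sketch with hypothetical counts (the corpus size and word counts are made up for illustration): a hapax pair that co-occurs once easily out-scores a frequent, genuinely associated pair.

```python
import math

def pmi(count_xy, count_x, count_y, n):
    """Base-2 pointwise mutual information estimated from raw counts."""
    p_xy = count_xy / n
    p_x = count_x / n
    p_y = count_y / n
    return math.log2(p_xy / (p_x * p_y))

n = 1_000_000  # hypothetical corpus size
# A frequent, genuinely associated pair (hypothetical counts):
common_pair = pmi(500, 2000, 1000, n)
# A hapax pair: each word occurs exactly once, and only together:
hapax_pair = pmi(1, 1, 1, n)
print(common_pair)  # log2(250), roughly 7.97
print(hapax_pair)   # log2(1_000_000), roughly 19.93
```

This is why the slides suggest thresholding on word frequencies before ranking bigrams by PMI.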
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
PMI can be viewed as the mutual information localized to a single point. Consider mutual information:
From <http://en.wikipedia.org/wiki/Mutual_information>
It can take positive or negative values, but is zero if X and Y are independent. PMI is maximized when X and Y are perfectly associated, yielding the following bounds:

−∞ ≤ pmi(x;y) ≤ min(−log p(x), −log p(y))
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
Example
| x | y | p(x, y) |
|---|---|---------|
| 0 | 0 | 0.1  |
| 0 | 1 | 0.7  |
| 1 | 0 | 0.15 |
| 1 | 1 | 0.05 |
Using this table we can marginalize to get the following additional table for the individual distributions:
|   | p(x) | p(y) |
|---|------|------|
| 0 | 0.8  | 0.25 |
| 1 | 0.2  | 0.75 |
With this example, we can compute four values for pmi(x;y). Using base-2 logarithms:
pmi(x=0; y=0) = −1
pmi(x=0; y=1) = 0.222392421
pmi(x=1; y=0) = 1.584962501
pmi(x=1; y=1) = −1.584962501
(For reference, the mutual information MI(X;Y) would then be 0.214170945.)
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
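The worked example above can be reproduced in a few lines, marginalizing the joint table and then averaging the PMI values to recover the mutual information:

```python
import math

# Joint distribution p(x, y) from the example table
p_joint = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.15, (1, 1): 0.05}

# Marginal distributions obtained by summing out the other variable
p_x = {0: 0.1 + 0.7, 1: 0.15 + 0.05}
p_y = {0: 0.1 + 0.15, 1: 0.7 + 0.05}

def pmi(x, y):
    """Base-2 PMI of the outcome pair (x, y)."""
    return math.log2(p_joint[(x, y)] / (p_x[x] * p_y[y]))

for (x, y) in p_joint:
    print(f"pmi(x={x}; y={y}) = {pmi(x, y):.9f}")

# MI is the expected PMI under the joint distribution
mi = sum(p * pmi(x, y) for (x, y), p in p_joint.items())
print(f"MI(X; Y) = {mi:.9f}")
```

The printed values match the table: −1, 0.222392421, 1.584962501, −1.584962501, and MI ≈ 0.214170945.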
Similarities to mutual information
pmi(x;y) = h(x) + h(y) − h(x, y), where h(x) is the self-information, or −log p(x).
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
Normalized PMI (NPMI)
Pointwise mutual information can be normalized between [-1,+1] resulting in -1 (in the limit) for never occurring together, 0 for independence, and +1 for complete co-occurrence.
In the case of complete co-occurrence, we have p(x, y) = p(x) = p(y); combine this with the definition.
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
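The normalization referenced above divides the PMI by the joint self-information (per the Wikipedia article):

```latex
\operatorname{npmi}(x;y) = \frac{\operatorname{pmi}(x;y)}{h(x,y)},
\qquad h(x,y) = -\log p(x,y)
```

At complete co-occurrence, p(x, y) = p(x) = p(y), so pmi(x;y) = −log p(x, y) and npmi = +1; independence gives pmi = 0 and npmi = 0; and as p(x, y) → 0 with the marginals fixed, npmi → −1.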
Chain rule for PMI
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
I didn't fully understand this one (TODO).
This is easily proven by:
From <http://en.wikipedia.org/wiki/Pointwise_mutual_information>
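The chain rule in question states that pmi(x; yz) = pmi(x; y) + pmi(x; z|y). A sketch of the proof, expanding the conditional probabilities via p(a|y) = p(a, y)/p(y):

```latex
\begin{aligned}
\operatorname{pmi}(x;y) + \operatorname{pmi}(x;z \mid y)
  &= \log\frac{p(x,y)}{p(x)\,p(y)}
   + \log\frac{p(x,z \mid y)}{p(x \mid y)\,p(z \mid y)} \\
  &= \log\left(\frac{p(x,y)}{p(x)\,p(y)}
   \cdot \frac{p(x,y,z)/p(y)}{\bigl(p(x,y)/p(y)\bigr)\bigl(p(y,z)/p(y)\bigr)}\right) \\
  &= \log\frac{p(x,y,z)}{p(x)\,p(y,z)}
   = \operatorname{pmi}(x;yz)
\end{aligned}
```

Intuitively: conditioning on y and then adding back the information y carries about x recovers the information the pair (y, z) carries about x.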