Using the conditional frequency distribution ConditionalFreqDist with the Brown Corpus

Reposted article. 2013-10-03 20:23. nltk / NLTK Natural Language Processing (Python) reading notes

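With the Brown Corpus, ConditionalFreqDist (a conditional frequency distribution) shows how many times each word occurs in each genre of the corpus, news being one of those genres. This is very useful for sentiment analysis of microblog posts: for example, counting how often the features in a feature vector that represent positive, negative, or neutral occur in each tweet helps determine that tweet's sentiment polarity.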

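import nltk
from nltk.corpus import brown  # needs the Brown Corpus data, e.g. nltk.download('brown')

# Count every (genre, word) pair: the genre is the condition, the word is the sample.
cfd = nltk.ConditionalFreqDist(
    (genre, word)
    for genre in brown.categories()
    for word in brown.words(categories=genre)
)
genres = ['news', 'religion', 'hobbies', 'science_fiction', 'romance', 'humor']
modals = ['can', 'could', 'may', 'might', 'must', 'will']
# tabulate() prints the table itself and returns None, so it is not wrapped in print.
cfd.tabulate(conditions=genres, samples=modals)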

Output:

                 can could  may might must will
           news   93    86   66    38   50  389
       religion   82    59   78    12   54   71
        hobbies  268    58  131    22   83  264
science_fiction   16    49    4    12    8   16
        romance   74   193   11    51   45   43
          humor   16    30    8     8    9   13
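The table shows that, among these modal verbs, will is the most frequent in the news genre, while could is the most frequent in the humor genre.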
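
To make it clearer what such a table counts, and how the same idea might carry over to the tweet example mentioned earlier, here is a minimal sketch that uses only the Python standard library. It is not NLTK's implementation: the helper conditional_counts, the toy tweets, and the feature words are all hypothetical and invented for illustration. It also groups counts by sentiment label rather than by individual tweet, which is one possible reading of the idea.

```python
from collections import Counter, defaultdict


def conditional_counts(pairs):
    """Group (condition, sample) pairs into one Counter per condition."""
    table = defaultdict(Counter)
    for condition, sample in pairs:
        table[condition][sample] += 1
    return table


# Hypothetical toy data: (label, tweet text) pairs invented for illustration.
tweets = [
    ("positive", "love this phone great screen"),
    ("positive", "great battery love it"),
    ("negative", "hate the battery bad screen"),
    ("neutral", "new phone released today"),
]
# Hypothetical feature words; a real feature vector would be much larger.
features = {"love", "great", "hate", "bad"}

# The label plays the role of the genre, the feature word the role of the modal.
counts = conditional_counts(
    (label, word)
    for label, text in tweets
    for word in text.split()
    if word in features
)

for label in ("positive", "negative", "neutral"):
    print(label, dict(counts[label]))
```

Looking up counts["positive"]["love"] here plays the same role as reading one cell of the table above. A feature that never occurs under a label simply counts as zero.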
