The following code performs a word-frequency count on Lu Xun's story "The New Year's Sacrifice" (《祝福》):
import io
import jieba

# Read the full text of the story as UTF-8.
txt = io.open("zhufu.txt", "r", encoding="utf-8").read()
# Segment the Chinese text into a list of words.
words = jieba.lcut(txt)

# Count every token longer than one character;
# single characters are mostly function words.
counts = {}
for word in words:
    if len(word) == 1:
        continue
    counts[word] = counts.get(word, 0) + 1

# Sort by frequency and print the 15 most common words.
items = list(counts.items())
items.sort(key=lambda x: x[1], reverse=True)
for i in range(15):
    word, count = items[i]
    print(u"{0:<10}{1:>5}".format(word, count))
The results are as follows:
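For comparison, the same count can be written more compactly with collections.Counter from the standard library. The sketch below is a minimal variation assuming the same zhufu.txt file; the STOPWORDS set is a hypothetical example of how common filler words could be filtered out on top of the single-character rule.

import io
from collections import Counter

import jieba

# Hypothetical stopword list; extend it to suit the actual text.
STOPWORDS = {u"一個", u"自己", u"沒有", u"他們"}

with io.open("zhufu.txt", "r", encoding="utf-8") as f:
    words = jieba.lcut(f.read())

# Keep tokens longer than one character that are not stopwords.
counts = Counter(w for w in words if len(w) > 1 and w not in STOPWORDS)

# most_common(15) replaces the manual sort-and-slice above.
for word, count in counts.most_common(15):
    print(u"{0:<10}{1:>5}".format(word, count))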
Next, we turn the text into a word cloud:
from wordcloud import WordCloud
import PIL.Image as image
import numpy as np
import jieba

# Segment the Chinese text and join the tokens with spaces,
# the whitespace-delimited format that WordCloud expects.
def trans_CN(text):
    # Receives the string to be segmented.
    word_list = jieba.cut(text)
    # Insert spaces between the individual tokens.
    result = " ".join(word_list)
    return result

with open("zhufu.txt", encoding="utf-8") as fp:
    text = fp.read()
    # print(text)
    # Segment the Chinese document that was just read.
    text = trans_CN(text)
    mask = np.array(image.open("xinxing.jpg"))
    wordcloud = WordCloud(
        # Apply the image as a mask so the cloud fills its shape.
        mask=mask,
        # A font with Chinese glyphs is required; without it the
        # words render as empty boxes.
        font_path="msyh.ttc"
    ).generate(text)
    image_produce = wordcloud.to_image()
    image_produce.show()
The resulting word cloud looks like this:
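If the cloud should be kept as a file rather than only shown in a viewer window, WordCloud can also write the image directly. The sketch below reuses the zhufu.txt, xinxing.jpg, and msyh.ttc files from above; background_color and max_words are optional cosmetic parameters, and the output file name zhufu_wordcloud.png is an assumption.

from wordcloud import WordCloud
import PIL.Image as image
import numpy as np
import jieba

with open("zhufu.txt", encoding="utf-8") as fp:
    text = " ".join(jieba.cut(fp.read()))

mask = np.array(image.open("xinxing.jpg"))
wordcloud = WordCloud(
    mask=mask,
    font_path="msyh.ttc",
    background_color="white",  # default background is black
    max_words=100,             # cap how many words are rendered
).generate(text)

# Save a PNG instead of (or in addition to) opening a viewer.
wordcloud.to_file("zhufu_wordcloud.png")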