Python Chinese Word Frequency and Hot-Word Statistics: A Brief Analysis (with Starter Source Code)

This article is a repost. 2020-02-02 23:16 · Tags: python / hanlp

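The jieba library has three segmentation modes: precise mode, full mode, and search-engine mode.

- Precise mode: splits the text exactly, with no redundant words.
- Full mode: scans out every possible word in the text, so the output contains redundant, overlapping words.
- Search-engine mode: starts from precise mode and splits long words again.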

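Worked example: read a UTF-8 text file, segment it in search-engine mode, and print the 20 most frequent words that are longer than one character.

Code: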

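    import jieba

    # Read the whole file. The path is the author's own; adjust it for your machine.
    with open('E:/578095023/FileRecv/寒假作業/test.txt', encoding="utf-8") as file:
        txt = file.read()

    # Alternative segmentation modes:
    # words = jieba.lcut(txt)                # precise mode
    # words = jieba.lcut(txt, cut_all=True)  # full mode
    words = jieba.lcut_for_search(txt)       # search-engine mode

    # Count each word, skipping single-character tokens
    # (mostly punctuation, whitespace, and particles).
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        counts[word] = counts.get(word, 0) + 1

    # Sort by frequency, highest first.
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)

    # Print the top 20. Slicing avoids an IndexError when fewer than 20 words exist.
    for word, count in items[:20]:
        print(word, count)
        # print('{0:<10}{1:>5}'.format(word, count))  # aligned columns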
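The counting and sorting steps above can also be written with `collections.Counter` from the standard library. The following is a minimal sketch, assuming the text has already been segmented into a list of tokens; the function name `top_words`, its parameters, and the sample token list are illustrative and not part of jieba or the original code.

```python
from collections import Counter


def top_words(tokens, n=20, min_len=2):
    """Return the n most frequent tokens that have at least min_len characters."""
    # Strip surrounding whitespace so tokens such as "\n" are not counted.
    cleaned = (token.strip() for token in tokens)
    counts = Counter(token for token in cleaned if len(token) >= min_len)
    # most_common sorts by count, highest first; ties keep first-seen order.
    return counts.most_common(n)


# Hand-written token list standing in for the output of jieba.lcut_for_search(txt).
sample_tokens = ["數據", "分析", "的", "數據", "統計", "分析", "數據", "\n"]
for word, count in top_words(sample_tokens, n=3):
    print(word, count)
```

In the script above, this would be called as `top_words(words)`. Because `most_common` returns at most `n` entries, it also handles texts with fewer than 20 distinct words.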
