Chinese Word Frequency Counting and Hot-Word Statistics in Python: A Brief Analysis (with Starter Source Code)

Reposted article, 2020-02-02 23:16. Tags: python, hanlp

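The jieba library has three segmentation modes: precise mode, full mode, and search engine mode.

- Precise mode: splits the text into exact, non-overlapping words, so the result contains no redundant tokens.
- Full mode: scans out every possible word in the text, so the result contains redundant, overlapping tokens.
- Search engine mode: builds on precise mode and splits long words again into shorter ones.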

Worked example:

Code:

    import jieba

    # Read the whole file; the path is the author's local example.
    with open('E:/578095023/FileRecv/寒假作业/test.txt', encoding="utf-8") as file:
        txt = file.read()

    # words = jieba.lcut(txt)                 # precise mode
    # words = jieba.lcut(txt, cut_all=True)   # full mode
    words = jieba.lcut_for_search(txt)        # search engine mode

    # Count every token longer than one character.
    counts = {}
    for word in words:
        if len(word) == 1:
            continue
        counts[word] = counts.get(word, 0) + 1

    # Sort by count, highest first.
    items = list(counts.items())
    items.sort(key=lambda x: x[1], reverse=True)
    # items.sort(reverse=True)  # would sort by word instead of by count

    # Print the top 20; slicing avoids an IndexError on short texts.
    for word, count in items[:20]:
        print(word, count)
        # print('{0:<10}{1:>5}'.format(word, count))  # aligned columns
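
The counting loop above can also be written with `collections.Counter` from the standard library. The sketch below is an alternative, not the original author's code. The helper name `top_words` and its parameters `n` and `min_len` are hypothetical, and it takes an already tokenized list, so it should work with the output of any of the three jieba modes.

```python
from collections import Counter


def top_words(tokens, n=20, min_len=2):
    """Return the n most frequent tokens that have at least min_len characters.

    tokens is any iterable of strings, e.g. the result of jieba.lcut_for_search().
    """
    # Strip whitespace first so tokens such as "\n" or " " never count as words.
    cleaned = (tok.strip() for tok in tokens)
    counts = Counter(tok for tok in cleaned if len(tok) >= min_len)
    # most_common() sorts by count, highest first, and tolerates n > len(counts).
    return counts.most_common(n)


# Hand-written token list standing in for jieba output (illustrative only).
sample = ["词频", "统计", "的", "词频", "分析", "，", "统计", "词频"]
for word, count in top_words(sample, n=2):
    print('{0:<10}{1:>5}'.format(word, count))
```

Keeping tokenization outside the helper means switching between `jieba.lcut` and `jieba.lcut_for_search` does not require touching the counting code.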
