Python jieba Word Segmentation (2): Keyword Extraction


 

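The text used for keyword extraction is the first ten chapters of the novel Perfect World (完美世界).

I merged the first ten chapters into a single file beforehand.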
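The merge step is not shown in the post. A minimal sketch of one way to do it is below, assuming each chapter is a separate UTF-8 text file. The function name merge_chapters and the chapter file names are hypothetical; only the merged file name he.txt comes from the code in this post.

```python
from pathlib import Path


def merge_chapters(chapter_paths, merged_path, encoding="utf-8"):
    """Concatenate the given text files, in order, into merged_path."""
    merged_path = Path(merged_path)
    with merged_path.open("w", encoding=encoding) as out:
        for path in chapter_paths:
            text = Path(path).read_text(encoding=encoding)
            out.write(text)
            # Keep chapters separated even if a file lacks a trailing newline
            if not text.endswith("\n"):
                out.write("\n")
    return merged_path


# Hypothetical usage: chapter_1.txt ... chapter_10.txt merged into he.txt
# chapters = ["chapter_%d.txt" % i for i in range(1, 11)]
# merge_chapters(chapters, "he.txt")
```

Passing the paths as an explicit list keeps the chapter order under your control instead of depending on how the file system sorts names such as chapter_10.txt and chapter_2.txt.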

Then I call the keyword-extraction function, jieba.analyse.extract_tags, directly:

import jieba
import jieba.analyse  # keyword-extraction module of jieba

data_path = "C:\\Users\\wangyuguang\\Desktop\\work_data\\profect_world\\"
topK = 10           # number of keywords to return
withWeight = False  # set to True to also get each keyword's weight

# he.txt holds the first ten chapters, merged beforehand.
# Read raw bytes; jieba decodes them itself (UTF-8 first, then GBK).
with open(data_path + "he.txt", "rb") as f:
    content = f.read()

# Call the keyword-extraction function directly
tags = jieba.analyse.extract_tags(content, topK=topK, withWeight=withWeight)

if withWeight:
    # With weights, each item is a (keyword, weight) tuple
    for tag, weight in tags:
        print("tag: %s\t\t weight: %f" % (tag, weight))
else:
    print(",".join(tags))

Keyword results:

Building prefix dict from the default dictionary ...
Loading model from cache c:\users\wangyuguang\appdata\local\temp\jieba.cache
Loading model cost 0.386 seconds.
Prefix dict has been built succesfully.
小不點,孩子,族長,石雲峰,石村,凶禽,青鱗鷹,凶獸,一群,石昊
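jieba.analyse.extract_tags ranks words with TF-IDF, which is why character and place names that recur throughout the chapters come out on top. The sketch below only illustrates that scoring idea in plain Python and is not jieba's implementation. The function name tfidf_top_k, the toy word list, and the IDF values are all made up for illustration; jieba ships its own IDF table built from a large corpus.

```python
from collections import Counter


def tfidf_top_k(words, idf, default_idf, top_k=10):
    """Rank words by term frequency * inverse document frequency."""
    counts = Counter(words)
    total = sum(counts.values())
    scores = {
        word: (count / total) * idf.get(word, default_idf)
        for word, count in counts.items()
    }
    # Highest score first; ties are broken by the word itself for a stable order
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Toy, invented data: an already segmented text and a made-up IDF table
words = ["石村", "的", "孩子", "的", "石村", "的"]
toy_idf = {"的": 0.1, "石村": 8.0, "孩子": 5.0}
print(tfidf_top_k(words, toy_idf, default_idf=6.0, top_k=2))
```

In this toy input the most frequent word, 的, has a very low IDF and so drops out, while 石村 ranks first and 孩子 second. A generic word such as 一群 in the result above is the kind of term a stop-word list could filter out.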

