The text for keyword extraction is the first ten chapters of the novel 完美世界 (Perfect World).
I merged the first ten chapters into a single file beforehand.
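The merge step is not shown in the post. The function below is a minimal sketch of one way to do it, assuming each chapter is a separate UTF-8 text file. The function name `merge_chapters` is hypothetical.

```python
from pathlib import Path


def merge_chapters(chapter_paths, out_path, encoding="utf-8"):
    """Concatenate chapter files, in the given order, into one text file.

    Hypothetical helper: assumes every chapter file uses the same encoding.
    """
    with open(out_path, "w", encoding=encoding) as out:
        for path in chapter_paths:
            text = Path(path).read_text(encoding=encoding)
            out.write(text)
            # Make sure the next chapter starts on a new line
            if not text.endswith("\n"):
                out.write("\n")
```

For example, if the chapters were saved under the hypothetical names `chapter_01.txt` to `chapter_10.txt`, the call would be `merge_chapters([f"chapter_{i:02d}.txt" for i in range(1, 11)], "he.txt")`. This produces the single `he.txt` file that the script below reads.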
Then I called the keyword-extraction function directly:
```python
import jieba.analyse  # jieba's keyword-extraction module

# Folder that holds the merged file of the first ten chapters
data_path = "C:\\Users\\wangyuguang\\Desktop\\work_data\\profect_world\\"
topK = 10           # number of keywords to return
withWeight = False  # True returns (keyword, weight) pairs instead of bare keywords

# "he.txt" is the single file containing all ten chapters
file_path = data_path + "he" + ".txt"
# Read as text; change the encoding to "gbk" if the file was saved in GBK
with open(file_path, encoding="utf-8") as f:
    content = f.read()

# Call the keyword extractor directly
tags = jieba.analyse.extract_tags(content, topK=topK, withWeight=withWeight)

if withWeight:
    for tag in tags:
        print("tag: %s\t\t weight: %f" % (tag[0], tag[1]))
else:
    print(",".join(tags))
```
Output (jieba's loading log, followed by the keywords):
```
Building prefix dict from the default dictionary ...
Loading model from cache c:\users\wangyuguang\appdata\local\temp\jieba.cache
Loading model cost 0.386 seconds.
Prefix dict has been built succesfully.
小不點,孩子,族長,石雲峰,石村,凶禽,青鱗鷹,凶獸,一群,石昊
```
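`jieba.analyse.extract_tags` ranks words by TF-IDF. Each word's count is divided by the total count of kept words, then multiplied by an inverse document frequency (IDF) from jieba's built-in IDF table. Single-character words and stop words are skipped, and words missing from the table fall back to a median IDF. This favours words that are frequent in this text but rare in general, such as the character and place names above.

The function below is a simplified sketch of that scoring, not jieba's source. It assumes the text is already segmented into words, a step jieba handles internally. The function name `tfidf_keywords` and the IDF values in any example are hypothetical.

```python
from collections import Counter


def tfidf_keywords(words, idf, top_k=10, stop_words=frozenset()):
    """Rank pre-segmented words by TF-IDF and return (word, weight) pairs.

    words: list of tokens (in jieba these come from its tokenizer).
    idf: dict of word -> IDF; unknown words get the median IDF value.
    """
    # Median IDF used as the default for words missing from the table
    default_idf = sorted(idf.values())[len(idf) // 2]
    # Drop single-character tokens and stop words, then count the rest
    freq = Counter(w for w in words
                   if len(w.strip()) >= 2 and w not in stop_words)
    total = sum(freq.values())
    if total == 0:
        return []
    # Term frequency (count / total) times IDF
    weights = {w: c * idf.get(w, default_idf) / total for w, c in freq.items()}
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]
```

Switching `withWeight` to `True` in the script above prints weights of this kind next to each keyword. Generic words such as 孩子 and 一群 still appear in the top ten, and jieba's stop-word list is the usual place to filter them out.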