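Detailed usage guide: http://textgrocery.readthedocs.io/zh/latest/index.html
TextGrocery is a short-text classification tool built on LibLinear and the jieba word segmenter. It is efficient and easy to use, and it supports both Chinese and English corpora.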
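The description above names the two moving parts: a tokenizer (jieba) and a linear classifier (LibLinear). To illustrate that pipeline without either dependency, here is a self-contained Python 3 sketch. It swaps in character unigrams and bigrams for jieba segmentation and a plain multiclass perceptron for LibLinear's solver. The names `tokenize` and `LinearTextClassifier` are hypothetical and are not part of TextGrocery's API.

```python
# Illustrative sketch only: not TextGrocery's implementation.
# Assumptions: character n-grams stand in for jieba segmentation,
# and a multiclass perceptron stands in for LibLinear.


def tokenize(text):
    """Hypothetical tokenizer: character unigrams plus adjacent bigrams."""
    chars = [c for c in text if not c.isspace()]
    bigrams = [a + b for a, b in zip(chars, chars[1:])]
    return chars + bigrams


class LinearTextClassifier(object):
    """Hypothetical multiclass perceptron over bag-of-token features."""

    def __init__(self, max_epochs=20):
        self.max_epochs = max_epochs
        self.labels = []
        self.weights = {}  # label -> {token: weight}

    def _score(self, label, tokens):
        w = self.weights[label]
        return sum(w.get(t, 0.0) for t in tokens)

    def predict(self, text):
        tokens = tokenize(text)
        # Ties go to the alphabetically first label.
        return max(self.labels, key=lambda label: self._score(label, tokens))

    def train(self, samples):
        """samples: list of (label, text) pairs, the same shape as train_src."""
        self.labels = sorted({label for label, _ in samples})
        self.weights = {label: {} for label in self.labels}
        for _ in range(self.max_epochs):
            mistakes = 0
            for label, text in samples:
                guess = self.predict(text)
                if guess == label:
                    continue
                mistakes += 1
                # Perceptron update: reward the true label, penalize the guess.
                for t in tokenize(text):
                    self.weights[label][t] = self.weights[label].get(t, 0.0) + 1.0
                    self.weights[guess][t] = self.weights[guess].get(t, 0.0) - 1.0
            if mistakes == 0:  # every training sample is classified correctly
                break
        return self
```

Training it on the `train_src` list from the session in this post and calling `predict` on a new headline returns a plain label string, whereas TextGrocery returns a `GroceryPredictResult` object. A real segmenter and a regularized linear solver, as TextGrocery uses, will generalize far better than this toy.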
Installation:
pip install tgrocery
Walkthrough:
>>> from tgrocery import Grocery
>>> # Open a new grocery (don't forget to give it a name)
>>> grocery = Grocery('sample')
>>> # Training texts can be passed in as a list of (label, text) pairs
>>> train_src = [
...     ('education', '名師指導托福語法技巧:名詞的復數形式'),
...     ('education', '中國高考成績海外認可 是“狼來了”嗎?'),
...     ('sports', '圖文:法網孟菲爾斯苦戰進16強 孟菲爾斯怒吼'),
...     ('sports', '四川丹棱舉行全國長距登山挑戰賽 近萬人參與')
... ]
>>> grocery.train(train_src)
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 1.125 seconds.
Prefix dict has been built succesfully.
*
optimization finished, #iter = 3
Objective value = -1.092381
nSV = 8
<tgrocery.Grocery object at 0x7f23cf243b50>
>>> grocery.save()
>>> new_grocery = Grocery('sample')
>>> new_grocery.load()
>>> new_grocery.predict('考生必讀:新托福寫作考試評分標准')
<tgrocery.base.GroceryPredictResult object at 0x4490d50>
>>> new_grocery.predict('考生必讀:新托福寫作考試評分標准')
<tgrocery.base.GroceryPredictResult object at 0x4490d90>
>>> result = new_grocery.predict('考生必讀:新托福寫作考試評分標准')
>>> print(result)
education
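The session checks a single prediction by eye. To score a held-out set instead, a small helper can loop over (label, text) pairs in the same shape as `train_src`. This is a sketch, not part of TextGrocery: `accuracy` and `test_src` are hypothetical names, and the helper assumes that `str()` of a prediction result yields its label, as the `print` output above suggests.

```python
def accuracy(predict, test_src):
    """Fraction of (label, text) pairs whose prediction matches the label.

    `predict` is any callable that takes a text. Its return value is
    compared via str(), on the assumption (suggested by printing the
    result above) that a GroceryPredictResult prints as its label.
    """
    if not test_src:
        raise ValueError("test_src must not be empty")
    hits = sum(1 for label, text in test_src if str(predict(text)) == label)
    # float() avoids integer division if this runs under Python 2.
    return float(hits) / len(test_src)
```

With the session's objects, a call would look like `accuracy(new_grocery.predict, test_src)`, where `test_src` is a hypothetical list of labelled headlines that were not used for training.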
Done.