Chinese Text Classification with TextGrocery

This article is a repost. 2016-11-08 22:46, tagged: python

Detailed usage instructions: http://textgrocery.readthedocs.io/zh/latest/index.html

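TextGrocery is a short-text classification tool built on LibLinear and the jieba (結巴) word segmenter. It is efficient and easy to use, and it supports both Chinese and English corpora.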
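To make the "tokenize, then train a linear classifier" idea concrete, here is a minimal sketch that runs on the standard library alone. It is not TextGrocery's implementation: character bigrams stand in for jieba segmentation, and a simple multiclass perceptron stands in for LibLinear. The function names `bigrams`, `predict`, and `train_perceptron` are made up for this illustration.

```python
# -*- coding: utf-8 -*-
from collections import defaultdict


def bigrams(text):
    """Split text into overlapping character bigrams (stand-in for a real tokenizer)."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def predict(weights, text):
    """Return the label whose weight vector scores the text highest."""
    feats = bigrams(text)
    scores = {label: sum(w.get(f, 0.0) for f in feats)
              for label, w in weights.items()}
    # Ties are broken by alphabetical label order.
    return max(sorted(scores), key=lambda label: scores[label])


def train_perceptron(samples, epochs=10):
    """Train one sparse weight vector per label with the perceptron update rule."""
    labels = sorted({label for label, _ in samples})
    weights = {label: defaultdict(float) for label in labels}
    for _ in range(epochs):
        for gold, text in samples:
            guess = predict(weights, text)
            if guess != gold:
                # Pull the correct label up and the wrong guess down.
                for f in bigrams(text):
                    weights[gold][f] += 1.0
                    weights[guess][f] -= 1.0
    return weights


train_src = [
    ('education', u'名師指導托福語法技巧：名詞的復數形式'),
    ('education', u'中國高考成績海外認可 是“狼來了”嗎？'),
    ('sports', u'圖文：法網孟菲爾斯苦戰進16強 孟菲爾斯怒吼'),
    ('sports', u'四川丹棱舉行全國長距登山挑戰賽 近萬人參與'),
]
weights = train_perceptron(train_src)
print(predict(weights, train_src[2][1]))
```

A real tool gains a lot from proper word segmentation and a regularized solver, which is what jieba and LibLinear provide. The sketch only shows the overall shape of the pipeline.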

Installation:

pip install tgrocery

Usage:

>>> from tgrocery import Grocery
# Open a new grocery (don't forget to give it a name)
>>> grocery = Grocery('sample')
# Training text can be passed in as a list
>>> train_src = [
...     ('education', '名師指導托福語法技巧：名詞的復數形式'),
...     ('education', '中國高考成績海外認可 是“狼來了”嗎？'),
...     ('sports', '圖文：法網孟菲爾斯苦戰進16強 孟菲爾斯怒吼'),
...     ('sports', '四川丹棱舉行全國長距登山挑戰賽 近萬人參與')
... ]
>>> grocery.train(train_src)
Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 1.125 seconds.
Prefix dict has been built succesfully.
*
optimization finished, #iter = 3
Objective value = -1.092381
nSV = 8
<tgrocery.Grocery object at 0x7f23cf243b50>
>>> grocery.save()
>>> new_grocery = Grocery('sample')
>>> new_grocery.load()
>>> new_grocery.predict('考生必讀：新托福寫作考試評分標准')
<tgrocery.base.GroceryPredictResult object at 0x4490d50>
>>> new_grocery.predict('考生必讀：新托福寫作考試評分標准')
<tgrocery.base.GroceryPredictResult object at 0x4490d90>
>>> result = new_grocery.predict('考生必讀：新托福寫作考試評分標准')
>>> print result
education
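Since the training text can be passed in as a list of (label, text) tuples, one way to build that list from a file is sketched below. The helper name `load_train_src` and the file layout (one `label<TAB>text` pair per line, UTF-8 encoded) are assumptions made for this example, not something the article specifies.

```python
# -*- coding: utf-8 -*-
import io


def load_train_src(path, delimiter=u'\t'):
    """Read 'label<delimiter>text' lines into a list of (label, text) tuples."""
    samples = []
    with io.open(path, encoding='utf-8') as handle:
        for line in handle:
            line = line.strip()
            label, sep, text = line.partition(delimiter)
            if not sep or not text:
                continue  # skip blank or malformed lines
            samples.append((label, text))
    return samples
```

The returned list has the same shape as `train_src` in the session above, so it could be handed to `grocery.train(...)` directly, for example `grocery.train(load_train_src('train.txt'))`, where `train.txt` is a hypothetical file name.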

Done.
