NLTK vs SKLearn vs Gensim vs TextBlob vs spaCy


Generally:
  • NLTK is used primarily for general NLP tasks (tokenization, POS tagging, parsing, etc.).
  • scikit-learn (sklearn) is used primarily for machine learning (classification, clustering, etc.).
  • Gensim is used primarily for topic modeling and document similarity.
That said, NLTK provides a nice wrapper for scikit-learn's classifiers in its nltk.classify package. Related reading:
  • Combining Scikit-Learn and NLTK
  • Python NLP - NLTK and scikit-learn
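As a rough illustration of that wrapper, here is a minimal sketch that trains a scikit-learn estimator through NLTK's classifier interface using SklearnClassifier from nltk.classify.scikitlearn. The toy feature dictionaries and labels are made up for this example, and MultinomialNB is just one estimator you could plug in.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB

# Toy training data (invented for illustration) in NLTK's usual format:
# a list of (feature_dict, label) pairs.
train_set = [
    ({"great": 1, "fun": 1}, "pos"),
    ({"love": 1, "great": 1}, "pos"),
    ({"boring": 1, "bad": 1}, "neg"),
    ({"awful": 1, "bad": 1}, "neg"),
]

# Wrap a scikit-learn estimator so it exposes NLTK's classifier interface.
classifier = SklearnClassifier(MultinomialNB())
classifier.train(train_set)

# Classify a new feature dict exactly as with a native NLTK classifier.
prediction = classifier.classify({"great": 1, "love": 1})
print(prediction)
```

The point of the wrapper is that feature extraction can stay in NLTK's dict-based style while the learning algorithm comes from scikit-learn.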

And, to confuse you further, there also exists TextBlob (Simplified Text Processing)

and spaCy (spaCy.io, "Build Tomorrow's Language Technologies"),
which aims to provide industry-ready NLP modules as an alternative to NLTK,
including a single fast algorithm for each of tokenization, POS tagging, and parsing, plus word vectors for similarity calculation.

I suggest that you mix and match, according to your needs.
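One way to mix and match, sketched under the assumption that both NLTK and scikit-learn are installed: use an NLTK tokenizer inside scikit-learn's TfidfVectorizer and compare documents by cosine similarity. The sample sentences are invented, and wordpunct_tokenize is chosen only because it needs no extra data download.

```python
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented sample documents for illustration.
docs = [
    "NLTK handles tokenization and tagging.",
    "Tokenization and tagging are handled by NLTK.",
    "Gensim is popular for topic modeling.",
]

# NLTK does the tokenization; scikit-learn does the TF-IDF weighting.
# token_pattern=None signals that the custom tokenizer replaces the default regex.
vectorizer = TfidfVectorizer(tokenizer=wordpunct_tokenize, token_pattern=None)
tfidf = vectorizer.fit_transform(docs)

# Pairwise cosine similarity between all documents (3 x 3 matrix).
similarity = cosine_similarity(tfidf)
print(similarity.round(2))
```

Here the first two documents share most of their tokens, so they should score as more similar to each other than either does to the third.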


