NLTK vs SKLearn vs Gensim vs TextBlob vs spaCy

Reposted from another source. 2017-05-24 15:13. Tags: NLTK / gensim / spaCy / SKLearn / TextBlob

Generally,

NLTK is used primarily for general NLP tasks (tokenization, POS tagging, parsing, etc.)
Sklearn is used primarily for machine learning (classification, clustering, etc.)
Gensim is used primarily for topic modeling and document similarity.

Having said that, NLTK provides a nice wrapper for Sklearn's classifiers (SklearnClassifier). See:
nltk.classify package
Combining Scikit-Learn and NTLK
Python NLP - NLTK and scikit-learn
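A minimal sketch of that wrapper, assuming NLTK and scikit-learn are both installed: `SklearnClassifier` takes any scikit-learn estimator and exposes it through NLTK's usual `train` / `classify` interface. The training data and the helper name `train_classifier` are invented for illustration.

```python
from nltk.classify import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB

# Toy NLTK-style training data: (featureset dict, label) pairs, made up for this sketch.
TRAIN = [
    ({"great": 1, "fun": 1}, "pos"),
    ({"love": 1, "great": 1}, "pos"),
    ({"boring": 1, "bad": 1}, "neg"),
    ({"bad": 1, "awful": 1}, "neg"),
]


def train_classifier(train_set=TRAIN):
    """Wrap a scikit-learn estimator so it behaves like an NLTK classifier."""
    # The wrapper converts the feature dicts into a sparse matrix internally.
    return SklearnClassifier(MultinomialNB()).train(train_set)


classifier = train_classifier()
print(classifier.classify({"great": 1, "fun": 1}))
```

Swapping `MultinomialNB` for another estimator, such as `LogisticRegression`, should only require changing the object passed to `SklearnClassifier`.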

And, to confuse you further, there also exists TextBlob (TextBlob: Simplified Text Processing)

and spaCy (spaCy.io | Build Tomorrow's Language Technologies), which aims to provide industry-ready NLP modules as an alternative to NLTK.
It includes a single fast algorithm for each of tokenization, POS tagging, and parsing, plus word vectors for similarity calculation.
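For a feel of spaCy's API, here is a small sketch that covers tokenization only. It uses `spacy.blank("en")`, which creates an English pipeline containing just the tokenizer, so no model download is needed. POS tagging, parsing, and word vectors require a trained pipeline loaded with `spacy.load(...)` and are deliberately left out. The helper name `tokenize` is an assumption for this example.

```python
import spacy


def tokenize(text):
    """Split text into tokens with spaCy's rule-based English tokenizer."""
    # A blank pipeline has a tokenizer but no tagger, parser, or vectors.
    nlp = spacy.blank("en")
    return [token.text for token in nlp(text)]


print(tokenize("spaCy splits text into tokens."))
```

With a trained pipeline installed, the same `Doc` object would also expose attributes such as `token.pos_` and `doc.similarity(...)`.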

I suggest that you mix and match, according to your needs.

