Removing stop words in Python

Reposted article, 2017-05-25 09:20. Tags: python, stopwords

Try caching the stopwords object, as shown below. Constructing this each time you call the function seems to be the bottleneck.

    from nltk.corpus import stopwords

    # Load the stop word list once, at import time.
    cachedStopWords = stopwords.words("english")

    def testFuncOld():
        # Reloads the stop word list for every word in the text.
        text = 'hello bye the the hi'
        text = ' '.join([word for word in text.split() if word not in stopwords.words("english")])

    def testFuncNew():
        # Reuses the cached list.
        text = 'hello bye the the hi'
        text = ' '.join([word for word in text.split() if word not in cachedStopWords])

    if __name__ == "__main__":
        for i in range(10000):  # xrange in the original Python 2 code
            testFuncOld()
            testFuncNew()

I ran this through the profiler: python -m cProfile -s cumulative test.py. The relevant lines are posted below.

    nCalls   Cumulative Time (s)   Function
    10000    7.723                 words.py:7(testFuncOld)
    10000    0.140                 words.py:11(testFuncNew)

So, caching the stopwords list gives roughly a 55x speedup on this benchmark (7.723 s versus 0.140 s cumulative).
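A further step that is often combined with caching, though it is not part of the measurement above, is to store the cached words in a set, since `in` on a list scans every element while a set lookup is constant time on average. The sketch below is only an illustration under stated assumptions: `STOP_WORD_LIST` is a small hypothetical stand-in for `stopwords.words("english")` so that the example runs without the NLTK corpus, and `remove_stop_words` is a made-up helper name.

```python
import timeit

# Hypothetical stand-in for stopwords.words("english"); assumed here so the
# sketch runs without downloading the NLTK stopwords corpus.
STOP_WORD_LIST = ["the", "a", "an", "and", "or", "of", "to", "in", "is", "it"]

# Build the set once at import time, just like the cached list in the answer.
STOP_WORD_SET = frozenset(STOP_WORD_LIST)


def remove_stop_words(text, stop_words=STOP_WORD_SET):
    """Drop every whitespace-separated token that appears in stop_words."""
    # Matching is case-sensitive, so "The" is kept while "the" is removed.
    return " ".join(word for word in text.split() if word not in stop_words)


if __name__ == "__main__":
    sample = "hello bye the the hi"
    print(remove_stop_words(sample))

    # Compare list lookup and set lookup on the same input.
    list_time = timeit.timeit(
        lambda: remove_stop_words(sample, STOP_WORD_LIST), number=10000
    )
    set_time = timeit.timeit(
        lambda: remove_stop_words(sample, STOP_WORD_SET), number=10000
    )
    print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

With a ten-word list the difference is likely to be negligible; it should matter more with the full NLTK English list, which has more than a hundred entries, and with longer texts. To use the real corpus, replace the stand-in with `frozenset(stopwords.words("english"))`.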
