Python: removing stopwords


Try caching the stopwords object, as shown below. Constructing this each time you call the function seems to be the bottleneck.

    from nltk.corpus import stopwords

    # Cache the stopword list once at module level instead of rebuilding it on every call.
    cachedStopWords = stopwords.words("english")

    def testFuncOld():
        text = 'hello bye the the hi'
        # Rebuilds the stopword list on every call.
        text = ' '.join([word for word in text.split() if word not in stopwords.words("english")])

    def testFuncNew():
        text = 'hello bye the the hi'
        # Reuses the cached stopword list.
        text = ' '.join([word for word in text.split() if word not in cachedStopWords])

    if __name__ == "__main__":
        # Python 2 code (xrange); on Python 3 use range instead.
        for i in xrange(10000):
            testFuncOld()
            testFuncNew()

I ran this through the profiler: python -m cProfile -s cumulative test.py. The relevant lines are posted below.

    ncalls  cumtime  filename:lineno(function)
    10000   7.723    words.py:7(testFuncOld)
    10000   0.140    words.py:11(testFuncNew)

So, caching the stopwords instance gives a roughly 55x speedup (7.723 s vs. 0.140 s cumulative time over 10000 calls).
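A further micro-optimization, not measured in the profile above and only a sketch on my part: store the cached stopwords in a set, so each `word not in ...` check is an average O(1) hash lookup instead of a linear scan of the list. The function name remove_stopwords below is just an illustrative choice.

    from nltk.corpus import stopwords

    # Build the set once; membership tests against a set are O(1) on average,
    # while "word not in list" scans the whole list for every word.
    cachedStopWords = set(stopwords.words("english"))

    def remove_stopwords(text):
        # Keep only the tokens that are not in the cached stopword set.
        return ' '.join(word for word in text.split() if word not in cachedStopWords)

    if __name__ == "__main__":
        print(remove_stopwords('hello bye the the hi'))  # -> hello bye hi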

