Natural Language 19.1_Lemmatizing with NLTK (restoring word variants to their base form)

Reposted from the original article, 2016-11-19 11:26. Tags: lemmatization / python / Lemmatizing / nltk / natural language / pickle


Lemmatizing with NLTK

# -*- coding: utf-8 -*-
"""
Spyder Editor

author 231469242@qq.com
WeChat official account: pythonEducation
"""

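import nltk
from nltk.stem import WordNetLemmatizer

# The WordNet data must be downloaded once before use: nltk.download("wordnet")
lemmatizer=WordNetLemmatizer()
# If no second argument (pos) is given, the word is lemmatized as a noun
# "pythonly" is not in WordNet, so it is returned unchanged: accuracy is still not 100%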
print(lemmatizer.lemmatize("cats"))
print(lemmatizer.lemmatize("cacti"))
print(lemmatizer.lemmatize("geese"))
print(lemmatizer.lemmatize("rocks"))
print(lemmatizer.lemmatize("pythonly"))
print(lemmatizer.lemmatize("better", pos="a"))
print(lemmatizer.lemmatize("best", pos="a"))
print(lemmatizer.lemmatize("run"))
print(lemmatizer.lemmatize("run",'v'))    
    
'''
cat
cactus
goose
rock
pythonly
good
best
run
run

'''

A very similar operation to stemming is called lemmatizing. The major difference between the two is that, as you saw earlier, stemming can often create non-existent words, whereas lemmas are actual words.

So your root stem, meaning the word you end up with, is not necessarily something you can look up in a dictionary, whereas a lemma is.

Sometimes you will wind up with a very similar word, but sometimes you will wind up with a completely different word (for example, "better" becomes "good"). Let's see some examples.

from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

print(lemmatizer.lemmatize("cats"))
print(lemmatizer.lemmatize("cacti"))
print(lemmatizer.lemmatize("geese"))
print(lemmatizer.lemmatize("rocks"))
print(lemmatizer.lemmatize("python"))
print(lemmatizer.lemmatize("better", pos="a"))
print(lemmatizer.lemmatize("best", pos="a"))
print(lemmatizer.lemmatize("run"))
print(lemmatizer.lemmatize("run",'v'))

Here we have a set of examples of the lemmas for the words we used. The main thing to note is that lemmatize takes a part-of-speech parameter, "pos". If it is not supplied, the default is "noun", which means the lemmatizer will try to find the closest noun. That can give unexpected results for verbs and adjectives. In the examples above, "better" is mapped to "good" only because pos="a" was passed. Keep this in mind if you use lemmatizing!
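Because the default is noun, a common workaround is to POS-tag the text first and pass a matching pos to lemmatize. The sketch below assumes Penn Treebank tags, such as those returned by nltk.pos_tag. The helper names treebank_to_wordnet and lemmatize_tagged are hypothetical and are not part of NLTK. Tokenization uses plain str.split to avoid an extra tokenizer dependency. The demo at the bottom runs only if NLTK, its tagger model and the WordNet data are installed; the tagger resource name differs between NLTK versions, so follow the name given in NLTK's LookupError message.

```python
# Sketch: choose lemmatize()'s `pos` argument from a POS tagger instead of
# relying on the noun default. Helper names here are hypothetical, not NLTK API.

def treebank_to_wordnet(tag):
    """Map a Penn Treebank tag (e.g. from nltk.pos_tag) to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("V"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return "n"      # everything else falls back to noun, like lemmatize() itself


def lemmatize_tagged(tagged_tokens, lemmatize):
    """Lemmatize (word, tag) pairs using any lemmatize(word, pos) callable."""
    return [lemmatize(word, treebank_to_wordnet(tag)) for word, tag in tagged_tokens]


if __name__ == "__main__":
    try:
        import nltk
        from nltk.stem import WordNetLemmatizer

        tokens = "The geese were running faster than the cats".split()
        tagged = nltk.pos_tag(tokens)  # requires the tagger model to be downloaded
        lemmatizer = WordNetLemmatizer()
        print(lemmatize_tagged(tagged, lemmatizer.lemmatize))
    except (ImportError, LookupError) as err:
        # NLTK itself, or one of its data packages, is not available here.
        print("Skipping NLTK demo:", err)
```

Passing the lemmatizer in as a callable keeps the tag mapping independent of NLTK. This makes it easy to check on its own or to reuse with a different tagger.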

In the next tutorial, we're going to dive into the NLTK corpus that comes with the module, looking at all of the awesome documents waiting for us there.
