Natural Language 22: WordNet with NLTK



 


Functions for English synonyms and antonyms
# -*- coding: utf-8 -*-
"""
Functions for English synonyms and antonyms
"""

import nltk
from nltk.corpus import wordnet

# If the WordNet corpus is not installed yet, run nltk.download('wordnet') once.
syns = wordnet.synsets('program')
'''
syns
Out[11]: 
[Synset('plan.n.01'),
 Synset('program.n.02'),
 Synset('broadcast.n.02'),
 Synset('platform.n.02'),
 Synset('program.n.05'),
 Synset('course_of_study.n.01'),
 Synset('program.n.07'),
 Synset('program.n.08'),
 Synset('program.v.01'),
 Synset('program.v.02')]

'''

print(syns[0].name())

'''
plan.n.01
'''    
    
# Just the word: the name of the synset's first lemma
print(syns[0].lemmas()[0].name())    
'''
plan
'''    
# Example sentences that use the word
print(syns[0].examples())
'''
['they drew up a six-step plan', 'they discussed plans for a new bond issue']
'''    

'''
synonyms=[]
antonyms=[]

list_good=wordnet.synsets("good")
for syn in list_good:
    for l in syn.lemmas():
        #print('l.name()',l.name())
        synonyms.append(l.name())
        if l.antonyms():
            antonyms.append(l.antonyms()[0].name())

print(set(synonyms))
print(set(antonyms))
'''

word = "good"

# Return the synonyms and antonyms of a word as a tuple of two sets
def Word_synonyms_and_antonyms(word):
    synonyms = []
    antonyms = []
    for syn in wordnet.synsets(word):
        for l in syn.lemmas():
            # Every lemma of every synset counts as a synonym
            synonyms.append(l.name())
            # Keep only the first antonym of a lemma, if it has any
            if l.antonyms():
                antonyms.append(l.antonyms()[0].name())
    return (set(synonyms), set(antonyms))

# Return the set of synonyms of a word
def Word_synonyms(word):
    return Word_synonyms_and_antonyms(word)[0]


# Return the set of antonyms of a word
def Word_antonyms(word):
    return Word_synonyms_and_antonyms(word)[1]


'''
Word_synonyms("evil")
Out[43]: 
{'evil',
 'evilness',
 'immorality',
 'iniquity',
 'malefic',
 'malevolent',
 'malign',
 'vicious',
 'wickedness'}

Word_antonyms('evil')
Out[44]: {'good', 'goodness'}
'''

 

 

 



WordNet is a lexical database for the English language, created at Princeton University, and it is available as one of the NLTK corpora.

You can use WordNet alongside the NLTK module to find the meanings of words, synonyms, antonyms, and more. Let's cover some examples.

First, you're going to need to import wordnet:

from nltk.corpus import wordnet

Then, we're going to use the term "program" to find synsets (sets of synonyms), like so:

syns = wordnet.synsets("program")

An example of a synset:

print(syns[0].name())

plan.n.01
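
The name printed above packs three fields into one string: the lemma, a part-of-speech tag (n, v, a, s or r), and a sense number. As a small illustration, the string can be split apart as below. The helper name parse_synset_name is made up for this sketch and is not part of NLTK.

```python
def parse_synset_name(name):
    """Split a synset name such as 'plan.n.01' into (lemma, pos, sense_number).

    Hypothetical helper for illustration; it is not an NLTK function.
    """
    # Split from the right so that a lemma containing dots stays in one piece.
    lemma, pos, sense = name.rsplit(".", 2)
    return lemma, pos, int(sense)


print(parse_synset_name("plan.n.01"))     # ('plan', 'n', 1)
print(parse_synset_name("program.v.02"))  # ('program', 'v', 2)
```

So plan.n.01 reads as "the first noun sense filed under the lemma plan".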

Just the word:

print(syns[0].lemmas()[0].name())

plan

Definition of that first synset:

print(syns[0].definition())

a series of steps to be carried out or goals to be accomplished

Examples of the word in use:

print(syns[0].examples())

['they drew up a six-step plan', 'they discussed plans for a new bond issue']
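
The calls above all look at syns[0] only. A minimal sketch of how the same accessors might be applied to every sense at once is shown below. The function name summarize_senses is an assumption of this sketch; name(), pos() and definition() are the Synset methods it relies on.

```python
def summarize_senses(synsets):
    """Return one (name, part_of_speech, definition) tuple per synset.

    `synsets` is any iterable of NLTK Synset objects, for example the
    result of wordnet.synsets("program").
    """
    return [(s.name(), s.pos(), s.definition()) for s in synsets]


# Usage (assumes NLTK and its WordNet corpus are installed):
#   from nltk.corpus import wordnet
#   for name, pos, definition in summarize_senses(wordnet.synsets("program")):
#       print(name, pos, definition, sep=" | ")
```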

Next, how might we find the synonyms and antonyms of a word? The lemmas of each synset are synonyms, and you can call .antonyms() on a lemma to find its antonyms. As such, we can populate some lists like:

synonyms = []
antonyms = []

for syn in wordnet.synsets("good"):
    for l in syn.lemmas():
        synonyms.append(l.name())
        if l.antonyms():
            antonyms.append(l.antonyms()[0].name())

print(set(synonyms))
print(set(antonyms))

{'beneficial', 'just', 'upright', 'thoroughly', 'in_force', 'well', 'skilful', 'skillful', 'sound', 'unspoiled', 'expert', 'proficient', 'in_effect', 'honorable', 'adept', 'secure', 'commodity', 'estimable', 'soundly', 'right', 'respectable', 'good', 'serious', 'ripe', 'salutary', 'dear', 'practiced', 'goodness', 'safe', 'effective', 'unspoilt', 'dependable', 'undecomposed', 'honest', 'full', 'near', 'trade_good'}
{'evil', 'evilness', 'bad', 'badness', 'ill'}

As you can see, we got many more synonyms than antonyms, since many lemmas have no antonym recorded and we keep only the first antonym of each lemma that has one, but you could easily balance this by also doing the exact same process for the term "bad."
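
The loop above keeps only l.antonyms()[0]. A minimal sketch of a variant that keeps every antonym of every lemma follows. The function name collect_synonyms_and_antonyms is made up for this sketch, and the function takes the synsets as an argument so it can be tried on any word.

```python
def collect_synonyms_and_antonyms(synsets):
    """Return (synonyms, antonyms) as two sets of lemma names.

    Unlike the loop above, every antonym of every lemma is kept,
    not only the first one.
    """
    synonyms = set()
    antonyms = set()
    for syn in synsets:
        for lemma in syn.lemmas():
            synonyms.add(lemma.name())
            # lemma.antonyms() returns a (possibly empty) list of Lemma objects
            antonyms.update(ant.name() for ant in lemma.antonyms())
    return synonyms, antonyms


# Usage (assumes NLTK and its WordNet corpus are installed):
#   from nltk.corpus import wordnet
#   synonyms, antonyms = collect_synonyms_and_antonyms(wordnet.synsets("good"))
#   print(sorted(antonyms))
```

Building sets directly also removes the need for the final set(...) conversion.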

 

Comparing word similarity

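Next, we can also easily use WordNet to compare how similar two word senses are, by using the Wu and Palmer method for semantic relatedness. The score depends on how deep the two synsets and their most specific shared ancestor sit in the WordNet hierarchy, so closely related senses score nearer to 1.

Let's compare the nouns "ship" and "boat":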

w1 = wordnet.synset('ship.n.01')
w2 = wordnet.synset('boat.n.01')
print(w1.wup_similarity(w2))

0.9090909090909091

w1 = wordnet.synset('ship.n.01')
w2 = wordnet.synset('car.n.01')
print(w1.wup_similarity(w2))

0.6956521739130435

w1 = wordnet.synset('ship.n.01')
w2 = wordnet.synset('cat.n.01')
print(w1.wup_similarity(w2))

0.38095238095238093
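
To see where numbers like these come from, here is a minimal sketch of the Wu and Palmer idea on a tiny hand-built taxonomy. Everything in it is an assumption made for illustration: the taxonomy is invented and is not WordNet data, the function names are made up, and depth is counted in nodes so that the root has depth 1. The score is 2 * depth(common ancestor) / (depth(a) + depth(b)).

```python
# Toy taxonomy (hypothetical, NOT WordNet data): child -> parent.
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "animal": "entity",
    "watercraft": "vehicle",
    "car": "vehicle",
    "ship": "watercraft",
    "boat": "watercraft",
    "cat": "animal",
}


def path_to_root(node):
    """Return [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_ancestor(a, b):
    """Return the deepest node that is an ancestor of (or equal to) both."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward from b toward the root
        if node in ancestors_of_a:
            return node
    raise ValueError("no common ancestor")


def toy_wup_similarity(a, b):
    lcs = lowest_common_ancestor(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


for other in ("boat", "car", "cat"):
    print(other, round(toy_wup_similarity("ship", other), 4))
# boat 0.75
# car 0.5714
# cat 0.2857
```

The ordering matches the WordNet results above (boat, then car, then cat), but the values differ because NLTK's wup_similarity works on the real WordNet graph, which is far deeper and lets a synset have more than one hypernym path.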

Next, we're going to pick things up a bit and begin to cover the topic of Text Classification.

