Python Natural Language Processing (5): WordNet


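WordNet is a semantically oriented dictionary of English, similar to a traditional dictionary but with a richer structure. NLTK includes the English WordNet, which has 155,287 words and 117,659 synonym sets (synsets).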

1. Finding synonyms

Taking motorcar as an example, we look up the synsets it belongs to.

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('motorcar')                  # find the synsets that contain the word
[Synset('car.n.01')]
>>> wn.synset('car.n.01').lemma_names       # without parentheses this is only the bound method
<bound method Synset.lemma_names of Synset('car.n.01')>
>>> wn.synset('car.n.01').lemma_names()     # list the lemma names in the synset
['car', 'auto', 'automobile', 'machine', 'motorcar']
>>> wn.synset('car.n.01').definition()      # definition of this synset
'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
>>> wn.synset('car.n.01').examples()        # example sentences for this synset
['he needs a car to get to work']
>>> wn.synset('car.n.01').lemmas()
[Lemma('car.n.01.car'), Lemma('car.n.01.auto'), Lemma('car.n.01.automobile'), Lemma('car.n.01.machine'), Lemma('car.n.01.motorcar')]
>>> wn.lemma('car.n.01.automobile')
Lemma('car.n.01.automobile')
>>> wn.lemma('car.n.01.automobile').synset()
Synset('car.n.01')
>>> wn.lemma('car.n.01.automobile').name()
'automobile'
>>> wn.synsets('car')
[Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')]
>>> for synset in wn.synsets('car'):
...     print(synset.lemma_names())
...
['car', 'auto', 'automobile', 'machine', 'motorcar']
['car', 'railcar', 'railway_car', 'railroad_car']
['car', 'gondola']
['car', 'elevator_car']
['cable_car', 'car']
>>> wn.lemmas('car')                        # all lemmas for the word 'car'
[Lemma('car.n.01.car'), Lemma('car.n.02.car'), Lemma('car.n.03.car'), Lemma('car.n.04.car'), Lemma('cable_car.n.01.car')]
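
The relationship between words, synsets, and lemmas in the session above can be pictured as a two-way index. The following is a minimal pure-Python sketch of that idea, not how NLTK is implemented. The dictionary `TOY_SYNSETS` and the helpers `toy_synsets` and `toy_lemmas` are hypothetical names introduced for illustration, and the data is copied from the output above.

```python
# Toy model of the word <-> synset index; NOT the NLTK implementation.
# The data is copied from the interpreter output shown above.
TOY_SYNSETS = {
    "car.n.01": ["car", "auto", "automobile", "machine", "motorcar"],
    "car.n.02": ["car", "railcar", "railway_car", "railroad_car"],
    "car.n.03": ["car", "gondola"],
    "car.n.04": ["car", "elevator_car"],
    "cable_car.n.01": ["cable_car", "car"],
}


def toy_synsets(word):
    """Return the names of all synsets whose lemma names include `word`."""
    return [name for name, lemmas in TOY_SYNSETS.items() if word in lemmas]


def toy_lemmas(word):
    """Return lemma identifiers of the form '<synset name>.<word>'."""
    return [f"{name}.{word}" for name in toy_synsets(word)]


print(toy_synsets("motorcar"))  # an unambiguous word: a single synset
print(toy_synsets("car"))       # an ambiguous word: several synsets
print(toy_lemmas("car"))
```

A word with one sense maps to one synset, while an ambiguous word such as car maps to several, and each (synset, word) pair is one lemma.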

2. The WordNet hierarchy

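WordNet synsets correspond to abstract concepts, and they do not always have a corresponding English word. These concepts are linked to one another in a hierarchy.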

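In a fragment of the WordNet concept hierarchy, each node corresponds to a synset and each edge represents the hypernym/hyponym relation, that is, the relation between a superordinate concept and a subordinate one.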

>>> motorcar = wn.synset('car.n.01')
>>> types_of_motorcar = motorcar.hyponyms()
>>> types_of_motorcar[26]
Synset('stanley_steamer.n.01')
>>> sorted(
... [lemma.name()
... for synset in types_of_motorcar
... for lemma in synset.lemmas()])
['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer', 'ambulance', 'beach_waggon', 'beach_wagon', 'bus', 'cab', 'compact', 'compact_car', 'convertible', 'coupe', 'cruiser', 'electric', 'electric_automobile', 'electric_car', 'estate_car', 'gas_guzzler', 'hack', 'hardtop', 'hatchback', 'heap', 'horseless_carriage', 'hot-rod', 'hot_rod', 'jalopy', 'jeep', 'landrover', 'limo', 'limousine', 'loaner', 'minicar', 'minivan', 'pace_car', 'patrol_car', 'phaeton', 'police_car', 'police_cruiser', 'prowl_car', 'race_car', 'racer', 'racing_car', 'roadster', 'runabout', 'saloon', 'secondhand_car', 'sedan', 'sport_car', 'sport_utility', 'sport_utility_vehicle', 'sports_car', 'squad_car', 'station_waggon', 'station_wagon', 'stock_car', 'subcompact', 'subcompact_car', 'taxi', 'taxicab', 'tourer', 'touring_car', 'two-seater', 'used-car', 'waggon', 'wagon']
>>> motorcar.hypernyms()
[Synset('motor_vehicle.n.01')]
>>> paths = motorcar.hypernym_paths()
>>> len(paths)
2
>>> [synset.name for synset in paths[0]]    # without parentheses this yields bound methods, not names
[<bound method Synset.name of Synset('entity.n.01')>, <bound method Synset.name of Synset('physical_entity.n.01')>, <bound method Synset.name of Synset('object.n.01')>, <bound method Synset.name of Synset('whole.n.02')>, <bound method Synset.name of Synset('artifact.n.01')>, <bound method Synset.name of Synset('instrumentality.n.03')>, <bound method Synset.name of Synset('container.n.01')>, <bound method Synset.name of Synset('wheeled_vehicle.n.01')>, <bound method Synset.name of Synset('self-propelled_vehicle.n.01')>, <bound method Synset.name of Synset('motor_vehicle.n.01')>, <bound method Synset.name of Synset('car.n.01')>]
>>> [synset.name() for synset in paths[0]]
['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01', 'instrumentality.n.03', 'container.n.01', 'wheeled_vehicle.n.01', 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
>>> [synset.name() for synset in paths[1]]
['entity.n.01', 'physical_entity.n.01', 'object.n.01', 'whole.n.02', 'artifact.n.01', 'instrumentality.n.03', 'conveyance.n.03', 'vehicle.n.01', 'wheeled_vehicle.n.01', 'self-propelled_vehicle.n.01', 'motor_vehicle.n.01', 'car.n.01']
>>> motorcar.root_hypernyms()
[Synset('entity.n.01')]
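
The two hypernym paths above differ only in the middle: wheeled_vehicle.n.01 is reached once through container.n.01 and once through conveyance.n.03 and vehicle.n.01. The sketch below rebuilds just those edges as a plain dictionary and enumerates the root-to-node paths recursively. It is a toy model for illustration, not the NLTK implementation; `TOY_HYPERNYMS` and `toy_hypernym_paths` are hypothetical names.

```python
# Toy hypernym graph: each synset name maps to its direct hypernyms.
# The edges are read off the two paths printed above; NOT the NLTK implementation.
TOY_HYPERNYMS = {
    "car.n.01": ["motor_vehicle.n.01"],
    "motor_vehicle.n.01": ["self-propelled_vehicle.n.01"],
    "self-propelled_vehicle.n.01": ["wheeled_vehicle.n.01"],
    "wheeled_vehicle.n.01": ["container.n.01", "vehicle.n.01"],  # two parents
    "container.n.01": ["instrumentality.n.03"],
    "vehicle.n.01": ["conveyance.n.03"],
    "conveyance.n.03": ["instrumentality.n.03"],
    "instrumentality.n.03": ["artifact.n.01"],
    "artifact.n.01": ["whole.n.02"],
    "whole.n.02": ["object.n.01"],
    "object.n.01": ["physical_entity.n.01"],
    "physical_entity.n.01": ["entity.n.01"],
    "entity.n.01": [],  # root: no hypernyms
}


def toy_hypernym_paths(name):
    """Return every path from a root down to `name`, root first."""
    parents = TOY_HYPERNYMS[name]
    if not parents:
        return [[name]]
    paths = []
    for parent in parents:
        for path in toy_hypernym_paths(parent):
            paths.append(path + [name])
    return paths


paths = toy_hypernym_paths("car.n.01")
print(len(paths))
for path in paths:
    print(len(path), " > ".join(path))
```

A single node with two hypernyms is enough to give every synset below it two paths to the root, which is why len(paths) is 2 for car.n.01.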

3. More lexical relations

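Hypernyms and hyponyms are called lexical relations because they relate one synset to another. These two relations navigate up and down the "is-a" hierarchy. Another important way to navigate the WordNet network is from items to their components (meronyms) or to the things that contain them (holonyms).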

1) Part-whole relations

>>> wn.synset('tree.n.01').part_meronyms()
[Synset('burl.n.02'), Synset('crown.n.07'), Synset('limb.n.02'), Synset('stump.n.01'), Synset('trunk.n.01')]
>>> wn.synset('tree.n.01').substance_meronyms()
[Synset('heartwood.n.01'), Synset('sapwood.n.01')]
>>> wn.synset('tree.n.01').member_holonyms()
[Synset('forest.n.01')]
>>> for synset in wn.synsets('mint', wn.NOUN):
...     print("%s : %s" % (synset.name(), synset.definition()))
...
batch.n.02 : (often followed by `of') a large number or amount or extent
mint.n.02 : any north temperate plant of the genus Mentha with aromatic leaves and small mauve flowers
mint.n.03 : any member of the mint family of plants
mint.n.04 : the leaves of a mint plant used fresh or candied
mint.n.05 : a candy that is flavored with a mint oil
mint.n.06 : a plant where money is coined by authority of the government
>>> wn.synset('mint.n.04').part_holonyms()
[Synset('mint.n.02')]
>>> wn.synset('mint.n.04').substance_holonyms()
[Synset('mint.n.05')]
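
Meronyms and holonyms describe the same part-whole link from opposite ends: if A is a part of B, then A is a part meronym of B and B is a part holonym of A. The sketch below illustrates this by inverting a small whole-to-parts table. It is a toy model under the assumption that the two relations are exact inverses; `TOY_PART_MERONYMS` and `invert` are hypothetical names, and the entries are taken from the outputs above (the mint entry is the inverse of the part_holonyms() result).

```python
from collections import defaultdict

# Whole -> parts. Toy data taken from the session above; NOT queried from NLTK.
TOY_PART_MERONYMS = {
    "tree.n.01": ["burl.n.02", "crown.n.07", "limb.n.02", "stump.n.01", "trunk.n.01"],
    "mint.n.02": ["mint.n.04"],
}


def invert(relation):
    """Turn a whole -> parts mapping into a part -> wholes mapping."""
    inverse = defaultdict(list)
    for whole, parts in relation.items():
        for part in parts:
            inverse[part].append(whole)
    return dict(inverse)


# Part -> wholes, derived rather than stored separately.
TOY_PART_HOLONYMS = invert(TOY_PART_MERONYMS)

print(TOY_PART_HOLONYMS["trunk.n.01"])
print(TOY_PART_HOLONYMS["mint.n.04"])
```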

2) Entailment

>>> wn.synset('walk.v.01').entailments()
[Synset('step.v.01')]
>>> wn.synset('eat.v.01').entailments()
[Synset('chew.v.01'), Synset('swallow.v.01')]
>>> wn.synset('tease.v.03').entailments()
[Synset('arouse.v.07'), Synset('disappoint.v.01')]

3) Antonyms

>>> wn.lemma('supply.n.02.supply').antonyms()
[Lemma('demand.n.02.demand')]
>>> wn.lemma('rush.v.01.rush').antonyms()
[Lemma('linger.v.04.linger')]
>>> wn.lemma('horizontal.a.01.horizontal').antonyms()
[Lemma('inclined.a.02.inclined'), Lemma('vertical.a.01.vertical')]
>>> wn.lemma('staccato.r.01.staccato').antonyms()
[Lemma('legato.r.01.legato')]

4. Semantic similarity

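Synsets are linked by a complex network of lexical relations. Given a particular synset, we can traverse the WordNet network to find synsets with related meanings. Each synset has one or more hypernym paths that link it to a root hypernym. Two synsets linked to the same root may have several hypernyms in common. If two synsets share a very specific hypernym, one that sits low in the hypernym hierarchy, they must be closely related.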
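
The idea of a shared, specific hypernym can be made concrete with a small example. The sketch below uses a deliberately simplified, made-up taxonomy (the node names and the `PARENT` table are hypothetical, not WordNet data) in which every concept has a single hypernym. It finds the lowest common hypernym of two concepts and turns the path length between them into a score of the form 1 / (1 + distance). NLTK offers comparable operations on real synsets through methods such as `lowest_common_hypernyms()` and `path_similarity()`; the code here only illustrates the principle.

```python
# Made-up, simplified taxonomy: child -> single hypernym (None marks the root).
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "motor_vehicle": "vehicle",
    "car": "motor_vehicle",
    "truck": "motor_vehicle",
    "bicycle": "vehicle",
    "animal": "entity",
    "dog": "animal",
}


def path_to_root(node):
    """Return [node, hypernym, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_hypernym(a, b):
    """Walk up from `a` and return the first node that is also above `b`."""
    ancestors_of_b = set(path_to_root(b))
    for node in path_to_root(a):
        if node in ancestors_of_b:
            return node
    return None


def path_similarity(a, b):
    """Score in (0, 1]: 1 / (1 + number of edges between a and b)."""
    common = lowest_common_hypernym(a, b)
    distance = path_to_root(a).index(common) + path_to_root(b).index(common)
    return 1 / (1 + distance)


for other in ["truck", "bicycle", "dog"]:
    print(other, lowest_common_hypernym("car", other), path_similarity("car", other))
```

In this toy taxonomy the score falls as the lowest common hypernym moves toward the root: car and truck meet at motor_vehicle, car and bicycle at vehicle, and car and dog only at entity.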

