Python Natural Language Processing Notes (1)

Reposted article; original dated 2017-02-14 16:10. Tags: __future__ / concordance / dispersion_plot / similar / common_contexts / generate

I. Some commonly used NLTK functions

1. Concordance

　　Example:

>>> text1.concordance("monstrous")
Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
>>>

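This function finds every occurrence of the given word in the text and shows each one together with its surrounding context. As the output shows, concordance lines the queried word up in a single column, which makes it easy to compare the contexts it appears in. Matching is case-insensitive: note the hit on "Monstrous Pictures" in the output above.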
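To make the idea concrete, here is a minimal keyword-in-context sketch in plain Python. It is not NLTK's implementation. The function name `concordance`, the `width` parameter and the sample sentence are all made up for illustration. It only shows the two things described above: case-insensitive matching and aligning every hit in the same column.

```python
def concordance(tokens, word, width=30):
    """Return keyword-in-context lines for every occurrence of `word`.

    Illustrative sketch only; NLTK's own method has more options.
    """
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() != target:
            continue
        left = " ".join(tokens[:i])[-width:]       # context before the hit
        right = " ".join(tokens[i + 1:])[:width]   # context after the hit
        # Right-justify the left context so every hit starts in the same column.
        lines.append(f"{left:>{width}} {token} {right}")
    return lines


# Hypothetical sample text, not taken from Moby Dick.
sample = "the monstrous whale and a most monstrous size of whale".split()
for line in concordance(sample, "monstrous"):
    print(line)
```

Because the left context is padded to a fixed width, the keyword always starts at the same character offset, which is what makes the real output so easy to scan.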

2. Similar

　　Example:

>>> text1.similar("monstrous")
modifies horrible singular mouldy contemptible determined tyrannical
candid wise lamentable pitiable fearless loving maddens domineering
careful true mystifying part passing
>>>

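This function looks at the words surrounding the given word and finds other words that appear in similar contexts. For example, the concordance output above shows monstrous used like this: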

most monstrous size
the monstrous pictures
this monstrous cabinet

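and so on. similar() searches the text for other words that occur in the same kinds of slots. As far as I can tell, it relies on a simple measure only: how many immediate contexts (the word before and the word after) two words share. It does not take meaning into account, so the matches are distributional rather than semantic.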
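The sketch below illustrates that kind of context-based similarity in plain Python. It is an assumption about the general approach, not a copy of NLTK's code. The helper names `build_context_index` and `similar`, the `<s>` and `</s>` boundary markers, and the toy sentence are all hypothetical.

```python
from collections import Counter, defaultdict


def build_context_index(tokens):
    """Map each word to a Counter of its (previous word, next word) contexts."""
    words = [t.lower() for t in tokens]
    padded = ["<s>"] + words + ["</s>"]  # hypothetical boundary markers
    index = defaultdict(Counter)
    for i, word in enumerate(words):
        # padded is shifted by one, so padded[i] is the word before words[i]
        # and padded[i + 2] is the word after it.
        index[word][(padded[i], padded[i + 2])] += 1
    return index


def similar(tokens, word, n=5):
    """Rank other words by how many contexts they share with `word`."""
    index = build_context_index(tokens)
    target = index.get(word.lower(), Counter())
    scores = Counter()
    for other, contexts in index.items():
        if other == word.lower():
            continue
        shared = sum((contexts & target).values())  # multiset intersection
        if shared:
            scores[other] = shared
    return [w for w, _ in scores.most_common(n)]


# Toy text: "huge" fills exactly the same slots as "monstrous".
tokens = "a monstrous size and a huge size and a monstrous whale and a huge whale".split()
print(similar(tokens, "monstrous"))
```

In the toy text, "huge" is the only word that shares contexts with "monstrous", so it is the only word returned. Nothing here looks at meaning, which matches the behaviour described above.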

3. Common_contexts

　　Example:

>>> text1.common_contexts(["monstrous", "very"])
No common contexts were found
>>> text2.common_contexts(["monstrous", "very"])
a_pretty a_lucky am_glad be_glad is_pretty
>>>

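This function is related to similar() in that it also works from context.
The difference is that it takes a list of words and returns the contexts that all of them share, that is, the contexts common to word1 and word2. Each result is printed as left_right, so a_pretty means both words occurred between "a" and "pretty".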
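A small sketch of the same idea: collect the (left, right) neighbours of each word and intersect the sets. This is a simplified stand-in, not NLTK's implementation. The function name and the sample sentence are made up, and occurrences at the very start or end of the text are ignored for brevity.

```python
def common_contexts(tokens, words):
    """Return the sorted 'left_right' contexts shared by every word in `words`."""
    lowered = [t.lower() for t in tokens]
    shared = None
    for word in words:
        # Occurrences at the first or last position have no full context
        # and are skipped in this sketch.
        contexts = {
            (lowered[i - 1], lowered[i + 1])
            for i in range(1, len(lowered) - 1)
            if lowered[i] == word.lower()
        }
        shared = contexts if shared is None else shared & contexts
    return sorted(f"{left}_{right}" for left, right in (shared or set()))


# Hypothetical sample sentence.
tokens = "she is very pretty and he is monstrous pretty and a very lucky man".split()
print(common_contexts(tokens, ["monstrous", "very"]))
```

Here "very" appears in the contexts is_pretty and a_lucky, while "monstrous" appears only in is_pretty, so the intersection is the single context is_pretty.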

4. Dispersion_plot

　　Example:

>>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])

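This function draws a dispersion plot showing where each word occurs in the corpus. It needs matplotlib to be installed.

In the resulting plot, the x-axis is the word offset from the start of the text and the y-axis lists the queried words. Each mark is one occurrence, so the plot shows how the words are distributed across the text.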
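The data behind such a plot is just a list of offsets per word. The sketch below computes those offsets and draws a rough text version, so it runs without matplotlib. The function names, the `columns` parameter and the toy token list are all hypothetical, and matching is case-sensitive here.

```python
def dispersion_offsets(tokens, words):
    """Return {word: [offsets]} listing where each word occurs in `tokens`."""
    offsets = {word: [] for word in words}
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets


def render_rows(tokens, words, columns=40):
    """Draw one text row per word, marking the buckets that contain a hit."""
    rows = []
    for word, positions in dispersion_offsets(tokens, words).items():
        cells = ["."] * columns
        for position in positions:
            # Scale the token offset down to one of `columns` buckets.
            cells[position * columns // len(tokens)] = "|"
        rows.append(f"{word:>10} {''.join(cells)}")
    return rows


# Toy token list: "freedom" at both ends, the other words in the middle.
tokens = ("freedom " * 3 + "citizens duties " * 4 + "freedom " * 2).split()
print("\n".join(render_rows(tokens, ["citizens", "freedom", "duties"])))
```

Each row corresponds to one word on the y-axis, and each `|` marks a stretch of the text that contains at least one occurrence.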

5. generate

　　Example:

>>> text3.generate()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: generate() missing 1 required positional argument: 'words'
>>>

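generate() is meant to produce random text in the style of text3. On my machine it fails instead, because I am using NLTK 3.2.4 with Python 3.4.4. In that version the original generate() implementation has been disabled (what remains expects a words argument, as the traceback shows), so it cannot be used as in the book. The book, Natural Language Processing with Python, was written against NLTK 2.0.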
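If generate() is unavailable, the general idea can still be tried with a few lines of plain Python. The sketch below is a simple bigram random walk. It is a stand-in, not the algorithm NLTK uses, and the function name, its `length` and `seed` parameters, and the sample sentence are assumptions made for illustration.

```python
import random
from collections import defaultdict


def generate(tokens, length=20, seed=None):
    """Generate `length` words by walking a bigram table built from `tokens`."""
    followers = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        followers[current].append(following)

    rng = random.Random(seed)  # a fixed seed makes the output repeatable
    word = rng.choice(tokens)
    output = [word]
    while len(output) < length:
        candidates = followers.get(word)
        # Restart from a random token when the current word has no follower.
        word = rng.choice(candidates) if candidates else rng.choice(tokens)
        output.append(word)
    return " ".join(output)


# Hypothetical sample sentence standing in for a real corpus.
tokens = "in the beginning god created the heaven and the earth".split()
print(generate(tokens, length=12, seed=0))
```

With a real corpus, `tokens` could be the token list of a text such as text3. Frequent word pairs are picked more often because each follower is stored once per occurrence.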

6. The __future__ module

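　　The __future__ module (with two underscores on each side) lets Python 2.x code opt in to features from Python 3.x. It imports behaviour from the next version into the current one, so new features can be tried out before upgrading. The module still exists in Python 3.x. There, importing a feature that has already become the default, such as division or print_function, simply has no effect.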
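A quick way to check this is to inspect the module itself. Each feature in `__future__` records the release in which it first became available and the release in which it became the default. The helper name `feature_status` and the keys of the dictionary it returns are made up for this sketch.

```python
import __future__
import sys


def feature_status(name):
    """Describe when a __future__ feature became available and mandatory."""
    feature = getattr(__future__, name)
    mandatory = feature.getMandatoryRelease()  # may be None for some features
    return {
        "optional_since": feature.getOptionalRelease()[:2],
        "mandatory_since": mandatory[:2] if mandatory else None,
        # True when the running interpreter already has the feature by default.
        "already_default_here": bool(mandatory) and sys.version_info >= mandatory,
    }


for name in ("division", "print_function"):
    print(name, feature_status(name))
```

In a Python 2 source file the usual form is a line such as `from __future__ import division, print_function`, placed before any other code. Under Python 3 the same line is accepted but changes nothing, because both features are already the default.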
