Python + NLTK Natural Language Processing: Setting Up the Environment

Reposted article, originally published 2017-06-25 10:40.

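First, install the required software by following http://nltk.org/install.html. You will need Python, NumPy, pandas, and matplotlib. Once everything is installed, run nltk.download() to fetch the corpora. In the downloader window that opens (shown as a screenshot in the original post), select "All packages" and click Download.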

Note that the Download Directory can be changed freely, but the last component of the path must be nltk_data.

For example, you can change it to D:\nltk_data
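The directory rule above can be checked before starting a long download. The following is a minimal sketch, not part of the original post: the helper names `is_valid_download_dir` and `download_all` are made up for illustration, and `download_all` assumes NLTK is installed and the network is reachable. It relies on `nltk.download()` accepting a package identifier and a `download_dir` argument, with `"all"` being the identifier of the "All packages" collection.

```python
from pathlib import PureWindowsPath


def is_valid_download_dir(path):
    """Return True if the last component of a Windows-style path is 'nltk_data'."""
    return PureWindowsPath(path).name == "nltk_data"


def download_all(download_dir):
    """Download every NLTK package into download_dir without using the GUI.

    Hypothetical helper: needs NLTK and network access, so it is only
    defined here and not called.
    """
    if not is_valid_download_dir(download_dir):
        raise ValueError("the last path component must be nltk_data")
    import nltk  # imported lazily so the path check works without NLTK
    return nltk.download("all", download_dir=download_dir)


print(is_valid_download_dir(r"D:\nltk_data"))  # True
print(is_valid_download_dir(r"D:\NLTK"))       # False
```

Calling `download_all(r"D:\nltk_data")` would then do the same job as selecting "All packages" in the downloader window.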

The downloader is slow and often fails outright. When that happens, there are two alternatives:

1. Download the packages you need directly from http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml

2. Use a copy of the data that someone else has already packaged and shared online, for example the link below:

https://d11.baidupcs.com/file/b8adca61f3d951733a1508c538fb139f?bkt=p3-1400b8adca61f3d951733a1508c538fb139f7a5a378700001237cfb6&xcode=24ee57e4c00df669f8114f90862e7a576f1a5fd0dfa92cd70b2977702d3e6764&fid=655353904-250528-168229026483879&time=1498354932&sign=FDTAXGERLBHS-DCb740ccc5511e5e8fedcff06b081203-farXKS5Ut9qIEKMP6uCJBn0sFLk%3D&to=d11&size=305647542&sta_dx=305647542&sta_cs=1637&sta_ft=zip&sta_ct=7&sta_mt=7&fm2=MH,Ningbo,Netizen-anywhere,,sichuan,ct&newver=1&newfm=1&secfm=1&flow_ver=3&pkey=1400b8adca61f3d951733a1508c538fb139f7a5a378700001237cfb6&sl=83034191&expires=8h&rt=sh&r=640794177&mlogid=4068121183592230425&vuk=1681792858&vbdid=634719214&fin=nltk_data.zip&fn=nltk_data.zip&rtype=1&iv=0&dp-logid=4068121183592230425&dp-callid=0.1.1&hps=1&csl=300&csign=YEkhhUZEK82GGRxxvymOo9t9Y2E%3D&by=themis
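Whichever route is used, the downloaded archive still has to be unpacked under a folder named nltk_data. The sketch below shows one way to do that with the standard library only. It is written for Python 3, the function name `unpack_package` is hypothetical, and it assumes the archive is a single package (such as a corpus zip) whose top-level folder is the package name, so that it belongs under the `corpora` subfolder.

```python
import os
import zipfile


def unpack_package(zip_path, nltk_data_dir, category="corpora"):
    """Extract a package archive into <nltk_data_dir>/<category>/.

    Returns the directory the archive was extracted into.
    """
    # Enforce the rule that the data folder itself is called nltk_data.
    if os.path.basename(os.path.normpath(nltk_data_dir)) != "nltk_data":
        raise ValueError("target directory must be named nltk_data")
    target = os.path.join(nltk_data_dir, category)
    os.makedirs(target, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Example call (paths are placeholders, adjust to your machine):
# unpack_package(r"D:\downloads\gutenberg.zip", r"D:\nltk_data")
```

A full bundle such as nltk_data.zip already contains the category folders, so it would instead be extracted directly next to (or into) the nltk_data folder.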

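Note that manually downloaded packages must be placed inside a folder named nltk_data; otherwise the import fails. For example, I extracted them into a folder named NLTK, and the import raised the following error: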

>>> from nltk.book import *

*** Introductory Examples for the NLTK Book ***

Loading text1, ..., text9 and sent1, ..., sent9

Type the name of the text or sentence to view it.

Type: 'texts()' or 'sents()' to list the materials.

Traceback (most recent call last):

File "<pyshell#0>", line 1, in <module>

from nltk.book import *

File "E:\python2.7.11\lib\site-packages\nltk-3.2.4-py2.7.egg\nltk\book.py", line 20, in <module>

text1 = Text(gutenberg.words('melville-moby_dick.txt'))

File "E:\python2.7.11\lib\site-packages\nltk-3.2.4-py2.7.egg\nltk\corpus\util.py", line 116, in __getattr__

self.__load()

File "E:\python2.7.11\lib\site-packages\nltk-3.2.4-py2.7.egg\nltk\corpus\util.py", line 81, in __load

except LookupError: raise e

LookupError:

**********************************************************************

Resource u'corpora/gutenberg' not found. Please use the NLTK

Downloader to obtain the resource: >>> nltk.download()

Searched in:

- 'C:\\Users\\Administrator/nltk_data'

- 'C:\\nltk_data'

- 'D:\\nltk_data'

- 'E:\\nltk_data'

- 'E:\\python2.7.11\\nltk_data'

- 'E:\\python2.7.11\\lib\\nltk_data'

- 'C:\\Users\\Administrator\\AppData\\Roaming\\nltk_data'

NLTK looks for its data in the paths listed under "Searched in:" above. Because none of those locations contained a folder named nltk_data, the resource could not be found.
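To see which of the searched locations actually holds a resource, the lookup can be imitated with a few lines of standard-library Python. This is a simplified sketch and not NLTK's own implementation: `locate_resource` is a hypothetical helper, and it assumes a resource counts as present when it exists either as a folder or file, or as a .zip archive of the same name.

```python
import os


def locate_resource(search_dirs, resource="corpora/gutenberg"):
    """Return the first directory in search_dirs that holds the resource, else None."""
    parts = resource.split("/")
    for base in search_dirs:
        candidate = os.path.join(base, *parts)
        # Accept either an unpacked folder/file or a zipped package.
        if os.path.exists(candidate) or os.path.exists(candidate + ".zip"):
            return base
    return None


# A few of the paths from the error message above; prints the first hit or None.
dirs = [r"C:\nltk_data", r"D:\nltk_data", r"E:\nltk_data"]
print(locate_resource(dirs))
```

If this prints None, the data is either missing or sitting in a folder that NLTK never searches.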

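Renaming the folder to nltk_data fixes the problem (the original post showed the renamed folder in a screenshot here).

On Linux the search paths are as follows, and the NLTK data must be placed in one of these directories: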

Searched in:
    - '/root/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/usr/nltk_data'
    - '/usr/lib/nltk_data'
I placed the data under /usr, which matches the '/usr/nltk_data' entry above.
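If the data cannot be moved into one of the listed directories, NLTK can instead be told about an extra location. The sketch below appends a directory to `nltk.data.path`, the list NLTK consults when loading resources; setting the `NLTK_DATA` environment variable before starting Python is another option. The function name `register_data_dir` is hypothetical, and the import is guarded so the snippet also runs on a machine without NLTK.

```python
def register_data_dir(data_dir):
    """Add data_dir to NLTK's search path; return False if NLTK is not installed."""
    try:
        import nltk
    except ImportError:
        return False
    # Avoid adding the same directory twice.
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)
    return True


print(register_data_dir("/usr/nltk_data"))
```

The change only lasts for the current Python session, so it has to run before the corpora are loaded.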

With the data in place, the import now succeeds:

>>> from nltk.book import *

*** Introductory Examples for the NLTK Book ***

Loading text1, ..., text9 and sent1, ..., sent9

Type the name of the text or sentence to view it.

Type: 'texts()' or 'sents()' to list the materials.

text1: Moby Dick by Herman Melville 1851

text2: Sense and Sensibility by Jane Austen 1811

text3: The Book of Genesis

text4: Inaugural Address Corpus

text5: Chat Corpus

text6: Monty Python and the Holy Grail

text7: Wall Street Journal

text8: Personals Corpus

text9: The Man Who Was Thursday by G . K . Chesterton 1908

Let's try it out. The following command finds every place where "monstrous" occurs in text1 and shows it in context:

>>> text1.concordance('monstrous')

Displaying 11 of 11 matches:

ong the former , one was of a most monstrous size . ... This came towards us ,

ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r

ll over with a heathenish array of monstrous clubs and spears . Some were thick

d as you gazed , and wondered what monstrous cannibal and savage could ever hav

that has survived the flood ; most monstrous and most mountainous ! That Himmal

they might scout at Moby Dick as a monstrous fable , or still worse and more de

th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l

ing Scenes . In connexion with the monstrous pictures of whales , I am strongly

ere to enter upon those still more monstrous stories of them which are to be fo

ght have been rummaged out of this monstrous cabinet there is no telling . But

of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
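To make clear what a concordance does, here is a tiny pure-Python imitation. It is only an illustrative sketch, not how NLTK implements `concordance`: the function `simple_concordance` is made up for this example, it measures context in tokens rather than characters, and it matches case-insensitively, as the capitalized "Monstrous" hit in the output above suggests NLTK does.

```python
def simple_concordance(tokens, word, width=3):
    """Return each occurrence of word with `width` tokens of context on both sides."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


tokens = "It was a most monstrous size , a Monstrous fable indeed".split()
for left, hit, right in simple_concordance(tokens, "monstrous"):
    print(left, "[" + hit + "]", right)
# was a most [monstrous] size , a
# size , a [Monstrous] fable indeed
```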

The environment is now set up, and the following posts begin the actual study of NLTK.
