1. Installation
1.1 Installing spaCy
$ sudo pip3 install spacy
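- Note: this guide uses Python 3.
- Before running this command, make sure pip is installed. You can check with:
$ pip3 --version
- If the shell reports that pip3 cannot be found, pip is not installed.
- In that case, install it with sudo apt install python3-pip
- Keep the network connection stable while spaCy is being installed; otherwise the installation fails with an error.
- When the installation succeeds, pip finishes with a "Successfully installed ..." message.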
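Besides pip3 --version, you can confirm from Python that spaCy itself ended up installed. The sketch below is a minimal example, assuming Python 3.8+ (where importlib.metadata is part of the standard library); the helper name installed_version is hypothetical, not part of spaCy or pip.

```python
from importlib import metadata  # standard library since Python 3.8
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("spacy")
    print(f"spaCy {version} is installed" if version else "spaCy is not installed")
```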
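1.2 Downloading spaCy's data and models
Installing the models online (python3 -m spacy download en_core_web_sm) did not work, so they have to be installed offline. English has four statistical models (en_core_web_sm, en_core_web_md, en_core_web_lg and en_core_web_trf); download whichever ones you need.
For now, download only en_core_web_sm. The releases are listed on GitHub at https://github.com/explosion/spacy-models/tags
Click Downloads, scroll to the bottom of the newly opened page and click the second file to download en_core_web_sm-3.1.0.tar.gz.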
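If the Downloads page is hard to navigate, the link to the archive can also be built directly. This is a sketch assuming the asset naming pattern used by spacy-models releases (tag "<name>-<version>", file "<name>-<version>.tar.gz"); model_release_url is a hypothetical helper. The model's major.minor version should match the installed spaCy (3.1.x models for spaCy 3.1.x).

```python
def model_release_url(name: str, version: str) -> str:
    """Build the GitHub download link of a spaCy model's .tar.gz release asset.

    Assumes the spacy-models naming pattern: the tag is "<name>-<version>"
    and the asset is "<name>-<version>.tar.gz".
    """
    tag = f"{name}-{version}"
    return (
        "https://github.com/explosion/spacy-models/releases/download/"
        f"{tag}/{tag}.tar.gz"
    )


if __name__ == "__main__":
    print(model_release_url("en_core_web_sm", "3.1.0"))
```

Open the printed URL in a browser, or fetch it with any download tool that can reach GitHub, then install the file with pip3.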
Open a terminal in the directory you downloaded the file to and run:
$ pip3 install en_core_web_sm-3.1.0.tar.gz
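pip should finish with a "Successfully installed" message, meaning the model is installed.
You can test the installation as follows: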
lmy@LMY-LAPTOP:~/NLP$ python3
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import spacy
>>> spacy.load("en_core_web_sm")
<spacy.lang.en.English object at 0x7f4bddc37670>
>>>
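The same check can go into a script. In this sketch the hypothetical helper load_model_or_none returns None instead of crashing; it relies on spacy.load raising OSError when no installed model package (or model directory) matches the given name.

```python
def load_model_or_none(name: str = "en_core_web_sm"):
    """Load a spaCy pipeline by package name, or return None if it is unavailable."""
    try:
        import spacy  # imported lazily so the helper also runs where spaCy is absent
    except ImportError:
        print("spaCy is not installed; install it with: pip3 install spacy")
        return None
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when no installed package or path matches `name`
        print(f"Model {name!r} not found; install the downloaded .tar.gz with pip3")
        return None


if __name__ == "__main__":
    nlp = load_model_or_none()
    print("model loaded" if nlp is not None else "model unavailable")
```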
2. Basic usage of spaCy
2.1 Viewing the active pipeline components
>>> import spacy
>>> nlp = spacy.load("en_core_web_sm")
>>> doc = nlp("He went to play basketball")
>>> nlp.pipe_names
['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']
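nlp.pipe_names lists, in order, the components a Doc passes through after the tokenizer (which is not in the list). To make that flow concrete, here is a toy, pure-Python model of the idea; ToyPipeline, ToyDoc, count_tokens and mark_capitalized are hypothetical names for illustration only, not spaCy's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyDoc:
    """Minimal stand-in for spaCy's Doc: the text, its tokens and annotations."""
    text: str
    tokens: List[str]
    annotations: Dict[str, object] = field(default_factory=dict)


Component = Callable[[ToyDoc], ToyDoc]


class ToyPipeline:
    """Conceptual model of nlp(...): tokenize first, then run each component in order."""

    def __init__(self, components: List[Tuple[str, Component]]) -> None:
        self.components = components

    @property
    def pipe_names(self) -> List[str]:
        # Like nlp.pipe_names, the tokenizer is not included in this list
        return [name for name, _ in self.components]

    def __call__(self, text: str) -> ToyDoc:
        doc = ToyDoc(text=text, tokens=text.split())  # toy whitespace tokenizer
        for _, component in self.components:
            doc = component(doc)  # each component takes a Doc and returns it
        return doc


def count_tokens(doc: ToyDoc) -> ToyDoc:
    doc.annotations["n_tokens"] = len(doc.tokens)
    return doc


def mark_capitalized(doc: ToyDoc) -> ToyDoc:
    doc.annotations["capitalized"] = [t for t in doc.tokens if t[:1].isupper()]
    return doc


if __name__ == "__main__":
    nlp = ToyPipeline([("counter", count_tokens), ("capitals", mark_capitalized)])
    print(nlp.pipe_names)
    print(nlp("He went to play basketball").annotations)
```

Because every component receives the Doc produced by the previous one, order matters whenever one component relies on annotations set by another.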
2.2 Part-of-speech tagging
>>> for item in doc:
...     print(item.text, "-->", item.pos_)
...
He --> PRON
went --> VERB
to --> PART
play --> VERB
basketball --> NOUN
>>> spacy.explain("PART")
'particle'
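Since each token carries its tag in pos_, the loop above extends naturally, for example to group words by part of speech. A minimal sketch; words_by_pos is a hypothetical helper that works on any iterable of tokens with .text and .pos_, such as a spaCy Doc.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def words_by_pos(tokens: Iterable) -> Dict[str, List[str]]:
    """Group token texts by coarse-grained POS tag, preserving sentence order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for token in tokens:
        groups[token.pos_].append(token.text)
    return dict(groups)
```

Called as words_by_pos(doc) on the sentence above, and assuming the same tags as printed there, it returns {'PRON': ['He'], 'VERB': ['went', 'play'], 'PART': ['to'], 'NOUN': ['basketball']}.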
2.3 Dependency parsing
>>> for item in doc:
...     print(item.text, "-->", item.dep_)
...
He --> nsubj
went --> ROOT
to --> aux
play --> advcl
basketball --> dobj
>>> spacy.explain("nsubj"), spacy.explain("ROOT"), spacy.explain("aux"), spacy.explain("advcl"), spacy.explain("dobj")
('nominal subject', None, 'auxiliary', 'adverbial clause modifier', 'direct object')
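The dep_ label says which relation a word has, but not which word it attaches to; that is stored in token.head. A minimal sketch with the hypothetical helper dependency_triples, listing (word, relation, head word) for every token. In spaCy the ROOT token is its own head.

```python
from typing import Iterable, List, Tuple


def dependency_triples(tokens: Iterable) -> List[Tuple[str, str, str]]:
    """Return (word, dependency label, head word) for each token.

    Works on any tokens exposing .text, .dep_ and .head, e.g. a spaCy Doc;
    the ROOT token's head is the token itself.
    """
    return [(token.text, token.dep_, token.head.text) for token in tokens]
```

Calling dependency_triples(doc) on the sentence above pairs each word with the word it depends on, which is the tree that the labels alone only partly describe.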
2.4 Named entity recognition with spaCy
>>> doc = nlp("Indians spent over $71 billion on clothes in 2018")
>>> for ent in doc.ents:
...     print(ent.text, ent.label_)
...
Indians NORP
over $71 billion MONEY
2018 DATE
>>> spacy.explain("NORP")
'Nationalities or religious or political groups'
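Each entity in doc.ents is a Span that also records its character offsets (start_char, end_char) in doc.text. A minimal sketch using those offsets to mark entities inline as plain text; annotate_entities is a hypothetical helper, and it assumes entities are ordered and non-overlapping, as they are in doc.ents.

```python
def annotate_entities(doc) -> str:
    """Return doc.text with each entity wrapped as [text](LABEL).

    Uses each entity's start_char/end_char offsets into doc.text and assumes
    the entities are sorted by position and do not overlap.
    """
    pieces = []
    last = 0
    for ent in doc.ents:
        pieces.append(doc.text[last:ent.start_char])  # text before this entity
        pieces.append(f"[{ent.text}]({ent.label_})")
        last = ent.end_char
    pieces.append(doc.text[last:])  # text after the final entity
    return "".join(pieces)
```

With the three entities printed above, annotate_entities(doc) gives "[Indians](NORP) spent [over $71 billion](MONEY) on clothes in [2018](DATE)". For a graphical view, spaCy also ships displacy (spacy.displacy.render(doc, style="ent")).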