Using Stanford CoreNLP in Eclipse

This article is a repost. 2014-12-09 14:07. Tags: NLP / CoreNLP / java / Programming Language

Source download: the CoreNLP official website.

The currently released CoreNLP version 3.5.0 supports only Java 1.8 and above, so you may need to add a JDK 1.8 configuration to Eclipse. The steps are as follows:

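First, download Java 1.8 from the Oracle website (Java download page) and install it.
Open Eclipse and go to Window -> Preferences -> Java -> Installed JREs:
Click "Add" on the right side of the dialog, select "Standard VM", then click "Next";
In the "JRE home" row, click "Directory…" on the right and select your Java installation path, for example "C:\Program Files\Java\jdk1.8".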

Eclipse now supports JDK 1.8.
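To confirm that a project really runs on the JRE configured above, a small check like the following can be run from Eclipse. It is only a sketch: the class name JavaVersionCheck and its helper methods are made up for illustration, and it relies on the standard "java.specification.version" system property, which reports "1.8" on Java 8 and values such as "11" on later releases.

```java
/** Prints the Java version the program runs on and compares it with a required minimum. */
final class JavaVersionCheck {

    /**
     * Extracts the major version from a "java.specification.version" value.
     * Java 8 and earlier report "1.8", "1.7", ...; Java 9 and later report "9", "11", ...
     */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Legacy "1.x" scheme: the real major version is the second component.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** Returns true if the given specification version is at least the required major version. */
    static boolean isAtLeast(String specVersion, int requiredMajor) {
        return majorVersion(specVersion) >= requiredMajor;
    }

    public static void main(String[] args) {
        String running = System.getProperty("java.specification.version");
        System.out.println("Running on Java " + running);
        if (!isAtLeast(running, 8)) {
            System.out.println("CoreNLP 3.5.0 needs Java 1.8 or later; check Installed JREs in Eclipse.");
        }
    }
}
```

If the printed version is lower than 1.8, the project is most likely still bound to an older JRE in its build path.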

1. Create a new Java project, making sure to select 1.8 as the compiler compliance level.

2. Unpack the archive downloaded from the official website into the project, and add the required jar files to the build path.

For example, add stanford-corenlp-3.5.0.jar, stanford-corenlp-3.5.0-javadoc.jar, stanford-corenlp-3.5.0-models.jar, stanford-corenlp-3.5.0-sources.jar, xom.jar, and so on.

To add the jars: right-click the project -> Properties -> Java Build Path -> Libraries, click "Add JARs", and select the jar files in the dialog.
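A quick way to see whether the jars actually ended up on the build path is to ask the class loader for one class from each of them. The sketch below assumes that edu.stanford.nlp.pipeline.StanfordCoreNLP comes from stanford-corenlp-3.5.0.jar and nu.xom.Document from xom.jar; the class name ClasspathCheck and its helper are made up for illustration. It looks only for class files, so it says nothing about the models jar.

```java
/** Reports whether selected classes can be found on the current classpath. */
final class ClasspathCheck {

    /** Returns true if the named class can be located, without initializing it. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "edu.stanford.nlp.pipeline.StanfordCoreNLP", // expected in stanford-corenlp-3.5.0.jar
            "nu.xom.Document"                            // expected in xom.jar
        };
        for (String name : required) {
            System.out.println((isClassAvailable(name) ? "found   " : "MISSING ") + name);
        }
    }
}
```

A "MISSING" line usually means the corresponding jar was not added under Java Build Path -> Libraries.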

3. Create a new class named TestCoreNLP with the following code:

package Test;

import java.util.List;
import java.util.Map;
import java.util.Properties;

import edu.stanford.nlp.dcoref.CorefChain;
import edu.stanford.nlp.dcoref.CorefCoreAnnotations.CorefChainAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.LemmaAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.NamedEntityTagAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.PartOfSpeechAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TextAnnotation;
import edu.stanford.nlp.ling.CoreAnnotations.TokensAnnotation;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.semgraph.SemanticGraph;
import edu.stanford.nlp.semgraph.SemanticGraphCoreAnnotations.CollapsedCCProcessedDependenciesAnnotation;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations.TreeAnnotation;
import edu.stanford.nlp.util.CoreMap;

public class TestCoreNLP {
    public static void main(String[] args) {
        // creates a StanfordCoreNLP object, with POS tagging, lemmatization, NER, parsing, and coreference resolution
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // read some text in the text variable
        String text = "Add your text here:Beijing sings Lenovo";

        // create an empty Annotation just with the given text
        Annotation document = new Annotation(text);

        // run all Annotators on this text
        pipeline.annotate(document);

        // these are all the sentences in this document
        // a CoreMap is essentially a Map that uses class objects as keys and has values with custom types
        List<CoreMap> sentences = document.get(SentencesAnnotation.class);

        // header row for the tab-separated output
        System.out.println("word\tpos\tlemma\tner");
        for (CoreMap sentence : sentences) {
            // traversing the words in the current sentence
            // a CoreLabel is a CoreMap with additional token-specific methods
            for (CoreLabel token : sentence.get(TokensAnnotation.class)) {
                // this is the text of the token
                String word = token.get(TextAnnotation.class);
                // this is the POS tag of the token
                String pos = token.get(PartOfSpeechAnnotation.class);
                // this is the NER label of the token
                String ne = token.get(NamedEntityTagAnnotation.class);
                // this is the lemma (base form) of the token
                String lemma = token.get(LemmaAnnotation.class);

                System.out.println(word + "\t" + pos + "\t" + lemma + "\t" + ne);
            }
            // this is the parse tree of the current sentence
            Tree tree = sentence.get(TreeAnnotation.class);

            // this is the Stanford dependency graph of the current sentence
            SemanticGraph dependencies = sentence.get(CollapsedCCProcessedDependenciesAnnotation.class);
        }
        // This is the coreference link graph
        // Each chain stores a set of mentions that link to each other,
        // along with a method for getting the most representative mention
        // Both sentence and token offsets start at 1!
        Map<Integer, CorefChain> graph = document.get(CorefChainAnnotation.class);
    }
}

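Note: the idea of this code is to hand the text string to Stanford CoreNLP, whose components (annotators) process it in the order "tokenize (tokenization), ssplit (sentence splitting), pos (part-of-speech tagging), lemma (lemmatization), ner (named entity recognition), parse (syntactic parsing), dcoref (coreference resolution)".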
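The order matters because later annotators build on the output of earlier ones. The sketch below checks an "annotators" string against a prerequisite table before it is passed to the pipeline. The table is an assumption written out by hand from my understanding of the CoreNLP documentation, not something read from the library, and the class AnnotatorOrderCheck with its problems method is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Checks that each annotator in a comma-separated list appears after the ones it needs. */
final class AnnotatorOrderCheck {

    // Assumed prerequisite table (hand-written, not queried from CoreNLP).
    private static final Map<String, List<String>> REQUIRES = new HashMap<>();
    static {
        REQUIRES.put("tokenize", Collections.<String>emptyList());
        REQUIRES.put("ssplit", Arrays.asList("tokenize"));
        REQUIRES.put("pos", Arrays.asList("tokenize", "ssplit"));
        REQUIRES.put("lemma", Arrays.asList("tokenize", "ssplit", "pos"));
        REQUIRES.put("ner", Arrays.asList("tokenize", "ssplit", "pos", "lemma"));
        REQUIRES.put("parse", Arrays.asList("tokenize", "ssplit"));
        REQUIRES.put("dcoref", Arrays.asList("tokenize", "ssplit", "pos", "lemma", "ner", "parse"));
    }

    /** Returns one message per problem found; an empty list means the order matches the table. */
    static List<String> problems(String annotators) {
        List<String> found = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String name : annotators.split("[,\\s]+")) {
            if (name.isEmpty()) {
                continue; // skip the empty token produced by leading separators
            }
            List<String> required = REQUIRES.get(name);
            if (required == null) {
                found.add("unknown annotator: " + name);
                continue;
            }
            for (String prerequisite : required) {
                if (!seen.contains(prerequisite)) {
                    found.add(name + " needs " + prerequisite + " earlier in the list");
                }
            }
            seen.add(name);
        }
        return found;
    }

    public static void main(String[] args) {
        String order = "tokenize, ssplit, pos, lemma, ner, parse, dcoref";
        List<String> issues = problems(order);
        System.out.println(issues.isEmpty() ? "order looks consistent" : issues.toString());
    }
}
```

Running such a check first is cheap compared with constructing the pipeline, which loads several large models before it can report a configuration problem.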

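After processing, List<CoreMap> sentences = document.get(SentencesAnnotation.class); holds the sentence-level results (tokens with their tags, the parse tree, the dependency graph); iterate over it to read them. The coreference chains are read from the document itself through CorefChainAnnotation.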
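The code comment describes a CoreMap as a Map that uses class objects as keys and has values with custom types. The idea can be sketched in plain Java as a type-safe heterogeneous container. This is a simplified stand-in for illustration, not CoreNLP's implementation; the names TypedMapSketch, Key, WordKey, TokenCountKey and TypedMap are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** A toy map keyed by class objects, where each key class fixes the type of its value. */
final class TypedMapSketch {

    /** Marker interface: a key class implementing Key<V> stores values of type V. */
    interface Key<V> {}

    // Hypothetical keys, playing the role of annotation classes such as TextAnnotation.
    static final class WordKey implements Key<String> {}
    static final class TokenCountKey implements Key<Integer> {}

    static final class TypedMap {
        private final Map<Class<?>, Object> values = new HashMap<>();

        /** Stores a value under the given key class; the compiler checks the value type. */
        <V> void set(Class<? extends Key<V>> key, V value) {
            values.put(key, value);
        }

        /** Returns the value stored under the key class, already typed, or null if absent. */
        @SuppressWarnings("unchecked")
        <V> V get(Class<? extends Key<V>> key) {
            return (V) values.get(key); // safe because set() only accepts matching types
        }
    }

    public static void main(String[] args) {
        TypedMap map = new TypedMap();
        map.set(WordKey.class, "Beijing");
        map.set(TokenCountKey.class, 3);

        // No casts are needed at the call site: the key class determines the result type.
        String word = map.get(WordKey.class);
        int count = map.get(TokenCountKey.class);
        System.out.println(word + " " + count);
    }
}
```

This is why calls such as token.get(PartOfSpeechAnnotation.class) in the listing can be assigned to a String without a cast.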

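Here we simply print each word together with its POS tag, lemma, and NER tag. For other features (such as sentiment, parse, and relation), see the official website.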

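4. Execution result: running the class prints the header line followed by one tab-separated line per token, giving its word, POS tag, lemma, and NER tag.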
