Fixing a Failed Call to the Chinese Coreference Resolution Module in Stanford CoreNLP 3.6.0

Reposted article; originally published 2015-12-28 10:05.

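The most active researchers in Chinese coreference resolution at the moment are Chen and Vincent Ng, who published several related papers at AAAI 2014 and 2015. Their work spans zero anaphora, pronoun coreference, and noun coreference; the methods are not especially complicated and stay within the traditional rules + features + model approach. Within China, the work is concentrated in the team led by Zhou Guodong at Soochow University and the team led by Liu Ting and Qin Bing, which have built on Berkeley Parser and LTP respectively. Regrettably, researchers in China do not seem to have had papers accepted at top conferences in recent years.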

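Coreference resolution tools from groups in China are essentially never both open source and reasonably accurate, so after an extensive survey of the current state of Chinese coreference resolution I settled on Stanford CoreNLP as the subject of my experiments.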

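Stanford CoreNLP is an open-source software suite from the Stanford NLP Group that bundles word segmentation, part-of-speech tagging, named entity recognition, syntactic parsing, sentiment analysis, coreference resolution, and other NLP functions, with support for English, Chinese, and other languages.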

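Links to this well-known toolkit: http://nlp.stanford.edu/software/index.shtml and http://stanfordnlp.github.io/CoreNLP/index.html

Its official demo: http://nlp.stanford.edu:8080/corenlp/ (the backend behind this demo presumably uses the English models)

Enough preamble. On to the main topic: how to call the Chinese coreference resolution module in the Stanford CoreNLP 3.6.0 suite.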

=========================================================================

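1. Download the Stanford CoreNLP 3.6.0 source code plus models (500 MB+). Note that the cws, pos, parse, and other models in this package are all for English. (http://stanfordnlp.github.io/CoreNLP/download.html)

2. Download the Chinese models: word segmentation, POS tagging, NER, parser, and so on. (I am not sure whether there is a single download location; I opened each tool's page, found its Chinese model, and downloaded them one by one from http://nlp.stanford.edu/software/index.shtml)

3. Run the test code. The page http://stanfordnlp.github.io/CoreNLP/coref.html describes how to run it: either invoke the jar on whole files, or call it from Java code on individual sentences. (Note that the approach on another page, http://nlp.stanford.edu/software/dcoref.shtml, is wrong; in practice it hits a bug and does not run.)

4. Step 3 gives the correct interface. In practice, the jar invocation for processing files does run on a Chinese corpus, but the Java code shown on that page still targets English text, so it needs to be modified.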
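Steps 1 and 2 produce separate downloads, and a missing Chinese models jar is easy to overlook. As a minimal sketch (not part of CoreNLP; the class name `ModelClasspathCheck` and its `locate` helper are made up for illustration), the plain-Java check below asks the class loader whether the Chinese coref properties file used in this post is visible. It assumes the file sits at the same resource path that the example program in this post passes to `-props`.

```java
import java.net.URL;

public class ModelClasspathCheck {
    // Resource path as used by the example program in this post (assumed unchanged in your setup).
    static final String ZH_COREF_PROPS =
            "edu/stanford/nlp/hcoref/properties/zh-coref-default.properties";

    /** Returns the URL where the resource was found, or null if it is not on the classpath. */
    static URL locate(String resourcePath) {
        return ModelClasspathCheck.class.getClassLoader().getResource(resourcePath);
    }

    public static void main(String[] args) {
        URL url = locate(ZH_COREF_PROPS);
        if (url == null) {
            System.out.println("NOT FOUND on classpath: " + ZH_COREF_PROPS);
        } else {
            // The URL shows which jar the file was loaded from.
            System.out.println("Found at: " + url);
        }
    }
}
```

Run it with the same classpath as your CoreNLP program. If it reports NOT FOUND, the Chinese models jar is probably not on the classpath.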

The modified code is as follows:

import edu.stanford.nlp.hcoref.CorefCoreAnnotations;
import edu.stanford.nlp.hcoref.data.CorefChain;
import edu.stanford.nlp.hcoref.data.Mention;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.StringUtils;

import java.util.Properties;

public class CorefExample {
    public static void main(String[] args) throws Exception {
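        long startTime = System.currentTimeMillis();

        // Sample input: "Xiao Ming ate a popsicle; it was sweet."
        String text = "小明吃了個冰棒，它很甜。 ";
        // Point the pipeline at the Chinese coref properties file shipped in the Chinese models jar.
        args = new String[] {"-props", "edu/stanford/nlp/hcoref/properties/zh-coref-default.properties" };

        Annotation document = new Annotation(text);
        // argsToProperties reads the file named by -props from the classpath or the file system.
        Properties props = StringUtils.argsToProperties(args);
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        pipeline.annotate(document);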
        System.out.println("---");
        System.out.println("coref chains");
        for (CorefChain cc : document.get(CorefCoreAnnotations.CorefChainAnnotation.class).values()) {
            System.out.println("\t" + cc);
        }
        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            System.out.println("---");
            System.out.println("mentions");
            for (Mention m : sentence.get(CorefCoreAnnotations.CorefMentionsAnnotation.class)) {
                System.out.println("\t" + m);
            }
        }
        
        long endTime=System.currentTimeMillis(); 
        long time = (endTime-startTime)/1000;
        System.out.println("Running time "+time/60+"min "+time%60+"s");
    }
}

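So why is zh-coref-default.properties not in CoreNLP itself? I eventually found it in the corresponding directory after unpacking stanford-chinese-corenlp-2015-12-08-models.jar. It differs from the version on the official web page by only one line (compare them yourself to see which), but without that property the code really does not run.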
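To find the differing line, one option is to load both versions of the properties file and compare them key by key. The sketch below uses only `java.util.Properties`; the class `PropertiesDiff`, its helpers, and the sample keys are hypothetical placeholders, not the real CoreNLP settings. In practice you would pass in the contents of the file from the jar and the version copied from the web page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

public class PropertiesDiff {
    /** Parses properties-file text into a Properties object. */
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Reports keys present on only one side or with different values, in sorted key order. */
    static List<String> diff(Properties first, Properties second) {
        TreeSet<String> keys = new TreeSet<>(first.stringPropertyNames());
        keys.addAll(second.stringPropertyNames());
        List<String> report = new ArrayList<>();
        for (String key : keys) {
            String a = first.getProperty(key);
            String b = second.getProperty(key);
            if (a == null) {
                report.add("only in second: " + key + " = " + b);
            } else if (b == null) {
                report.add("only in first: " + key + " = " + a);
            } else if (!a.equals(b)) {
                report.add("differs: " + key + " (" + a + " vs " + b + ")");
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Placeholder contents for illustration only, NOT the real CoreNLP properties.
        Properties fromJar = parse("example.a = 1\nexample.b = 2\n");
        Properties fromWeb = parse("example.a = 1\n");
        diff(fromJar, fromWeb).forEach(System.out::println);
        // prints: only in first: example.b = 2
    }
}
```

Comparing parsed keys rather than raw lines ignores differences in comments, ordering, and whitespace, so only a genuinely missing or changed property shows up.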
