Building on the word segmenter from the previous post, we can use a syntactic parser to study Chinese sentences in more depth. Below I briefly walk through how to get it running.
First, import the parser into an existing workspace by creating a Java project.
Then adjust the code accordingly. There are two main changes:
1: new LexicalizedParser("grammar/chinesePCFG.ser.gz");
2: String[] sent = { "我", "是", "一名", "好", "學生", "。" };
Because we are parsing Chinese, the grammar package loaded is the Chinese one, and the input sentence must already be word-segmented. The full code is as follows:
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.io.StringReader;

import edu.stanford.nlp.objectbank.TokenizerFactory;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

class ParserDemo {

  public static void main(String[] args) {
    // Change 1: load the Chinese PCFG grammar instead of the English one
    LexicalizedParser lp = new LexicalizedParser("grammar/chinesePCFG.ser.gz");
    if (args.length > 0) {
      demoDP(lp, args[0]);
    } else {
      demoAPI(lp);
    }
  }

  public static void demoDP(LexicalizedParser lp, String filename) {
    // This option shows loading and sentence-segmenting and tokenizing
    // a file using DocumentPreprocessor
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    // You could also create a tokenizer here (as below) and pass it
    // to DocumentPreprocessor
    for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
      Tree parse = lp.apply(sentence);
      parse.pennPrint();
      System.out.println();

      GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
      Collection tdl = gs.typedDependenciesCCprocessed(true);
      System.out.println(tdl);
      System.out.println();
    }
  }

  public static void demoAPI(LexicalizedParser lp) {
    // This option shows parsing a list of correctly tokenized words;
    // Change 2: the Chinese sentence, already word-segmented
    String[] sent = { "我", "是", "一名", "好", "學生", "。" };
    List<CoreLabel> rawWords = new ArrayList<CoreLabel>();
    for (String word : sent) {
      CoreLabel l = new CoreLabel();
      l.setWord(word);
      rawWords.add(l);
    }
    Tree parse = lp.apply(rawWords);
    parse.pennPrint();
    System.out.println();

    // This option shows loading and using an explicit tokenizer
    // String sent2 = "今天是個晴朗的天氣。";
    TokenizerFactory<CoreLabel> tokenizerFactory =
        PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
    // List<CoreLabel> rawWords2 =
    //     tokenizerFactory.getTokenizer(new StringReader(sent2)).tokenize();
    // parse = lp.apply(rawWords2);

    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    System.out.println(tdl);
    System.out.println();

    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
  }

  private ParserDemo() {} // static methods only
}
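In the demo above the segmented words are hard-coded into `sent`. In practice, a word segmenter typically emits one space-delimited line per sentence, which you then split into the word array the parser expects. A minimal sketch of that glue step (the class and method names here are my own for illustration, not part of the Stanford API):

```java
public class SegmentedInput {
    // Split a segmenter's space-delimited output line into the
    // word array that ParserDemo's demoAPI method expects.
    public static String[] splitSegmented(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Example: the output of a Chinese word segmenter for our sentence
        String segmented = "我 是 一名 好 學生 。";
        String[] sent = splitSegmented(segmented);
        for (String word : sent) {
            System.out.println(word);
        }
    }
}
```

Each element of `sent` can then be wrapped in a `CoreLabel` exactly as in `demoAPI` above.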
Finally, once it compiles, the results are displayed as shown below. The grammar used here is trained on the LDC Penn Chinese Treebank (with its part-of-speech and phrase annotations).