Trying Out the Stanford Word Segmenter

This article is a repost. Originally published 2016-04-26 16:26.

After downloading, the folder looks like this:

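Next, open Eclipse, create a new project, copy the source file SegDemo.java into it, and add all of the jar files to the build path (right-click the project, then Properties, Java Build Path, Add External JARs).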
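A quick way to confirm that the jars really landed on the build path is to try loading the classes that SegDemo.java imports. The following is a minimal sketch, not part of the segmenter distribution: the class `ClasspathCheck` and its method `isOnClasspath` are hypothetical names introduced here for illustration, and the two class names checked are the ones from the demo's import statements.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located on the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names taken from the imports of SegDemo.java
        String[] required = {
            "edu.stanford.nlp.ie.crf.CRFClassifier",
            "edu.stanford.nlp.ling.CoreLabel"
        };
        for (String name : required) {
            String status = isOnClasspath(name) ? "found" : "MISSING, check Java Build Path";
            System.out.println(name + " : " + status);
        }
    }
}
```

If either class is reported as missing, the jar that contains it was probably not added under Java Build Path.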

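Import the data directory into the project and update the path in the source code (the basedir constant) to point to it, as shown in the figure: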
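A wrong data path usually only shows up when the model fails to load, so it can help to check the directory first. The sketch below assumes the data directory must contain the two files that SegDemo.java refers to by name (`dict-chris6.ser.gz` and `ctb.gz`); the class `DataDirCheck` and the method `missingFiles` are hypothetical helpers, not part of the Stanford code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class DataDirCheck {

    // File names referenced by SegDemo.java
    static final String[] REQUIRED = { "dict-chris6.ser.gz", "ctb.gz" };

    /** Returns the names of the required files that are missing from dataDir. */
    public static List<String> missingFiles(Path dataDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            if (!Files.isRegularFile(dataDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Same lookup as SegDemo: system property "SegDemo", defaulting to "data"
        Path dataDir = Paths.get(System.getProperty("SegDemo", "data"));
        List<String> missing = missingFiles(dataDir);
        if (missing.isEmpty()) {
            System.out.println("All model files found in " + dataDir.toAbsolutePath());
        } else {
            System.out.println("Missing from " + dataDir.toAbsolutePath() + ": " + missing);
        }
    }
}
```

Printing the absolute path makes it easier to see which working directory Eclipse actually uses when the relative path "data" is resolved.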

Then modify SegDemo.java and run it as a test:

package test;
import java.io.*;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;


/** This is a very simple demo of calling the Chinese Word Segmenter
 *  programmatically.  It assumes an input file in UTF8.
 *  <p/>
 *  <code>
 *  Usage: java -mx1g -cp seg.jar SegDemo fileName
 *  </code>
 *  This will run correctly in the distribution home directory.  To
 *  run in general, the properties for where to find dictionaries or
 *  normalizations have to be set.
 *
 *  @author Christopher Manning
 */

public class SegDemo {

  private static final String basedir = System.getProperty("SegDemo", "data");

  public static void main(String[] args) throws Exception {
    System.setOut(new PrintStream(System.out, true, "utf-8"));

    Properties props = new Properties();
    props.setProperty("sighanCorporaDict", basedir);
    // props.setProperty("NormalizationTable", "data/norm.simp.utf8");
    // props.setProperty("normTableEncoding", "UTF-8");
    // below is needed because CTBSegDocumentIteratorFactory accesses it
    props.setProperty("serDictionary", basedir + "/dict-chris6.ser.gz");
    if (args.length > 0) {
      props.setProperty("testFile", args[0]);
    }
    props.setProperty("inputEncoding", "UTF-8");
    props.setProperty("sighanPostProcessing", "true");

    CRFClassifier<CoreLabel> segmenter = new CRFClassifier<>(props);
    segmenter.loadClassifierNoExceptions(basedir + "/ctb.gz", props);
    for (String filename : args) {
      segmenter.classifyAndWriteAnswers(filename);
    }

    String sample = "我住在美國。";
    List<String> segmented = segmenter.segmentString(sample);
    System.out.println(segmented);
  }

}

Output: [我, 住在, 美國, 。]
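Since `segmentString` returns a plain `List<String>`, the result can be processed with ordinary Java collections. As one possible next step, the sketch below joins the tokens with spaces and counts how often each token occurs. The class `TokenStats` and its methods are hypothetical names, and the hard-coded list only stands in for the segmenter's return value so that the example runs without the Stanford jars.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TokenStats {

    /** Joins tokens with single spaces, a common plain-text form for segmented text. */
    public static String joinWithSpaces(List<String> tokens) {
        return String.join(" ", tokens);
    }

    /** Counts how often each token occurs, keeping first-seen order. */
    public static Map<String, Integer> countTokens(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the list returned by segmenter.segmentString(sample)
        List<String> segmented = Arrays.asList("我", "住在", "美國", "。");
        System.out.println(joinWithSpaces(segmented));
        System.out.println(countTokens(segmented));
    }
}
```

In a real project the hard-coded list would be replaced by the value returned from `segmenter.segmentString(...)` in SegDemo.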

From here, feel free to experiment on your own.
