Download link: click here
After downloading, the folder looks like this:

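Next, open Eclipse, create a new project, copy the source file SegDemo.java into it, and add all the JAR files to the build path (right-click the project, then Properties > Java Build Path > Add External JARs).
Import the data folder into the project and change the path in the source code so that it points to that folder, as shown in the figure: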

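A wrong data path is an easy mistake at this step, so it can help to check the folder before running the demo. The following is only a minimal sketch: the class name DataDirCheck and the method missingFiles are made up for illustration and are not part of the Stanford distribution. It assumes the same lookup that the demo source uses (the "SegDemo" system property, defaulting to "data") and checks for the two files that the demo loads, dict-chris6.ser.gz and ctb.gz.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Stanford segmenter distribution.
public class DataDirCheck {

  /** Returns the names from required that are not regular files under dir. */
  public static List<String> missingFiles(Path dir, List<String> required) {
    List<String> missing = new ArrayList<>();
    for (String name : required) {
      if (!Files.isRegularFile(dir.resolve(name))) {
        missing.add(name);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Same lookup as the demo: the "SegDemo" system property, defaulting to "data".
    Path basedir = Paths.get(System.getProperty("SegDemo", "data"));
    // The two files that the demo source loads from basedir.
    List<String> required = Arrays.asList("dict-chris6.ser.gz", "ctb.gz");
    List<String> missing = missingFiles(basedir, required);
    if (missing.isEmpty()) {
      System.out.println("All model files found in " + basedir.toAbsolutePath());
    } else {
      System.out.println("Missing from " + basedir.toAbsolutePath() + ": " + missing);
    }
  }
}
```

If files are reported missing, the printed absolute path shows where Eclipse is actually looking, which is usually the project root rather than the src folder.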
Then edit SegDemo.java as follows and run it to test:
package test;
import java.io.*;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;


/** This is a very simple demo of calling the Chinese Word Segmenter
 *  programmatically.  It assumes an input file in UTF8.
 *  <p/>
 *  <code>
 *  Usage: java -mx1g -cp seg.jar SegDemo fileName
 *  </code>
 *  This will run correctly in the distribution home directory.  To
 *  run in general, the properties for where to find dictionaries or
 *  normalizations have to be set.
 *
 *  @author Christopher Manning
 */

public class SegDemo {

  private static final String basedir = System.getProperty("SegDemo", "data");

  public static void main(String[] args) throws Exception {
    System.setOut(new PrintStream(System.out, true, "utf-8"));

    Properties props = new Properties();
    props.setProperty("sighanCorporaDict", basedir);
    // props.setProperty("NormalizationTable", "data/norm.simp.utf8");
    // props.setProperty("normTableEncoding", "UTF-8");
    // below is needed because CTBSegDocumentIteratorFactory accesses it
    props.setProperty("serDictionary", basedir + "/dict-chris6.ser.gz");
    if (args.length > 0) {
      props.setProperty("testFile", args[0]);
    }
    props.setProperty("inputEncoding", "UTF-8");
    props.setProperty("sighanPostProcessing", "true");

    CRFClassifier<CoreLabel> segmenter = new CRFClassifier<>(props);
    segmenter.loadClassifierNoExceptions(basedir + "/ctb.gz", props);
    for (String filename : args) {
      segmenter.classifyAndWriteAnswers(filename);
    }

    String sample = "我住在美國。";
    List<String> segmented = segmenter.segmentString(sample);
    System.out.println(segmented);
  }

}
Output: [我, 住在, 美國, 。]
From here, feel free to experiment on your own~
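As one possible starting point, the token list can be turned into a space-separated string, which is a common input format for downstream tools. This is just a sketch: the class JoinTokens and the method joinWithSpaces are illustrative names, and the list is hard-coded to the output shown above so that the example runs without the segmenter JARs. In a real project it would come from segmenter.segmentString(sample).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; only uses the Java standard library.
public class JoinTokens {

  /** Joins segmented tokens with a single space between them. */
  public static String joinWithSpaces(List<String> tokens) {
    return String.join(" ", tokens);
  }

  public static void main(String[] args) {
    // Hard-coded stand-in for the list returned by segmenter.segmentString(sample).
    List<String> segmented = Arrays.asList("我", "住在", "美國", "。");
    System.out.println(joinWithSpaces(segmented));
  }
}
```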
