Learning Lucene Step by Step (Step 2: Examples)

Reposted article. 2012-07-31 08:32. Tags: java/ lucene/ cloud computing

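The previous part introduced Lucene: what it does and in which situations it is appropriate to use it. In this part we work through an example to look at the Lucene API in detail and see how Lucene works.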

Downloading Lucene

This is straightforward: search for Lucene on Baidu or Google, and the first result is usually the link we want. The Lucene download link is:

http://lucene.apache.org/

Figure: Lucene download home page

Configuring the environment

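We are going to run many tests and create many test projects. Adding the jar files to each project by hand would be tedious, so we configure the Eclipse environment once instead.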

Open Eclipse and choose Window -> Preferences -> Java -> Build Path -> User Libraries.

Add all the jars from the Lucene distribution downloaded above to this user library.

Figure: the user library added in Eclipse

Building the index

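The program below reads the files in a given folder and builds an index from them. It takes two arguments: first the directory in which the index is stored, then the directory containing the files to index.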

We put the files in a directory on the hard disk and then build the index with the program.

The indexing program is as follows:


import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Uses the Lucene 3.0-era API, as indicated by Version.LUCENE_30.
public class Indexer {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: java "
                    + Indexer.class.getName() + " <index dir> <data dir>");
        }
        String indexDir = args[0]; // directory in which the index is created
        String dataDir = args[1];  // directory containing the files to index

        long start = System.currentTimeMillis();
        Indexer indexer = new Indexer(indexDir);
        int numIndexed;
        try {
            numIndexed = indexer.index(dataDir, new TextFilesFilter());
        } finally {
            indexer.close(); // always release the writer, even on failure
        }
        long end = System.currentTimeMillis();

        System.out.println("Indexing " + numIndexed + " files took "
                + (end - start) + " milliseconds");
    }

    private IndexWriter writer;

    public Indexer(String indexDir) throws IOException {
        Directory dir = FSDirectory.open(new File(indexDir));
        // Create the IndexWriter: "true" creates a new index (overwriting
        // any existing one); field length is not limited.
        writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30),
                true,
                IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void close() throws IOException {
        writer.close(); // close the IndexWriter
    }

    public int index(String dataDir, FileFilter filter) throws Exception {

        File[] files = new File(dataDir).listFiles();
        if (files == null) {
            // listFiles() returns null if dataDir is not a readable directory
            throw new IOException("Cannot list files in " + dataDir);
        }

        for (File f : files) {
            if (!f.isDirectory() && !f.isHidden() && f.exists() && f.canRead()
                    && (filter == null || filter.accept(f))) {
                indexFile(f);
            }
        }

        return writer.numDocs(); // number of documents in the index
    }

    // Accepts only .txt files (case-insensitive extension match).
    private static class TextFilesFilter implements FileFilter {
        public boolean accept(File path) {
            return path.getName().toLowerCase().endsWith(".txt");
        }
    }

    protected Document getDocument(File f) throws Exception {
        Document doc = new Document();
        // File contents: analyzed and indexed from the Reader, not stored
        doc.add(new Field("contents", new FileReader(f)));
        // File name: stored and indexed as a single, unanalyzed term
        doc.add(new Field("filename", f.getName(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        // Full path: stored and indexed as a single, unanalyzed term
        doc.add(new Field("fullpath", f.getCanonicalPath(),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        return doc;
    }

    private void indexFile(File f) throws Exception {
        System.out.println("Indexing " + f.getCanonicalPath());
        Document doc = getDocument(f);
        writer.addDocument(doc); // add the document to the index
    }

}
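The file-selection step of `index` can be tried on its own, without Lucene on the classpath. The following is a minimal sketch using only the Java standard library; the class name `TxtFileLister` and the method `select` are hypothetical names introduced for illustration and do not appear in the original program. It applies the same checks as the listing (regular, visible, readable files whose names end in `.txt`) and returns the files that would be handed to `indexFile`.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class TxtFileLister {

    // Same rule as TextFilesFilter: names ending in ".txt", ignoring case.
    static final FileFilter TXT_FILTER =
            path -> path.getName().toLowerCase().endsWith(".txt");

    // Returns the regular, visible, readable files in dataDir that the filter accepts.
    static List<File> select(File dataDir, FileFilter filter) {
        List<File> selected = new ArrayList<>();
        File[] files = dataDir.listFiles();
        if (files == null) { // not a directory, or it could not be read
            return selected;
        }
        for (File f : files) {
            if (!f.isDirectory() && !f.isHidden() && f.canRead()
                    && (filter == null || filter.accept(f))) {
                selected.add(f);
            }
        }
        return selected;
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway directory to run the selection against.
        File dir = Files.createTempDirectory("lucene-demo").toFile();
        new File(dir, "a.txt").createNewFile();
        new File(dir, "B.TXT").createNewFile();
        new File(dir, "notes.md").createNewFile();
        new File(dir, "sub").mkdir();
        for (File f : select(dir, TXT_FILTER)) {
            System.out.println("Would index " + f.getName());
        }
    }
}
```

Running `main` should list `a.txt` and `B.TXT` but neither `notes.md` nor the `sub` directory; the order depends on the file system.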

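Then right-click the project, choose Run -> Run Configurations, create a new Java Application, and enter two arguments: first the index directory, then the directory containing the files.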

Figure: the run configuration dialog

After running the program you get the indexing output; naturally, the output depends on what is in the directory being indexed.

Figure: output when indexing txt files

Searching the index

Since nothing here involves Chinese text yet, we search for all documents that contain "RUNNING".

The program is as follows:


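import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

// Uses the Lucene 3.0-era API, as indicated by Version.LUCENE_30.
public class Searcher {

    public static void main(String[] args) throws IllegalArgumentException,
            IOException, ParseException {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: java "
                    + Searcher.class.getName() + " <index dir> <query>");
        }

        String indexDir = args[0]; // directory containing the index
        String q = args[1];        // query string

        search(indexDir, q);
    }

    public static void search(String indexDir, String q) throws IOException,
            ParseException {

        // Open the index
        Directory dir = FSDirectory.open(new File(indexDir));
        IndexSearcher is = new IndexSearcher(dir);
        try {
            // Parse the query against the "contents" field, using the same
            // analyzer that was used at indexing time
            QueryParser parser = new QueryParser(Version.LUCENE_30,
                    "contents",
                    new StandardAnalyzer(Version.LUCENE_30));
            Query query = parser.parse(q);

            long start = System.currentTimeMillis();
            TopDocs hits = is.search(query, 10); // retrieve the top 10 hits
            long end = System.currentTimeMillis();

            // Report the total number of matches and the search time
            System.err.println("Found " + hits.totalHits
                    + " document(s) (in " + (end - start)
                    + " milliseconds) that matched query '"
                    + q + "':");

            for (ScoreDoc scoreDoc : hits.scoreDocs) {
                Document doc = is.doc(scoreDoc.doc);     // load the matching document
                System.out.println(doc.get("fullpath")); // print its stored path
            }
        } finally {
            is.close(); // always release the searcher
        }
    }
}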
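The query "RUNNING" is upper case, yet it can match files that contain "running", because `StandardAnalyzer` lower-cases terms and the same analyzer is used both when indexing and when parsing the query. The toy inverted index below illustrates that idea with the Java standard library only. It is a simplified sketch, not how Lucene is implemented: the names `ToyIndex`, `analyze`, `add` and `search` are hypothetical, the tokenizer is a plain regular-expression split, and only single-term queries are handled.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy index, for illustration only.
class ToyIndex {

    // term -> names of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Lower-cases the text and splits it on runs of non-letter, non-digit characters.
    static String[] analyze(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    // Records every term of the document in the postings map.
    void add(String docName, String contents) {
        for (String term : analyze(contents)) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docName);
            }
        }
    }

    // Single-term lookup; the query is lower-cased the same way as the documents.
    Set<String> search(String query) {
        String term = query.trim().toLowerCase(Locale.ROOT);
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("a.txt", "The server is running.");
        index.add("b.txt", "Run later");
        System.out.println(index.search("RUNNING")); // prints [a.txt]
    }
}
```

Note that the sketch matches whole terms only: "RUNNING" finds `a.txt` but not `b.txt`, since "run" and "running" are different terms when no stemming is applied.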