3.2 Lucene in Practice: A Simple Small Program

Reposted article (2017-05-05 08:21). Tag: Lucene study

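Before explaining how Lucene indexing and searching work, let's get some hands-on practice with Lucene by writing a simple small program.

I. The Indexing Program

First, create a new Java project named LuceneIndex.

Then create a class named Indexer in the project. This class builds an index over files; once the index exists, the files can be searched efficiently.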

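Before writing any code, we need to add the Lucene jars to the project. This takes three steps:

1. Create a lib folder in the project.

2. Copy the required Lucene jars (lucene-core, lucene-analyzers-common and lucene-queryparser) into the lib folder.

3. Right-click the project and choose Build Path -> Configure Build Path -> Libraries -> Add JARs, select all the jars in the lib folder of the LuceneIndex project to add them to the build path, then click OK twice.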
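To confirm that the three jars really ended up on the build path, a quick check like the one below can help. This is only a minimal sketch: ClasspathCheck and isOnClasspath are hypothetical names of my own, not part of Lucene, and the class names listed in main are simply the ones this tutorial's code imports.

```java
// Hypothetical helper, not part of Lucene: reports whether a class can be located.
class ClasspathCheck {
    /** Returns true if the named class is visible to this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Classes imported by the Indexer and Searcher in this tutorial
        String[] required = {
            "org.apache.lucene.index.IndexWriter",
            "org.apache.lucene.analysis.standard.StandardAnalyzer",
            "org.apache.lucene.queryparser.classic.QueryParser"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If any line prints MISSING, the corresponding jar was probably not added in step 3.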

(Screenshots of step 2 and step 3 appeared here.)

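With the preparation done, we can start writing code.

1. If you have not already done so, create a class named Indexer in LuceneIndex.

2. Next, create a new folder named raw in the LuceneIndex project.

3. Then create two UTF-8 encoded txt files in the raw folder. For example, name the first file hello.txt with the content "Hello", and the second nihao.txt with the content "你好". Note that the code below reads the files as UTF-8 so that Chinese text can be searched, so the files themselves must be saved as UTF-8.

4. Enter the following code: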

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * @author csl
 * @description:
 * Required jars: lucene-core, lucene-analyzers-common, lucene-queryparser
 * Purpose: build a simple index
 */
public class Indexer {
    /**
     * Build the index.
     */
    public static void createIndex() {
        IndexWriterConfig iwConfig = new IndexWriterConfig(new StandardAnalyzer());
        // 1. Create the Directory. FSDirectory keeps the index on disk in the "index" folder
        //    (new RAMDirectory() would keep it in memory instead).
        // 2. Create the IndexWriter. try-with-resources closes the writer and the
        //    directory even if indexing fails.
        try (Directory directory = FSDirectory.open(Paths.get("index"));
             IndexWriter writer = new IndexWriter(directory, iwConfig)) {
            File rawDir = new File("raw"); // location of the source files to index
            File[] files = rawDir.listFiles();
            if (files == null) {
                throw new IOException("Cannot list directory: " + rawDir.getAbsolutePath());
            }
            for (File file : files) {
                if (!file.isFile()) {
                    continue; // this simple version does not descend into subdirectories
                }
                System.out.println(file.getName());
                // 3. Create a Document for each file
                Document document = new Document();
                // 4. Add Field objects to the Document
                document.add(new StringField("path", file.getPath(), Field.Store.YES));
                document.add(new StringField("name", file.getName(), Field.Store.YES));
                // Decode the file as UTF-8. If UTF-8 is not needed, new FileReader(file) also works.
                try (BufferedReader content = new BufferedReader(new InputStreamReader(
                        Files.newInputStream(file.toPath()), StandardCharsets.UTF_8))) {
                    // TextField content is analyzed (tokenized)
                    document.add(new TextField("content", content));
                    // 5. Add the Document to the index while the reader is still open
                    writer.addDocument(document);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        createIndex();
    }
}


5. Finally, run Indexer.java. When indexing finishes, a new index folder appears in the LuceneIndex project.
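The two sample files from step 3 can also be generated from code, which sidesteps any doubt about how an editor saved them. The following is a minimal sketch under my own assumptions: SampleFiles and create are hypothetical names, and the Chinese text is written with Unicode escapes so that the source file's own encoding does not matter.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: writes the two sample files from step 3 as UTF-8.
class SampleFiles {
    static void create(Path rawDir) throws IOException {
        Files.createDirectories(rawDir); // no error if the folder already exists
        Files.write(rawDir.resolve("hello.txt"),
                "Hello".getBytes(StandardCharsets.UTF_8));
        // "\u4f60\u597d" is the two-character string "你好"
        Files.write(rawDir.resolve("nihao.txt"),
                "\u4f60\u597d".getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        create(Paths.get("raw")); // relative to the working directory, like the Indexer
    }
}
```

In UTF-8 each of the two Chinese characters takes three bytes, so nihao.txt ends up six bytes long. A file saved in a different encoding would contain different bytes and would not decode back to "你好" when the Indexer reads it as UTF-8.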

II. The Search Program

Now we will use this index to search.

1. Create a new class named Searcher and enter the following code:

import java.io.IOException;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 * @author csl
 * @description:
 * Required jars: lucene-core, lucene-analyzers-common, lucene-queryparser
 * Purpose: search files using the index
 */
public class Searcher {
    /**
     * Run a query against the index and return the formatted hits.
     */
    public static String indexSearch(String keywords) {
        StringBuilder res = new StringBuilder();
        // 1. Open the Directory that holds the index on disk
        // 2. Create the IndexReader; try-with-resources closes both when done
        try (Directory directory = FSDirectory.open(Paths.get("index"));
             DirectoryReader reader = DirectoryReader.open(directory)) {
            // 3. Create the IndexSearcher from the IndexReader
            IndexSearcher searcher = new IndexSearcher(reader);
            // 4. Create the query. The first argument of QueryParser is the default
            //    field to search; here it is "content".
            QueryParser parser = new QueryParser("content", new StandardAnalyzer());
            Query query = parser.parse(keywords); // the text to search for
            // 5. Ask the searcher for the TopDocs (at most 20 hits)
            TopDocs tds = searcher.search(query, 20);
            // 6. Get the ScoreDoc array from the TopDocs
            ScoreDoc[] sds = tds.scoreDocs;
            // 7. Use the searcher and each ScoreDoc to load the matching Document
            int cou = 0;
            for (ScoreDoc sd : sds) {
                cou++;
                Document d = searcher.doc(sd.doc);
                // 8. Read the stored field values from the Document.
                //    d.get("content") is null because the index does not store the
                //    content; use path and name to read it from the original file.
                res.append(cou).append(". ").append(d.get("path")).append(" ")
                   .append(d.get("name")).append(" ").append(d.get("content")).append("\n");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return res.toString();
    }

    public static void main(String[] args) throws IOException
    {
        System.out.println(indexSearch("你好")); // the search text can be changed
    }
}


public static void main(String[] args) throws IOException
{
    System.out.println(indexSearch("你好")); // the search text can be changed
}


With the search text "你好", the result is nihao.txt, whose content contains "你好".

public static void main(String[] args) throws IOException
{
    System.out.println(indexSearch("Hello")); // the search text can be changed
}


With the search text "Hello", the result is hello.txt, whose content contains "Hello".
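As the comment in Searcher notes, the content field comes back empty because the index does not store it, so the text has to be read from the original file. A minimal sketch of that lookup follows. ContentLoader and readContent are hypothetical names, and how you rebuild the file location from the stored path and name fields depends on what your Indexer put in them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reloads a hit's text from the original file on disk.
class ContentLoader {
    /** Reads the whole file and decodes it as UTF-8, matching how it was indexed. */
    static String readContent(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```

Inside the result loop you could then call ContentLoader.readContent with the file location instead of d.get("content"). Another option, which I have not used here, is to index the text as a String with Field.Store.YES so that Lucene keeps a copy in the index, at the cost of a larger index.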

At this point we have had our first hands-on experience with Lucene and learned how to build a simple index and search it.

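III. Project Source Code

You can try my project first: https://github.com/shelly-github/my_simple_Lucenetest

1. Download the project.

2. Unzip it.

3. Import it into Eclipse.

First, open Eclipse and choose a workspace.

Then click File -> Import -> Existing Projects into Workspace.

Double-click Existing Projects into Workspace and select the directory that contains the project.

Click OK -> Finish to complete the import.

You can now run the program. Note that the project does not include the index, so you must first run Indexer to build the index and then use Searcher to search it.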
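Since the index has to exist before Searcher can open it, a small guard can make that ordering explicit. The sketch below relies on the fact that a committed Lucene index directory contains a file whose name starts with segments_. IndexCheck and looksLikeIndex are hypothetical names, and the check is only a heuristic, not a validation of the index.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: heuristic test for "has Indexer been run yet?"
class IndexCheck {
    /** Returns true if indexDir is a directory containing a segments_N file. */
    static boolean looksLikeIndex(Path indexDir) throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return false; // missing folder: Indexer has not been run
        }
        // The glob matches commit files such as segments_1, segments_2, ...
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(indexDir, "segments_*")) {
            return entries.iterator().hasNext();
        }
    }
}
```

Calling IndexCheck.looksLikeIndex(Paths.get("index")) before searching lets the program print a hint such as "run Indexer first" instead of failing with a stack trace.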

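IV. Traversing the File System

The program above is a simple Lucene demo that can only index the txt files directly inside a single directory. Next I will show a way to traverse the file system and index every .txt file.

The method is simple: a depth-first traversal implemented with recursion.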

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class Indexer {
    // Index a single file; returns false if the file was skipped
    private static boolean indexFile(IndexWriter writer, File f) throws IOException
    {
        if (f.isHidden() || !f.exists() || !f.canRead())
        {
            return false;
        }
        System.out.println("Indexing " + f.getCanonicalPath());
        Document document = new Document();
        document.add(new StringField("path", f.getPath(), Field.Store.YES));
        document.add(new StringField("name", f.getName(), Field.Store.YES));
        // Decode the file as UTF-8. If UTF-8 is not needed, new FileReader(f) also works.
        try (BufferedReader content = new BufferedReader(new InputStreamReader(
                Files.newInputStream(f.toPath()), StandardCharsets.UTF_8)))
        {
            document.add(new TextField("content", content)); // TextField content is analyzed (tokenized)
            writer.addDocument(document); // the reader must still be open here
        }
        return true;
    }

    // Traverse the file system depth-first and index the .txt files;
    // returns the number of files indexed under dir
    private static int indexDirectory(IndexWriter writer, File dir) throws IOException
    {
        File[] files = dir.listFiles();
        if (files == null)
        {
            return 0; // not a directory, or it cannot be read
        }
        int numIndexed = 0;
        for (File f : files)
        {
            if (f.isDirectory())
            {
                numIndexed += indexDirectory(writer, f); // recursion
            }
            else if (f.getName().endsWith(".txt") && indexFile(writer, f))
            {
                numIndexed += 1;
            }
        }
        return numIndexed;
    }

    // Create the IndexWriter and start the file system traversal
    public static int index(File indexDir, File dataDir) throws IOException
    {
        if (!dataDir.exists() || !dataDir.isDirectory())
        {
            throw new IOException(dataDir + " does not exist or is not a directory!");
        }
        IndexWriterConfig iwConfig = new IndexWriterConfig(new StandardAnalyzer());
        // try-with-resources closes the writer and the directory even if indexing fails
        try (Directory directory = FSDirectory.open(indexDir.toPath());
             IndexWriter writer = new IndexWriter(directory, iwConfig))
        {
            return indexDirectory(writer, dataDir);
        }
    }

    public static void main(String[] args) throws Exception
    {
        File indexDir = new File("index");
        File dataDir = new File("raw");
        int numIndexed = index(indexDir, dataDir);
        System.out.println("Indexed " + numIndexed + " files");
    }
}


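The source code for this program can be downloaded here: https://github.com/shelly-github/my_simple_Lucenetest

Download and import it the same way as above. Again, note that I did not upload the index files, so you need to run Indexer to build the index before searching with Searcher.

We have now learned how to traverse a file system to build an index. Simple, isn't it?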
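If you would rather not write the recursion by hand, the JDK's Files.walk performs a depth-first walk of a directory tree for you. The following is a minimal sketch that only collects the .txt files; TxtFinder and findTxtFiles are hypothetical names, and feeding each result to the indexing code is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists every .txt file under a root directory.
class TxtFinder {
    static List<Path> findTxtFiles(Path root) throws IOException {
        // Files.walk visits root and everything below it depth-first;
        // the stream holds open directories, so close it with try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile) // skip directories
                    .filter(p -> p.getFileName().toString().endsWith(".txt"))
                    .sorted()                     // stable order for printing
                    .collect(Collectors.toList());
        }
    }
}
```

For example, findTxtFiles(Paths.get("raw")) returns the files that the recursive Indexer would visit, apart from the hidden and unreadable files that it skips.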

In the next section, we will take a closer look at how Lucene search works.
