A Study of Lucene's Core Indexing Classes

Reposted article, originally published 2013-02-01 14:17. Topic: Lucene

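Building a simple index and querying it is not hard. The real work lies in customizing Lucene so that it fits your own requirements.

Customizing it means reading the source code.

Let's start with the core classes involved in indexing: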

IndexWriter

This is the core component. It creates or opens an index, and adds, deletes, or updates the documents stored in that index.

Directory

Describes where the Lucene index is stored. It is an abstract class; an instance is usually obtained with FSDirectory.open().

Analyzer

An IndexWriter must be given an analyzer (tokenizer), which breaks field text into the terms that get indexed.

Document

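Represents a collection of fields. Each Document stands for one unit of text that you want to store.

Field (from 4.0 onward you normally do not use Field directly but one of its typed subclasses, such as LongField, TextField, or StringField; pathField in the lines below is just a variable name)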

    Field pathField = new StringField("path", file.getPath(), Field.Store.YES);
    doc.add(pathField);
    doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));
    doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(fis, "UTF-8"))));

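An indexing example

(A note on field types: a StringField is matched as a whole value. To find the "我的Lucene學習" title below, the Term must be exactly "我的Lucene學習". If you want hits for "我" or "Lucene", change the StringField to a TextField.

In short, if you do not want a field to require an exact match, use TextField.):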

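    import java.io.File;
    import java.io.IOException;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field.Store;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    public class TestIndex {

        public void creatIndex() {
            try {
                // Location where the index is stored
                String indexPath = "index";
                Directory dir = FSDirectory.open(new File(indexPath));
                // MyAnalyzer is a custom Analyzer that is not shown in this article
                Analyzer analyzer = new MyAnalyzer(Version.LUCENE_41);
                IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_41, analyzer);
                // CREATE overwrites any index that already exists in the directory
                iwc.setOpenMode(OpenMode.CREATE);
                IndexWriter writer = new IndexWriter(dir, iwc);
                Document doc = new Document();
                doc.add(new StringField("id", "1", Store.YES));
                doc.add(new StringField("title", "我的Lucene學習", Store.YES));
                doc.add(new StringField("content", "Lucene是一個不錯的搜索工具，我很喜歡", Store.YES));
                writer.addDocument(doc);
                // close() commits pending changes and releases the write lock
                writer.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }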
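The StringField versus TextField note above comes down to which terms end up in the index. The following is a toy model in plain Java, not Lucene code, that may help make this concrete. ToyTermIndex and its methods are hypothetical names, and the tokenizer (one token per Han character, lowercased runs for other letters and digits) is only an assumption about what an analyzer might do; the MyAnalyzer used in the article may split text differently. The title value "我的Lucene學習" is written with Unicode escapes so the file compiles under any source encoding.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of indexed terms. NOT Lucene code; every name here is hypothetical. */
final class ToyTermIndex {
    // term -> ids of the documents that contain that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** StringField-like: the whole value becomes one single term. */
    void addWhole(int docId, String value) {
        postings.computeIfAbsent(value, k -> new HashSet<>()).add(docId);
    }

    /** TextField-like: the value is tokenized and every token becomes a term. */
    void addTokenized(int docId, String value) {
        for (String token : tokenize(value)) {
            postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
        }
    }

    /** Term lookup: succeeds only if the exact term was indexed. */
    boolean matches(String term) {
        return postings.containsKey(term);
    }

    /** Assumed tokenizer: one token per Han character, lowercased runs otherwise. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (!han && Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
                continue;
            }
            // A Han character or a separator ends the current word
            if (word.length() > 0) {
                tokens.add(word.toString());
                word.setLength(0);
            }
            if (han) {
                tokens.add(new String(Character.toChars(cp)));
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The title value from the example above, written with Unicode escapes
        String title = "\u6211\u7684Lucene\u5b78\u7fd2";
        String firstChar = "\u6211";

        ToyTermIndex whole = new ToyTermIndex();
        whole.addWhole(1, title);
        ToyTermIndex tokenized = new ToyTermIndex();
        tokenized.addTokenized(1, title);

        System.out.println(whole.matches(title));         // true
        System.out.println(whole.matches(firstChar));     // false
        System.out.println(tokenized.matches(firstChar)); // true
        System.out.println(tokenized.matches("lucene"));  // true
    }
}
```

In this model the lookup is case-sensitive, so "Lucene" would not match the lowercased token "lucene" unless the query text goes through the same tokenization first.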

If you import only the sources under lucene-4.1.0-src\lucene-4.1.0\core\src\java from the 4.x source distribution and then try to build an index, you get the following exception:

Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.lucene.index.LiveIndexWriterConfig.<init>(LiveIndexWriterConfig.java:118)
at org.apache.lucene.index.IndexWriterConfig.<init>(IndexWriterConfig.java:145)
at com.test.TestIndex.creatIndex(TestIndex.java:33)
at com.test.TestIndex.main(TestIndex.java:22)
Caused by: java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.Codec with name 'Lucene41' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: []
at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:106)
at org.apache.lucene.codecs.Codec.forName(Codec.java:95)
at org.apache.lucene.codecs.Codec.<clinit>(Codec.java:122)
... 4 more

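At first I thought I had copied something incorrectly, but the class was there in the source. Reading further into the source, I found the cause here: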

package org.apache.lucene.util;

public final class SPIClassIterator<S> implements Iterator<Class<? extends S>> {
private static final String META_INF_SERVICES = "META-INF/services/";

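The fix is to copy the META-INF/services folder from lucene-core-4.1.0.jar into the project so that it ends up on the classpath. Lucene finds its codecs through the provider files in that folder, and they are not part of core\src\java, which is why the exception lists no available names ([]).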
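To check whether the provider files are visible before running the indexer, a small diagnostic along the following lines could be used. This is a rough sketch in plain Java, not Lucene's own lookup code: SpiCheck, providerFiles, and parseProviderLines are hypothetical names, and the parsing assumes the usual provider-file format (one class name per line, with # starting a comment).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Diagnostic sketch; all names are hypothetical and this is not Lucene code. */
final class SpiCheck {
    /** Returns every META-INF/services/<serviceName> file the loader can see. */
    static List<URL> providerFiles(String serviceName, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources("META-INF/services/" + serviceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Strips '#' comments and blank lines, returning the listed class names. */
    static List<String> parseProviderLines(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Service name taken from the exception message above
        String service = "org.apache.lucene.codecs.Codec";
        List<URL> files = providerFiles(service, ClassLoader.getSystemClassLoader());
        if (files.isEmpty()) {
            System.out.println("No provider file for " + service + " on the classpath");
        }
        for (URL file : files) {
            List<String> lines = new ArrayList<>();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(file.openStream(), StandardCharsets.UTF_8))) {
                for (String line; (line = in.readLine()) != null; ) {
                    lines.add(line);
                }
            }
            System.out.println(file + " -> " + parseProviderLines(lines));
        }
    }
}
```

If this prints the "No provider file" message, the META-INF/services folder has not reached the classpath yet, which matches the empty name list in the exception.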

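After adding the folder, run TestIndex again and the index files appear under the index directory. They look a little different from those produced by earlier versions.

Now that the source runs, we can start studying the code inside.

Since the topic is writing an index, the org.apache.lucene.index package is the place to begin.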

(1) IndexWriter

Constructor:

public IndexWriter(Directory d, IndexWriterConfig conf) throws IOException

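The parameters are the index directory and the IndexWriter configuration (which carries the Lucene version and the analyzer).

Method for adding a Document:

public void addDocument(Iterable<? extends IndexableField> doc) throws IOException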
