Multi-keyword full-text search with Lucene.net

Reposted article · 2012-12-14 16:44 · Lucene.net

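Lucene is an open-source search engine library written in Java. Lucene.net is its .NET port and can be called conveniently from C#.

At the time of writing, the latest version of Lucene.net is 3.0.3. You can download it from the official site: http://lucenenet.apache.org/

To do full-text search with Lucene.net, you first build an index from your data and then search that index for keywords. This article does not explain the underlying principles; readers who want to dig deeper can look up further material on their own.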

Creating the index

Let's go straight to the code:

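            // indexDirectory, analyzer, files and OutputMessage are defined elsewhere.
            // Passing "true" creates a new index, overwriting any existing one.
            using (IndexWriter writer = new IndexWriter(FSDirectory.Open(new DirectoryInfo(indexDirectory)),
                    analyzer, true, IndexWriter.MaxFieldLength.LIMITED))
            {
                foreach (FileInfo fileInfo in files)
                {
                    OutputMessage("Indexing file [" + fileInfo.Name + "]");

                    // Dispose the reader so the file handle is released after each file.
                    using (StreamReader reader = new StreamReader(fileInfo.FullName))
                    {
                        Document doc = new Document();
                        doc.Add(new Field("FileName", fileInfo.Name, Field.Store.YES, Field.Index.ANALYZED));
                        // The first line of the file is treated as the author; ReadLine returns null for an empty file.
                        doc.Add(new Field("Author", reader.ReadLine() ?? string.Empty, Field.Store.YES, Field.Index.ANALYZED));
                        // The content is searchable but not stored in the index.
                        doc.Add(new Field("Content", reader.ReadToEnd(), Field.Store.NO, Field.Index.ANALYZED));
                        // The path is stored for display but not searchable.
                        doc.Add(new Field("Path", fileInfo.FullName, Field.Store.YES, Field.Index.NO));

                        writer.AddDocument(doc);
                    }
                }

                // Optimize once after all documents are added, not after every document.
                writer.Optimize();
            }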

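The code above indexes a set of text files. Each file becomes a document with four fields: FileName, Author, Content, and Path (the full file path). Fields work much like columns in a database table. Note that Content is indexed but not stored, while Path is stored but not indexed.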

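Using Chinese word segmentation

For Chinese word segmentation, the current options include Lucene.Net.Analysis.Cn.ChineseAnalyzer and the PanGu segmenter (盤古分詞). We use the former here because it is more convenient for testing.

Lucene.Net.Analysis.Cn.ChineseAnalyzer is included in the Lucene.net source code. You can either copy it into your own project or compile it and add a reference. For convenience I copied it straight into my source here; for real use, I recommend compiling it and adding a reference.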
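
To see what an analyzer actually does to a piece of text, it can help to print the terms it produces. The following is a minimal sketch, assuming Lucene.net 3.0.3 and its token attribute API; the class name AnalyzerDemo and the method Tokenize are hypothetical helpers made up for this illustration, not part of Lucene.net.

```csharp
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

public static class AnalyzerDemo
{
    // Hypothetical helper: returns the terms an analyzer emits for the given text.
    // The field name "Content" is only a label here; it matches the field used in this article.
    public static List<string> Tokenize(Analyzer analyzer, string text)
    {
        var terms = new List<string>();
        TokenStream stream = analyzer.TokenStream("Content", new StringReader(text));
        // The term attribute exposes the text of the current token.
        ITermAttribute termAttr = stream.AddAttribute<ITermAttribute>();
        stream.Reset();
        while (stream.IncrementToken())
        {
            terms.Add(termAttr.Term);
        }
        return terms;
    }
}
```

Calling AnalyzerDemo.Tokenize(new Lucene.Net.Analysis.Cn.ChineseAnalyzer(), "全文查找") should then show how that analyzer splits the phrase. Whatever analyzer you pick, use the same one for indexing and for searching so that query terms line up with indexed terms.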

Full-text search

Again, the code is the most direct way to show it.

            IndexReader reader = null;
            IndexSearcher searcher = null;
            try
            {
                reader = IndexReader.Open(FSDirectory.Open(new DirectoryInfo(indexDirectory)), true);
                searcher = new IndexSearcher(reader);
                // Build the query
                PerFieldAnalyzerWrapper wrapper = new PerFieldAnalyzerWrapper(analyzer);
                wrapper.AddAnalyzer("FileName", analyzer);
                wrapper.AddAnalyzer("Author", analyzer);
                wrapper.AddAnalyzer("Content", analyzer);
                string[] fields = {"FileName", "Author", "Content"};

                QueryParser parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, fields, wrapper);
                Query query = parser.Parse(keyword);
                TopScoreDocCollector collector = TopScoreDocCollector.Create(num, true);

                searcher.Search(query, collector);
                var hits = collector.TopDocs().ScoreDocs;

                int numTotalHits = collector.TotalHits;
                OutputMessage("Searching for " + keyword + " ... found " + numTotalHits + " matching documents");

                // From here on, work with the hits gathered by the collector
                for (int i = 0; i < hits.Length; i++)
                {
                    var hit = hits[i];
                    Document doc = searcher.Doc(hit.Doc);
                    Field fileNameField = doc.GetField("FileName");
                    Field authorField = doc.GetField("Author");
                    Field pathField = doc.GetField("Path");

                    // Note: hit.Score is a relative ranking value, not a true percentage.
                    OutputMessage(fileNameField.StringValue + " [" + authorField.StringValue + "], match score: " + Math.Round(hit.Score * 100, 2) + "%");
                    OutputMessage(pathField.StringValue);
                    OutputMessage(string.Empty);
                }
            }
            finally
            {
                if (searcher != null)
                    searcher.Dispose();

                if (reader != null)
                    reader.Dispose();
            }

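The code above implements multi-keyword search. One thing to note: the way the matched documents are retrieved appears to differ from earlier versions (code I copied from other people would not run as-is).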
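
The multi-keyword behavior comes from the query string handed to parser.Parse. With the query parser's default settings, terms separated by spaces are optional (a document matches if it contains any of them), while a leading + makes a term required. The sketch below builds such a string from a list of keywords. QueryText.Build is a hypothetical helper, not a Lucene.net API, and it assumes the keywords contain no query-syntax characters and no embedded whitespace.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class QueryText
{
    // Hypothetical helper (not part of Lucene.net): joins keywords into a query
    // string suitable for QueryParser.Parse.
    // Assumption: keywords contain no query-syntax characters such as + - ( ) : " * ?
    public static string Build(IEnumerable<string> keywords, bool requireAll)
    {
        // Drop empty entries and trim surrounding whitespace.
        var cleaned = keywords
            .Where(k => !string.IsNullOrWhiteSpace(k))
            .Select(k => k.Trim());

        // A leading '+' marks a term as required; without it the term is optional.
        return string.Join(" ", cleaned.Select(k => requireAll ? "+" + k : k));
    }
}
```

For example, QueryText.Build(new[] { "lucene", "search" }, true) yields "+lucene +search". Because the parser in this article is a MultiFieldQueryParser, each required term then has to appear in at least one of the FileName, Author, or Content fields.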

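Also, the number of results returned here is capped by num, but the total hit count is not. If you need every matching document, you can run the search once to get the total hit count, assign that count to num, and then run the search again.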
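
The two-pass approach just described might look roughly like this. It is a sketch assuming the Lucene.net 3.0.3 collector API already used in this article; SearchHelper.SearchAll is a hypothetical helper name, not something the library provides.

```csharp
using Lucene.Net.Search;

public static class SearchHelper
{
    // Hypothetical helper: returns every hit for the query by searching twice.
    public static ScoreDoc[] SearchAll(IndexSearcher searcher, Query query)
    {
        // First pass: keep only one hit; TotalHits still reports the full match count.
        TopScoreDocCollector counter = TopScoreDocCollector.Create(1, true);
        searcher.Search(query, counter);
        int total = counter.TotalHits;

        if (total == 0)
        {
            return new ScoreDoc[0];
        }

        // Second pass: ask for exactly as many hits as matched.
        TopScoreDocCollector all = TopScoreDocCollector.Create(total, true);
        searcher.Search(query, all);
        return all.TopDocs().ScoreDocs;
    }
}
```

Running the query twice costs extra time, and collecting every hit can use a lot of memory on a large index, so this is probably best kept for cases where the full result set is really needed.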

As usual, the source code is attached.
