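Lucene.Net is a very good choice when you need full-text search. It supports incremental indexing, which avoids rebuilding the whole index when the data changes frequently, and that matters a great deal for web performance (see the official documentation for its other features). Lucene.Net searches documents, not database rows, so the data must first be exported from the database into documents. That export is the index-building step. The code is as follows: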
/// <summary>
/// Add data to the index
/// </summary>
/// <param name="models">Data collection</param>
/// <param name="optimize">Whether to optimize the index after adding the new entries</param>
public void AddToSearchIndex(IEnumerable<T> models, bool optimize = false)
{
    var analyzer = new StandardAnalyzer(Version.LUCENE_30);
    using (var writer = new IndexWriter(_directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
    {
        foreach (var model in models)
        {
            // remove the older index entry; field names are case-sensitive,
            // so the term must use "ID", the same name the field is indexed under
            var searchQuery = new TermQuery(new Term("ID", (model as dynamic).ID.ToString()));
            writer.DeleteDocuments(searchQuery);

            var doc = new Document();
            foreach (var prop in Props)
            {
                var value = prop.GetValue(model);
                if (value == null)
                {
                    continue;
                }

                // only store ID; we use it to retrieve the model data from the DB
                doc.Add(new Field(prop.Name, value.ToString(),
                    prop.Name == "ID" ? Field.Store.YES : Field.Store.NO,
                    Field.Index.ANALYZED));
            }
            writer.AddDocument(doc);
        }
        if (optimize)
        {
            writer.Optimize();
        }
    }
}
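The function above adds the exported data to the index files, and we can choose whether to optimize the index once the insert is finished. Optimizing the index speeds up searches but consumes CPU, so it is not advisable to do it often. Note also that each insert first deletes any existing entry with the same ID, so the same call handles both adding new records and updating old ones. What should we do with the index, then, when a record is removed from the database?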
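It works much like the database operation: when a record is removed from the database, remove the corresponding entry from the index files. The code is as follows: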
/// <summary>
/// Remove the specified index record
/// </summary>
/// <param name="record_id">the record's ID</param>
public void ClearSearchIndex(int record_id)
{
    var analyzer = new StandardAnalyzer(Version.LUCENE_30);
    using (var writer = new IndexWriter(_directory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
    {
        // remove the older index entry
        var searchQuery = new TermQuery(new Term("ID", record_id.ToString()));
        writer.DeleteDocuments(searchQuery);
        writer.Commit();
    }
    analyzer.Dispose();
}
Likewise, we can delete all index records:
/// <summary>
/// Remove all index records
/// </summary>
/// <returns>whether the operation succeeded</returns>
public bool ClearAllSearchIndex()
{
    StandardAnalyzer analyzer = null;
    try
    {
        analyzer = new StandardAnalyzer(Version.LUCENE_30);
        using (var writer = new IndexWriter(_directory, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // remove all index entries
            writer.DeleteAll();
            writer.Commit();
        }
    }
    catch (Exception)
    {
        return false;
    }
    finally
    {
        // analyzer is still null if its constructor threw
        if (analyzer != null)
        {
            analyzer.Dispose();
        }
    }
    return true;
}
Now for the main act: let's see how to search for records.
/// <summary>
/// Search for the specified value in all fields, or in one named field.
/// </summary>
/// <param name="querystring">value to search for</param>
/// <param name="fieldname">field to search in; all fields are searched by default</param>
/// <returns>the IDs of the related records</returns>
public IEnumerable<int> Search(string querystring, string fieldname = "")
{
    IEnumerable<int> result = new List<int>();

    if (string.IsNullOrEmpty(querystring))
    {
        return new List<int>();
    }
    // remove invalid characters
    querystring = ParseSearchString(querystring);

    // validation: nothing left to search for once wildcards are removed
    if (string.IsNullOrEmpty(querystring.Replace("*", "").Replace("?", "")))
    {
        return new List<int>();
    }

    using (var searcher = new IndexSearcher(_directory, true))
    {
        ScoreDoc[] hits = null;
        // the maximum number of hits to return
        var hits_limit = 1000;
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        // used to split the query string into terms that match the index
        QueryParser parser = null;
        Query query = null;

        if (!string.IsNullOrEmpty(fieldname))
        {
            // create a QueryParser for the specified field
            parser = new QueryParser(Version.LUCENE_30, fieldname, analyzer);
        }
        else
        {
            string[] fields = Props.Select(p => p.Name).ToArray<string>();
            // create a QueryParser for all fields
            parser = new MultiFieldQueryParser(Version.LUCENE_30, fields, analyzer);
        }

        // create a query from the QueryParser and the query string
        query = ParseQuery(querystring, parser);
        // get the matching records
        hits = searcher.Search(query, hits_limit).ScoreDocs;
        var resultDocs = hits.Select(hit => searcher.Doc(hit.Doc));
        // map the index record's ID to the DB record's ID
        result = resultDocs.
            Select(doc => ((SpecEquipmentID)int.Parse(doc.Get("ID"))).CurrentID).
            ToList();
        analyzer.Dispose();
    }

    return result;
}
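The Search method calls two helpers, ParseSearchString and ParseQuery, whose bodies are not shown in this post. The block below is only a guess at what such helpers might look like, not the original implementation: the class name SearchHelpers, the choice to turn every term into a prefix query, and the fall-back to escaped text are all assumptions. It relies on QueryParser.Parse, QueryParser.Escape and ParseException from Lucene.Net 3.0.

```csharp
using System;
using System.Linq;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public static class SearchHelpers
{
    // Hypothetical helper: treat hyphens as spaces, drop empty terms, and
    // append '*' so that every term becomes a prefix query ("luc*").
    public static string ParseSearchString(string input)
    {
        var terms = input.Replace("-", " ")
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(term => term + "*");
        return string.Join(" ", terms);
    }

    // Hypothetical helper: parse the text as Lucene query syntax; if the
    // syntax is malformed, escape the special characters and parse the
    // text again as literal terms.
    public static Query ParseQuery(string text, QueryParser parser)
    {
        try
        {
            return parser.Parse(text.Trim());
        }
        catch (ParseException)
        {
            return parser.Parse(QueryParser.Escape(text.Trim()));
        }
    }
}
```

Under these assumptions, ParseSearchString("full-text search") yields "full* text* search*", and an input made up only of spaces yields an empty string, which the validation step in Search then rejects.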
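As the Search method shows, we can restrict the search to particular fields. A search across several fields, each with its own search value, can use the same partial-match ("fuzzy") mode: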
public IEnumerable<int> MultiFieldsSearch(Dictionary<string, string> multiFieldsDict)
{
    IEnumerable<int> result = new List<int>();

    if (multiFieldsDict.Count == 0)
    {
        return result;
    }

    using (var searcher = new IndexSearcher(_directory, true))
    {
        ScoreDoc[] hits = null;
        var hits_limit = 1000;
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);
        // every field must match (logical AND)
        var occurs = (from field in multiFieldsDict.Keys select Occur.MUST).ToArray();
        // one query string per field, in the same order as the keys
        var queries = (from key in multiFieldsDict.Keys select multiFieldsDict[key]).ToArray();

        Query query = MultiFieldQueryParser.Parse(Version.LUCENE_30, queries,
            multiFieldsDict.Keys.ToArray(), occurs, analyzer);
        hits = searcher.Search(query, hits_limit).ScoreDocs;
        var resultDocs = hits.Select(hit => searcher.Doc(hit.Doc));
        result = resultDocs.
            Select(doc => ((SpecEquipmentID)int.Parse(doc.Get("ID"))).CurrentID).
            Distinct().ToList();
        analyzer.Dispose();
    }

    return result;
}
A word of explanation here: why use a QueryParser to create the Query instance?
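A QueryParser runs the search text through the analyzer, which gives us partial-match searching within the chosen fields: any record whose field contains the search value is hit, and that is exactly what full-text search requires. The alternative is to combine TermQuery instances in a BooleanQuery to search the chosen fields, but a TermQuery is not analyzed, so only records whose indexed term equals the given value exactly are hit (an exact query). Either way, the search conditions can be combined with AND, OR and NOT, just as in a database query.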
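To make the contrast concrete, here is a minimal sketch that indexes one document in memory and runs both kinds of query against it. The field names Name and NameExact, the sample text and the use of RAMDirectory are assumptions made for illustration and do not come from the code in this post. NameExact is indexed as NOT_ANALYZED so that the whole value is kept as a single term.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class QueryComparison
{
    // Returns the hit counts for, in order: a parsed query, an exact
    // TermQuery, a partial TermQuery, and the first two combined with AND.
    public static int[] Run()
    {
        var directory = new RAMDirectory(); // in-memory index, illustration only
        using (var analyzer = new StandardAnalyzer(Version.LUCENE_30))
        {
            using (var writer = new IndexWriter(directory, analyzer,
                IndexWriter.MaxFieldLength.UNLIMITED))
            {
                var doc = new Document();
                // analyzed: split into the lower-cased terms diesel, engine, pump
                doc.Add(new Field("Name", "Diesel Engine Pump",
                    Field.Store.NO, Field.Index.ANALYZED));
                // not analyzed: the whole value is one term
                doc.Add(new Field("NameExact", "Diesel Engine Pump",
                    Field.Store.NO, Field.Index.NOT_ANALYZED));
                writer.AddDocument(doc);
            }

            using (var searcher = new IndexSearcher(directory, true))
            {
                var parser = new QueryParser(Version.LUCENE_30, "Name", analyzer);
                Query parsed = parser.Parse("engine");
                Query exact = new TermQuery(new Term("NameExact", "Diesel Engine Pump"));
                Query partial = new TermQuery(new Term("NameExact", "Engine"));

                // both conditions must hold, like AND in a database query
                var both = new BooleanQuery();
                both.Add(parsed, Occur.MUST);
                both.Add(exact, Occur.MUST);

                return new[]
                {
                    searcher.Search(parsed, 10).TotalHits,
                    searcher.Search(exact, 10).TotalHits,
                    searcher.Search(partial, 10).TotalHits,
                    searcher.Search(both, 10).TotalHits
                };
            }
        }
    }
}
```

In this setup the parsed query finds the document from a single word of the field, while the TermQuery on NameExact matches only when it is given the complete value. Swapping Occur.MUST for Occur.SHOULD or Occur.MUST_NOT gives the OR and NOT forms.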
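One last note: numeric and date values need special handling. If they are indexed the way strings are, the results become less accurate. For how to search numeric and date data in Lucene itself, see the official Lucene documentation. My own solution is to combine a conventional database with Lucene: let Lucene handle searches on string data and let the database handle searches on dates and numbers, so that each plays to its strengths.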
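For readers who would still like to keep numeric searches inside Lucene, the sketch below shows one route that Lucene.Net 3.0 offers: index the value with NumericField and search it with NumericRangeQuery. It is not the approach used in this post. The Price field, the sample values and the in-memory RAMDirectory are assumptions made for illustration.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class NumericRangeSketch
{
    // Indexes three sample prices (5, 50, 500) and counts how many fall
    // within [min, max], both bounds inclusive.
    public static int CountInRange(int min, int max)
    {
        var directory = new RAMDirectory(); // in-memory index, illustration only
        using (var analyzer = new StandardAnalyzer(Version.LUCENE_30))
        using (var writer = new IndexWriter(directory, analyzer,
            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (var price in new[] { 5, 50, 500 })
            {
                var doc = new Document();
                // NumericField indexes the number itself, so ranges compare
                // by value; as strings, "500" would sort before "6"
                doc.Add(new NumericField("Price", Field.Store.NO, true)
                    .SetIntValue(price));
                writer.AddDocument(doc);
            }
        }

        using (var searcher = new IndexSearcher(directory, true))
        {
            var query = NumericRangeQuery.NewIntRange("Price", min, max, true, true);
            return searcher.Search(query, 10).TotalHits;
        }
    }
}
```

With these sample values, a range of 10 to 100 should match only the document priced at 50. Whether this is preferable to leaving such queries to the database depends on where the rest of the filtering already happens.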