The business requirement: support fuzzy search over product titles.
For example, when a user enters 【我想查詢下雅思托福考試】 ("I'd like to look up the IELTS and TOEFL exams"), we first segment the sentence into the tokens 【查詢】【雅思】【托福】【考試】, then search for products whose titles contain any of those words.
The approach is as follows.
First, we automatically sync all product records from the database into a Lucene index directory, which acts as a local cache of the searchable data.
The scheduling uses the Hangfire recurring-job setup I wrote about earlier; see this post for details:
https://www.cnblogs.com/jhli/p/10027074.html
With the cache refreshed on a schedule, we can run tokenized searches against the index. The index-update code is as follows:
public void UpdateMerchIndex()
{
    try
    {
        Console.WriteLine($"[{DateTime.Now}] UpdateMerchIndex job begin...");
        var indexDir = Path.Combine(System.IO.Directory.GetCurrentDirectory(), "temp", "lucene", "merchs");
        if (System.IO.Directory.Exists(indexDir) == false)
        {
            System.IO.Directory.CreateDirectory(indexDir);
        }

        var version = Lucene.Net.Util.LuceneVersion.LUCENE_48;
        var director = FSDirectory.Open(new DirectoryInfo(indexDir));
        var analyzer = new JieBaAnalyzer(TokenizerMode.Search);
        var indexWriterConfig = new IndexWriterConfig(version, analyzer);
        using (var indexWriter = new IndexWriter(director, indexWriterConfig))
        {
            // If an index already exists, wipe it and rebuild from scratch.
            if (File.Exists(Path.Combine(indexDir, "segments.gen")))
            {
                indexWriter.DeleteAll();
            }

            var query = _merchService.Where(t => t.IsDel == false);
            var index = 1;
            var size = 200;
            var count = query.Count();
            if (count > 0)
            {
                while (true)
                {
                    // Page through the products 200 rows at a time.
                    var rs = query.OrderBy(t => t.CreateTime)
                                  .Skip((index - 1) * size)
                                  .Take(size)
                                  .ToList();
                    if (rs.Count == 0)
                    {
                        break;
                    }

                    var addDocs = new List<Document>();
                    foreach (var item in rs)
                    {
                        var merchid = item.IdentityId.ToLowerString();
                        var doc = new Document();
                        // merchid is stored verbatim; name is tokenized for full-text search.
                        doc.Add(new StringField("merchid", merchid, Field.Store.YES));
                        doc.Add(new TextField("name", item.Name?.ToLower(), Field.Store.YES));
                        addDocs.Add(doc); // add the document to the batch
                    }
                    if (addDocs.Count > 0)
                    {
                        indexWriter.AddDocuments(addDocs);
                    }
                    index = index + 1;
                }
            }
        }
        Console.WriteLine($"[{DateTime.Now}] UpdateMerchIndex job end!");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"UpdateMerchIndex ex={ex}");
    }
}
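For completeness, this is roughly how the job above is registered with Hangfire. A minimal sketch; the `MerchIndexJob` class name, the job id, and the hourly schedule are illustrative assumptions, not from the original post.

```csharp
// Assumes UpdateMerchIndex lives on a class Hangfire can resolve
// from its service container; all names here are illustrative.
RecurringJob.AddOrUpdate<MerchIndexJob>(
    "update-merch-index",          // stable recurring-job id
    job => job.UpdateMerchIndex(), // the method shown above
    Cron.Hourly());                // rebuild the index once an hour
```

Registering under a fixed job id means redeploys update the schedule in place instead of stacking duplicate jobs.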
What remains is to query the index, collect the matching ids, and then load the corresponding rows from the database.
The search code:
protected List<Guid> SearchMerchs(string key)
{
    var rs = new List<Guid>();
    if (string.IsNullOrEmpty(key))
    {
        return rs;
    }
    key = key.Trim().ToLower();
    try
    {
        var indexDir = Path.Combine(System.IO.Directory.GetCurrentDirectory(), "temp", "lucene", "merchs");
        if (System.IO.Directory.Exists(indexDir))
        {
            // NoLockFactory lets us read while the update job holds the write lock.
            using (var directory = FSDirectory.Open(new DirectoryInfo(indexDir), NoLockFactory.GetNoLockFactory()))
            using (var reader = DirectoryReader.Open(directory))
            {
                var searcher = new IndexSearcher(reader);

                // OR the segmented keywords together: a product matches
                // if its name contains any of the tokens.
                var booleanQuery = new BooleanQuery();
                foreach (var word in CutKeyWord(key))
                {
                    booleanQuery.Add(new TermQuery(new Term("name", word)), Occur.SHOULD);
                }

                var collector = TopScoreDocCollector.Create(1000, true);
                searcher.Search(booleanQuery, null, collector);
                var docs = collector.GetTopDocs(0, collector.TotalHits).ScoreDocs;
                foreach (var d in docs)
                {
                    var document = searcher.Doc(d.Doc); // fetch the stored fields
                    var merchid = document.Get("merchid");
                    if (Guid.TryParse(merchid, out Guid mid))
                    {
                        rs.Add(mid);
                    }
                }
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"SearchMerchs ex={ex}");
    }
    return rs;
}
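Tying the two halves together, the ids returned by the search are then used to load the actual products. A sketch under assumptions: it reuses `_merchService` and the field names from the code above, but the exact query shape of that service is not shown in the original.

```csharp
// Illustrative caller: search the Lucene index, then fetch the
// matching (non-deleted) products from the database by id.
var ids = SearchMerchs("我想查詢下雅思托福考試");
var merchs = _merchService
    .Where(t => t.IsDel == false && ids.Contains(t.IdentityId))
    .OrderBy(t => t.CreateTime)
    .ToList();
```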
The code that segments the user's input into keywords, using JiebaNet:
protected List<string> CutKeyWord(string key)
{
    var rs = new List<string>();
    var segmenter = new JiebaSegmenter();
    var list = segmenter.Cut(key);
    if (list != null && list.Count() > 0)
    {
        foreach (var item in list)
        {
            // Drop empty tokens and single characters; they only add noise.
            if (string.IsNullOrEmpty(item) || item.Length <= 1)
            {
                continue;
            }
            rs.Add(item);
        }
    }
    return rs;
}
Required NuGet packages and versions:
Hangfire 1.7.0-beta1
Lucene.Net 4.8.0-beta00005
Lucene.Net.Analysis.Common 4.8.0-beta00005
Lucene.Net.QueryParser 4.8.0-beta00005
DLL that must be referenced directly:
JiebaNet.Segmenter.dll
Download:
https://pan.baidu.com/s/1D7mQnow0FmoqedNYzugfKw
Local debugging was fine, but after publishing to the server, the scheduled job hit this problem:
https://stackoverflow.com/questions/47746582/hangfire-job-throws-system-typeloadexception
System.TypeLoadException: Could not load type '***' from assembly '***, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null'.
That message is not the real cause; printing out the full exception makes this clear.
The actual cause: the jieba dictionary file dict.txt under the Resources folder had not been published to the server.
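One way to avoid this is to have the build copy the dictionary files to the output automatically. A hedged .csproj fragment; the `Resources\**` path is an assumption and must match wherever JiebaNet is configured to look for its dictionaries.

```xml
<!-- .csproj fragment: copy the jieba dictionary files on build/publish.
     Adjust the path if your JiebaNet config points elsewhere. -->
<ItemGroup>
  <None Include="Resources\**" CopyToOutputDirectory="PreserveNewest" />
</ItemGroup>
```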
That pit cost me half a day...