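Preface: Many of you have probably heard of full-text indexing with Lucene.net in .NET development. Solr is a high-performance full-text search server built on Lucene. It extends Lucene with a richer query language, is configurable and extensible, optimizes query performance, and ships with a full-featured administration interface, which makes it an excellent full-text search engine. I will skip Lucene here and go straight to using Solr. In short, Solr is more convenient and simpler to use than Lucene, quick to pick up, and efficient to develop with.
Where Solr fits: full-text search over large volumes of data. E-commerce platforms in particular, along with the currently popular cloud computing and Internet of Things applications, all rest on large amounts of data, and Solr is a very good fit for retrieving it. Solr is also free and open source, so the barrier to entry and the investment are low and results come quickly. I won't go on about Solr's other advantages; plenty of experts in this community have already written technical posts about Solr, and mine is only meant to start the discussion.
All right, let's begin the Solr journey.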
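Solr development steps on the .NET platform
I. Set up the Solr server, as follows:
1. Install the JDK. Because we are working on the .NET platform, there is no need to install a separate JRE or Java virtual machine; the JDK alone is enough. In my case I did not have to configure environment variables by hand after installing it, which is convenient. I installed jdk1.7; official site: http://www.oracle.com/technetwork/java/javase/downloads/index.html
2. Install Tomcat 8.0; official site: http://tomcat.apache.org/download-80.cgi. After installation, start Monitor Tomcat and enter http://localhost:8080/ in the browser's address bar. If the page loads, the installation succeeded.
3. Download Solr. I use Solr 4.4 here. After downloading, configure it as follows: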
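(1) Unzip Solr 4.4 and create a Solr directory, for example D:/SorlServer/one. Copy all files from the solr folder under the example directory of the unzipped Solr 4.4 into the newly created directory.
(2) Create the Solr web application. Copy solr-4.4.0.war from the dist directory of the unzipped Solr 4.4 into Tomcat's webapps directory, for example C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps (the paths in this post show a Tomcat 7.0 folder; substitute your own Tomcat installation directory), and rename it to one.war. The file is unpacked automatically when Tomcat starts. Then go to D:\SorlServer\one\collection1\conf, open solrconfig.xml, find the <dataDir> node and change it to <dataDir>${solr.data.dir:c:/SorlServer/one/data}</dataDir>
Note, this step is important: open web.xml under C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\one\WEB-INF, find the <env-entry> node and uncomment it,
then change the env-entry-value to D:/SorlServer/one, as follows: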
<env-entry>
<env-entry-name>solr/home</env-entry-name>
<env-entry-value>D:/SorlServer/one</env-entry-value>
<env-entry-type>java.lang.String</env-entry-type>
</env-entry>
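(3) Copy all jar files from the /dist/solrj-lib directory of the unzipped Solr 4.4 into C:\Program Files\Apache Software Foundation\Tomcat 7.0\lib
(4) Stop Tomcat and start it again, then visit http://localhost:8080/one; the Solr admin page should open.
Note: if you are building an English-language site, no third-party tokenizer is needed, because Solr has built-in support for English tokenization. For other languages (Japanese, Italian, French and so on), look online for a suitable tokenizer package. Here we use Chinese tokenization as the example, since most sites in China are primarily in Chinese.
4. Configure Chinese tokenization. Tokenizers commonly used in China include Paoding (庖丁解牛), mmseg4j and IKAnalyzer. I use IKAnalyzer here; it is actively maintained, updated frequently, and works well. The steps are:
(1) Copy the IKAnalyzer jar and IKAnalyzer.cfg.xml to C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\one\WEB-INF\lib
(2) Edit schema.xml under D:\SorlServer\one\collection1\conf and add the following configuration: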
<!-- Tokenizer configuration -->
<fieldType name="text_IKFENCHI" class="solr.TextField">
<analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
(3) Stop Tomcat and start it again, then visit http://localhost:8080/one/#/collection1/analysis to test the tokenizer.
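The same check can also be scripted from .NET instead of using the admin page. The following is a minimal sketch, not part of the original setup: it assumes the `/analysis/field` request handler from the example solrconfig.xml is still enabled for collection1, and the class and method names (`AnalysisCheck`, `BuildAnalysisUrl`) are made up for illustration.

```csharp
using System;
using System.Net.Http;

public static class AnalysisCheck
{
    // Builds a field-analysis URL for the given field type and sample text.
    // Assumption: the core exposes the /analysis/field handler.
    public static string BuildAnalysisUrl(string coreUrl, string fieldType, string text)
    {
        return coreUrl.TrimEnd('/') + "/analysis/field"
            + "?analysis.fieldtype=" + Uri.EscapeDataString(fieldType)
            + "&analysis.fieldvalue=" + Uri.EscapeDataString(text)
            + "&wt=json";
    }

    public static void Main()
    {
        // Core URL and field type name are taken from the configuration above.
        string url = BuildAnalysisUrl(
            "http://localhost:8080/one/collection1", "text_IKFENCHI", "中文分詞測試");

        using (var client = new HttpClient())
        {
            // Prints the raw JSON response, which lists the tokens produced.
            Console.WriteLine(client.GetStringAsync(url).GetAwaiter().GetResult());
        }
    }
}
```

If the tokenizer is wired up correctly, the response should list the individual Chinese terms rather than one token for the whole phrase.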
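That completes the configuration on the Solr server side.
II. Solr development on the .NET platform:
1. Download a Solr client component. I use EasyNet.Solr by Terry, a member of this community, hosted on Microsoft's open-source site: http://easynet.codeplex.com/,
Terry has already wrapped the Solr client very thoroughly, with many ready-made methods and parameter settings that we can use directly. We use EasyNet.Solr to create the index and then query it. The usage is as follows:
(1) Download the EasyNet.Solr source and put it directly into your project, or build the source into a DLL and add it as a project reference. Putting the source in the project is the better option, because you can then adjust it to your own needs.
(2) Create the index entity class, that is, the index data we want to store. For example, create a product entity class: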
using System;
using System.Collections.Generic;

namespace Seek.SearchIndex
{
    public partial class IndexProductModel
    {
        public IndexProductModel() { }

        #region Properties
        public int ID { get; set; }
        public int ProductID { get; set; }
        public string ClassPath { get; set; }
        public int ClassID1 { get; set; }
        public int ClassID2 { get; set; }
        public int ClassID3 { get; set; }
        public string Title { get; set; }
        public string Model { get; set; }
        public string PriceRange { get; set; }
        public string AttributeValues { get; set; }
        public string ProductImages { get; set; }
        public int MemberID { get; set; }
        public System.DateTime CreateDate { get; set; }
        public System.DateTime LastEditDate { get; set; }
        public string FileName { get; set; }
        public string ProductType { get; set; }
        public string Summary { get; set; }
        public string Details { get; set; }
        public string RelatedKeywords { get; set; }
        public int MemberGrade { get; set; }
        #endregion
    }
}
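(3) Configure the XML on the Solr server, that is, declare the fields of our index entity class on the Solr server. Go to D:\SorlServer\one\collection1\conf, open schema.xml, and configure it as follows: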
<field name="ID" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="ProductID" type="int" indexed="true" stored="true"/>
<!-- Fast highlighting configuration: termVectors="true" termPositions="true" termOffsets="true" -->
<field name="Title" type="text_en_splitting" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="Model" type="text_en_splitting" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true"/>
<field name="ClassPath" type="string" indexed="true" stored="true"/>
<field name="ClassID1" type="int" indexed="true" stored="true"/>
<field name="ClassID2" type="int" indexed="true" stored="true"/>
<field name="ClassID3" type="int" indexed="true" stored="true"/>
<field name="PriceRange" type="string" indexed="true" stored="true"/>
<field name="AttributeValues" type="string" indexed="true" stored="true"/>
<field name="ProductImages" type="string" indexed="true" stored="true"/>
<field name="MemberID" type="int" indexed="true" stored="true"/>
<field name="CreateDate" type="date" indexed="true" stored="true"/>
<field name="LastEditDate" type="date" indexed="true" stored="true"/>
<field name="FileName" type="string" indexed="true" stored="true"/>
<field name="ProductType" type="string" indexed="true" stored="true"/>
<field name="Summary" type="string" indexed="true" stored="false"/>
<field name="Details" type="string" indexed="true" stored="false"/>
<field name="RelatedKeywords" type="string" indexed="true" stored="true"/>
<field name="MemberType" type="string" indexed="true" stored="true"/>
<field name="MemberGrade" type="int" indexed="true" stored="true"/>
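Keeping the entity class and schema.xml in sync by hand is error-prone, so a first draft of the field list could be generated from the class with reflection. This is only a sketch under assumptions: the CLR-to-Solr type mapping (int to `int`, string to `string`, DateTime to `date`) simply mirrors the hand-written schema in this post, `SampleProduct` is a hypothetical stand-in for `IndexProductModel`, and attributes such as `termVectors` or `stored="false"` would still need to be tuned manually.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Text;

// Hypothetical stand-in for IndexProductModel, kept short for the example.
public class SampleProduct
{
    public int ProductID { get; set; }
    public string Title { get; set; }
    public DateTime CreateDate { get; set; }
}

public static class SchemaFieldGenerator
{
    // Assumed mapping from CLR types to Solr field type names.
    private static readonly Dictionary<Type, string> SolrTypes = new Dictionary<Type, string>
    {
        { typeof(int), "int" },
        { typeof(string), "string" },
        { typeof(DateTime), "date" }
    };

    // Emits one <field> line per public instance property with a known mapping.
    public static string Generate(Type entityType)
    {
        var sb = new StringBuilder();
        foreach (PropertyInfo p in entityType.GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            string solrType;
            if (!SolrTypes.TryGetValue(p.PropertyType, out solrType))
            {
                continue; // no mapping known for this property type
            }
            sb.AppendLine(string.Format(
                "<field name=\"{0}\" type=\"{1}\" indexed=\"true\" stored=\"true\"/>",
                p.Name, solrType));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write(Generate(typeof(SampleProduct)));
    }
}
```

The generated lines are a starting point to paste into schema.xml and then adjust, for example to switch Title to a tokenized field type.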
(4) Start creating the index. It is best to write a client program that generates the index; here is the key code of my own indexer:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Seek.SearchIndex;
using System.Data;
using System.Threading;
using System.Configuration;
using System.Reflection;
using EasyNet.Solr;
using EasyNet.Solr.Impl;
using EasyNet.Solr.Commons;
using System.Xml.Linq;
using EasyNet.Solr.Commons.Params;
using System.Threading.Tasks;

namespace Seek.SearchIndex
{
    /// <summary>
    /// Indexer
    /// </summary>
    public class Indexer
    {
        private readonly static OptimizeOptions optimizeOptions = new OptimizeOptions();
        private readonly static CommitOptions commitOptions = new CommitOptions() { SoftCommit = true };
        private readonly static ISolrResponseParser<NamedList, EasyNet.Solr.ResponseHeader> binaryResponseHeaderParser = new BinaryResponseHeaderParser();
        private readonly static IUpdateParametersConvert<NamedList> updateParametersConvert = new BinaryUpdateParametersConvert();
        private readonly static ISolrQueryConnection<NamedList> connection = new SolrQueryConnection<NamedList>() { ServerUrl = ConfigurationManager.AppSettings["SolrServer"] };
        private readonly static ISolrUpdateConnection<NamedList, NamedList> solrUpdateConnection = new SolrUpdateConnection<NamedList, NamedList>() { ServerUrl = ConfigurationManager.AppSettings["SolrServer"], ContentType = "application/javabin" };
        private readonly static ISolrUpdateOperations<NamedList> solr = new SolrUpdateOperations<NamedList, NamedList>(solrUpdateConnection, updateParametersConvert) { ResponseWriter = "javabin" };
        private readonly static ISolrQueryOperations<NamedList> solrQuery = new SolrQueryOperations<NamedList>(connection) { ResponseWriter = "javabin" };

        public enum State
        {
            /// <summary>
            /// Running
            /// </summary>
            Runing,
            /// <summary>
            /// Stopped
            /// </summary>
            Stop,
            /// <summary>
            /// Paused
            /// </summary>
            Break
        }

        /// <summary>
        /// Main window
        /// </summary>
        private Main form;

        /// <summary>
        /// Worker thread
        /// </summary>
        public Thread t;

        /// <summary>
        /// Current state
        /// </summary>
        public State state = State.Stop;

        /// <summary>
        /// Number of documents indexed so far
        /// </summary>
        private long currentIndex = 0;
        public long CurrentIndex
        {
            get { return currentIndex; }
            set { currentIndex = value; }
        }

        private int _startId = AppCongfig.StartId;
        public int StartId
        {
            get { return _startId; }
            set { _startId = value; }
        }

        /// <summary>
        /// Total number of products
        /// </summary>
        private int productsCount = 0;

        /// <summary>
        /// Start time
        /// </summary>
        private DateTime startTime = DateTime.Now;

        /// <summary>
        /// End time
        /// </summary>
        private DateTime endTime = DateTime.MinValue;

        private static object syncLock = new object();

        #region Singleton
        private static Indexer instance = null;

        private Indexer(Main _form)
        {
            form = _form;
            productsCount = DataAccess.GetCount(0); // count of products
            form.fullerTsslMaxNum.Text = productsCount.ToString();
            form.fullerProgressBar.Minimum = 0;
            form.fullerProgressBar.Maximum = productsCount;
        }

        public static Indexer GetInstance(Main form)
        {
            if (instance == null)
            {
                lock (syncLock)
                {
                    if (instance == null)
                    {
                        instance = new Indexer(form);
                    }
                }
            }
            return instance;
        }
        #endregion

        /// <summary>
        /// Start
        /// </summary>
        public void Start()
        {
            ThreadStart ts = new ThreadStart(FullerRun);
            t = new Thread(ts);
            t.Start();
        }

        /// <summary>
        /// Stop
        /// </summary>
        public void Stop()
        {
            state = State.Stop;
        }

        /// <summary>
        /// Pause
        /// </summary>
        public void Break()
        {
            state = State.Break;
        }

        /// <summary>
        /// Create the index for one batch of rows (single-threaded)
        /// </summary>
        public void InitIndex(object data)
        {
            var docs = new List<SolrInputDocument>();
            DataTable list = data as DataTable;
            // Properties of the entity class; each one maps to a column and a Solr field
            PropertyInfo[] properites = typeof(IndexProductModel).GetProperties();
            foreach (DataRow pro in list.Rows)
            {
                var model = new SolrInputDocument();
                foreach (PropertyInfo propertyInfo in properites)
                {
                    object val = pro[propertyInfo.Name];
                    if (val != DBNull.Value) // skip database NULLs
                    {
                        model.Add(propertyInfo.Name, new SolrInputField(propertyInfo.Name, val));
                    }
                }
                docs.Add(model);
                StartId = Convert.ToInt32(pro["ID"]);
            }
            GetStartId();
            lock (syncLock)
            {
                if (currentIndex <= productsCount)
                {
                    form.fullerProgressBar.Value = (int)currentIndex;
                }
                form.fullerTsslCurrentNum.Text = currentIndex.ToString();
            }
            var result = solr.Update("/update", new UpdateOptions() { Docs = docs });
        }

        /// <summary>
        /// Create the index for one batch of rows (multi-threaded)
        /// </summary>
        public void CreateIndexer(DataTable dt)
        {
            GetStartId();
            PropertyInfo[] properites = typeof(IndexProductModel).GetProperties();
            Parallel.ForEach<DataRow>(dt.AsEnumerable(), (row) =>
            {
                // Build and send one document per row
                if (row != null)
                {
                    var docs = new List<SolrInputDocument>();
                    var model = new SolrInputDocument();
                    foreach (PropertyInfo propertyInfo in properites)
                    {
                        object val = row[propertyInfo.Name];
                        if (val != DBNull.Value)
                        {
                            model.Add(propertyInfo.Name, new SolrInputField(propertyInfo.Name, val));
                        }
                    }
                    docs.Add(model);
                    StartId = Convert.ToInt32(row["ID"]);
                    var result = solr.Update("/update", new UpdateOptions() { Docs = docs });
                }
            });
            //GetStartId();
            lock (syncLock)
            {
                if (currentIndex <= productsCount)
                {
                    form.fullerProgressBar.Value = (int)currentIndex;
                }
                form.fullerTsslCurrentNum.Text = currentIndex.ToString();
            }
        }

        /// <summary>
        /// Run the full indexing job
        /// </summary>
        public void FullerRun()
        {
            //GetStartId();
            //form.fullerTsslCurrentNum.Text = currentIndex.ToString();
            DataTable dt = DataAccess.GetNextProductsInfo(StartId);
            StartId = AppCongfig.StartId;
            if (state == State.Break)
            {
                this.SendMesasge("Full indexing resumed, starting ID [" + StartId + "]...");
            }
            else
            {
                startTime = DateTime.Now;
                this.SendMesasge("Full indexing started, starting ID [" + StartId + "]...");
            }
            state = State.Runing;
            form.btnInitIndex.Enabled = false;
            form.btnSuspend.Enabled = true;
            form.btnStop.Enabled = true;
            while (dt != null && dt.Rows.Count > 0 && state == State.Runing)
            {
                try
                {
                    InitIndex(dt); // single-threaded
                    // CreateIndexer(dt); // multi-threaded
                }
                catch (Exception ex)
                {
                    state = State.Stop;
                    form.btnInitIndex.Enabled = true;
                    form.btnSuspend.Enabled = false;
                    form.btnStop.Enabled = false;
                    GetStartId();
                    this.SendMesasge(ex.Message.ToString());
                }
                form.fullerTsslTimeSpan.Text = "Elapsed: " + GetTimeSpanShow(DateTime.Now - startTime) + ", estimated remaining: " + GetTimeSpanForecast();
                try
                {
                    dt = DataAccess.GetNextProductsInfo(StartId); // fetch the next batch of products
                }
                catch (Exception err)
                {
                    this.SendMesasge("Failed to fetch the next batch of products, starting ID [" + StartId + "]: " + err.Message);
                }
            }
            if (state == State.Runing)
            {
                state = State.Stop;
                form.btnInitIndex.Enabled = true;
                form.btnSuspend.Enabled = false;
                form.btnStop.Enabled = false;
                AppCongfig.SetValue("StartId", StartId.ToString());
                this.SendMesasge("Full indexing finished, total indexed [" + currentIndex + "], last product ID " + StartId);
            }
            else if (state == State.Break)
            {
                GetStartId();
                state = State.Break;
                form.btnInitIndex.Enabled = true;
                form.btnSuspend.Enabled = false;
                form.btnStop.Enabled = false;
                AppCongfig.SetValue("StartId", StartId.ToString());
                this.SendMesasge("Full indexing paused, current position [" + currentIndex + "], last product ID " + StartId);
            }
            else if (state == State.Stop)
            {
                GetStartId();
                state = State.Stop;
                this.SendMesasge("Full indexing stopped, indexed so far [" + currentIndex + "], last product ID " + StartId);
                form.btnInitIndex.Enabled = true;
                form.btnSuspend.Enabled = false;
                form.btnStop.Enabled = false;
                AppCongfig.SetValue("StartId", StartId.ToString());
                productsCount = DataAccess.GetCount(StartId); // count of products
                form.fullerTsslMaxNum.Text = productsCount.ToString();
                form.fullerProgressBar.Minimum = 0;
                form.fullerProgressBar.Maximum = productsCount;
            }
            endTime = DateTime.Now;
        }

        /// <summary>
        /// Entry point for building index data from multiple threads
        /// </summary>
        /// <param name="threadDataParam"></param>
        public void MultiThreadCreateIndex(object threadDataParam)
        {
            InitIndex(threadDataParam);
        }

        /// <summary>
        /// Get the largest ID already in the index
        /// </summary>
        private void GetStartId()
        {
            IDictionary<string, ICollection<string>> options = new Dictionary<string, ICollection<string>>();
            options[CommonParams.SORT] = new string[] { "ProductID DESC" };
            options[CommonParams.START] = new string[] { "0" };
            options[CommonParams.ROWS] = new string[] { "1" };
            options[HighlightParams.FIELDS] = new string[] { "ProductID" };
            options[CommonParams.Q] = new string[] { "*:*" };
            var result = solrQuery.Query("/select", null, options);
            var solrDocumentList = (SolrDocumentList)result.Get("response");
            // Read NumFound only after the null check to avoid a NullReferenceException
            currentIndex = solrDocumentList != null ? solrDocumentList.NumFound : 0;
            if (solrDocumentList != null && solrDocumentList.Count() > 0)
            {
                StartId = (int)solrDocumentList[0]["ProductID"];
                //AppCongfig.SetValue("StartId", solrDocumentList[0]["ProductID"].ToString());
            }
            else
            {
                StartId = 0;
                // AppCongfig.SetValue("StartId", "0");
            }
        }

        /// <summary>
        /// Optimize the index
        /// </summary>
        public void Optimize()
        {
            this.SendMesasge("Optimizing the index, please wait...");
            var result = solr.Update("/update", new UpdateOptions() { OptimizeOptions = optimizeOptions });
            var header = binaryResponseHeaderParser.Parse(result);
            this.SendMesasge("Index optimization took " + header.QTime + " ms");
        }

        /// <summary>
        /// Send a message to the UI
        /// </summary>
        /// <param name="message">Message to show in the UI</param>
        protected void SendMesasge(string message)
        {
            form.fullerDgvMessage.Rows.Add(form.fullerDgvMessage.Rows.Count + 1, message, DateTime.Now.ToString());
        }

        /// <summary>
        /// Format a time span for display
        /// </summary>
        /// <param name="ts">Time span</param>
        /// <returns></returns>
        protected string GetTimeSpanShow(TimeSpan ts)
        {
            string text = "";
            if (ts.Days > 0)
            {
                text += ts.Days + "d ";
            }
            if (ts.Hours > 0)
            {
                text += ts.Hours + "h ";
            }
            if (ts.Minutes > 0)
            {
                text += ts.Minutes + "m ";
            }
            if (ts.Seconds > 0)
            {
                text += ts.Seconds + "s";
            }
            return text;
        }

        /// <summary>
        /// Estimate the remaining time
        /// </summary>
        /// <returns></returns>
        protected string GetTimeSpanForecast()
        {
            if (currentIndex != 0)
            {
                TimeSpan tsed = DateTime.Now - startTime;
                double d = ((tsed.TotalMilliseconds / currentIndex) * productsCount) - tsed.TotalMilliseconds;
                return GetTimeSpanShow(TimeSpan.FromMilliseconds(d));
            }
            return "";
        }
    }
}
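The core of the indexer is the loop that walks the entity's properties and copies each non-null column value into a document. A stripped-down sketch of just that mapping step is shown below, using a plain dictionary in place of EasyNet.Solr's `SolrInputDocument` so it runs without the library. `DemoProduct` and `RowMapper` are hypothetical names; the extra check for a missing column is an addition for the sketch and is not in the indexer above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

// Hypothetical, shortened entity used only for this example.
public class DemoProduct
{
    public int ID { get; set; }
    public string Title { get; set; }
    public string Model { get; set; }
}

public static class RowMapper
{
    // Copies every column that matches a property name and is not NULL.
    public static Dictionary<string, object> ToFields(DataRow row, Type entityType)
    {
        var fields = new Dictionary<string, object>();
        foreach (PropertyInfo p in entityType.GetProperties())
        {
            if (!row.Table.Columns.Contains(p.Name))
            {
                continue; // the query did not return this column
            }
            object value = row[p.Name];
            if (value != DBNull.Value)
            {
                fields[p.Name] = value; // database NULLs are left out of the document
            }
        }
        return fields;
    }

    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Title", typeof(string));
        table.Rows.Add(7, DBNull.Value); // Title is NULL, Model column is absent

        Dictionary<string, object> fields = ToFields(table.Rows[0], typeof(DemoProduct));
        foreach (KeyValuePair<string, object> kv in fields)
        {
            Console.WriteLine(kv.Key + " = " + kv.Value);
        }
    }
}
```

Leaving NULL values out, rather than sending empty strings, keeps typed fields such as `int` and `date` from being rejected by Solr.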
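(5) Run the indexer to create the index. This is what my indexer's interface looks like (see the screenshot).
You can track the progress of index generation at any time.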
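The remaining-time figure shown during indexing comes from a simple linear extrapolation: average time per indexed document so far, multiplied by the number of documents still to go. The sketch below restates that arithmetic on its own; `IndexEta` and `Estimate` are illustrative names, and the guard for the case where the index already holds as many documents as the total is an added assumption.

```csharp
using System;

public static class IndexEta
{
    // remaining = (elapsed / indexed) * (total - indexed)
    // which equals (elapsed / indexed) * total - elapsed, as in GetTimeSpanForecast.
    public static TimeSpan Estimate(TimeSpan elapsed, long indexed, long total)
    {
        if (indexed <= 0 || indexed >= total)
        {
            return TimeSpan.Zero; // nothing indexed yet, or already done
        }
        double msPerDoc = elapsed.TotalMilliseconds / indexed;
        return TimeSpan.FromMilliseconds(msPerDoc * (total - indexed));
    }

    public static void Main()
    {
        // 1000 of 4000 documents in 30 s: 30 ms each, 3000 left.
        TimeSpan eta = Estimate(TimeSpan.FromSeconds(30), 1000, 4000);
        Console.WriteLine(eta); // prints 00:01:30
    }
}
```

The estimate is only as steady as the indexing rate, so it will jump around early in a run and settle as more documents are processed.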
(6) Once the index has been created, you can test it in the Solr admin interface at http://localhost:8080/one/#/collection1/query
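The query page in the admin interface simply issues HTTP requests against the core's `/select` handler, so the same test can be run from code. Here is a minimal sketch that bypasses EasyNet.Solr and uses plain HTTP; it assumes the default `/select` handler and the JSON response writer are available, and `SelectUrl` is a made-up helper name.

```csharp
using System;
using System.Globalization;
using System.Net.Http;

public static class SelectUrl
{
    // Builds a /select URL with an escaped query string and paging parameters.
    public static string Build(string coreUrl, string query, int start, int rows)
    {
        return coreUrl.TrimEnd('/') + "/select"
            + "?q=" + Uri.EscapeDataString(query)
            + "&start=" + start.ToString(CultureInfo.InvariantCulture)
            + "&rows=" + rows.ToString(CultureInfo.InvariantCulture)
            + "&wt=json";
    }

    public static void Main()
    {
        // Match everything and return the first 10 documents.
        string url = Build("http://localhost:8080/one/collection1", "*:*", 0, 10);
        using (var client = new HttpClient())
        {
            Console.WriteLine(client.GetStringAsync(url).GetAwaiter().GetResult());
        }
    }
}
```

A non-zero `numFound` in the response confirms that the indexer's documents reached the core.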
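That covers the groundwork for Solr: setting up the server and generating the index from the client. Client-side querying will be covered in detail in a later post. Coming up next:
1. Full-text search, tokenizer configuration, and keyword autocomplete like that of Google and Baidu
2. Facet queries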