Full-Text Indexing Tips for .NET Development: Solr (Repost)


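   Preface: Many of you have probably heard of full-text indexing in .NET built on Lucene.net. Solr is a high-performance full-text search server that is also built on Lucene. It extends Lucene with a richer query language, is configurable and extensible, optimizes query performance, and ships with a full-featured administration interface, which makes it an excellent full-text search engine. I will skip Lucene here and go straight to using Solr. In short, Solr is more convenient than working with Lucene directly, it is quick to pick up, and it makes development more efficient.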

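   Where Solr fits: full-text search over large volumes of data. E-commerce platforms in particular, along with the currently popular cloud computing and Internet of Things applications, all rest on large amounts of data, and Solr is well suited to searching that data. Solr is also free and open source, so the barrier to entry is low and a small investment pays off quickly. I will not repeat Solr's other advantages here; plenty of experts in this community have already written good technical posts on Solr, and this post is only meant to get the discussion started.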

   With that, let's get started with Solr.

 

Steps for developing with Solr on the .NET platform

I. Set up the Solr server, as follows:

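   1. Install the JDK. Solr and Tomcat run on Java, so a Java runtime is needed even though our client code is .NET. The JDK already includes a JRE, so there is no need to install a JRE or JVM separately. In my case I did not have to configure any environment variables by hand after installing the JDK, which was convenient (if Tomcat cannot find Java on your machine, set JAVA_HOME yourself). I installed JDK 1.7. Official site: http://www.oracle.com/technetwork/java/javase/downloads/index.html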

   2. Install Tomcat 8.0 (official site: http://tomcat.apache.org/download-80.cgi). After installation, start Monitor Tomcat and open http://localhost:8080/ in a browser; if the Tomcat page loads, the installation succeeded.

   3. Download Solr. I used Solr 4.4. After downloading, configure it as follows:

  (1) Unzip Solr 4.4 and create a Solr home directory, for example D:/SorlServer/one. Copy everything in the example/solr folder of the unzipped Solr 4.4 into that directory.

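  (2) Create the Solr web application. Copy solr-4.4.0.war from the dist directory of the unzipped Solr 4.4 into Tomcat's webapps directory, for example C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps (use the path of the Tomcat version you actually installed), and rename it to one.war. Tomcat unpacks the file automatically when it starts. Then go to D:\SorlServer\one\collection1\conf, open solrconfig.xml, find the <dataDir> node, and change it to <dataDir>${solr.data.dir:c:/SorlServer/one/data}</dataDir>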

Note: this step is important. Open web.xml under C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\one\WEB-INF, find the <env-entry> node (it is commented out by default) and uncomment it,

then change the env-entry-value to D:/SorlServer/one, as follows:

<env-entry>
      <env-entry-name>solr/home</env-entry-name>
      <env-entry-value>D:/SorlServer/one</env-entry-value>
      <env-entry-type>java.lang.String</env-entry-type>
 </env-entry>

   (3) Copy all the jar files in the /dist/solrj-lib directory of the unzipped Solr 4.4 into C:\Program Files\Apache Software Foundation\Tomcat 7.0\lib

  (4) Stop Tomcat and start it again, then open http://localhost:8080/one; the Solr admin page should load.

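Note: if you are building an English-language site, no third-party tokenizer is needed, because Solr has built-in support for English tokenization. For other languages (Japanese, Italian, French, and so on), check the analyzers that ship with Solr first, or look online for a suitable tokenizer package. Here we use Chinese tokenization as the example, since most sites in China are primarily in Chinese.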

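   4. Configure Chinese tokenization. Tokenizers commonly used in China include Paoding (庖丁解牛), mmseg4j, and IKAnalyzer. I used IKAnalyzer, which is actively maintained, updated often, and works well. The steps are: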

   (1) Copy the IKAnalyzer jar and IKAnalyzer.cfg.xml to C:\Program Files\Apache Software Foundation\Tomcat 7.0\webapps\one\WEB-INF\lib

   (2) Edit schema.xml under D:\SorlServer\one\collection1\conf and add the following configuration:

      <!-- Tokenizer configuration -->
 <fieldType name="text_IKFENCHI" class="solr.TextField">
     <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
 </fieldType>

    (3) Stop Tomcat and start it again, then open http://localhost:8080/one/#/collection1/analysis to test the tokenizer.
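The same tokenizer check can also be scripted rather than done in the admin page. The following is a minimal sketch, assuming the `/analysis/field` request handler from the example solrconfig.xml is still enabled for the core and that .NET 4.5 or later is available for `HttpClient`. The class and method names (`AnalysisCheck`, `BuildAnalysisUrl`, `AnalyzeAsync`) are hypothetical and are not part of EasyNet.Solr.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: asks Solr to analyze a sample string with a given field type.
public static class AnalysisCheck
{
    // Builds the field-analysis URL for a core, a field type, and some sample text.
    public static string BuildAnalysisUrl(string coreUrl, string fieldType, string text)
    {
        return coreUrl.TrimEnd('/') + "/analysis/field"
            + "?analysis.fieldtype=" + Uri.EscapeDataString(fieldType)
            + "&analysis.fieldvalue=" + Uri.EscapeDataString(text)
            + "&wt=json";
    }

    // Sends the request and returns the raw JSON response (needs a running server).
    public static async Task<string> AnalyzeAsync(string coreUrl, string fieldType, string text)
    {
        using (var client = new HttpClient())
        {
            return await client.GetStringAsync(BuildAnalysisUrl(coreUrl, fieldType, text));
        }
    }
}
```

For the setup in this post the call would look like `AnalysisCheck.AnalyzeAsync("http://localhost:8080/one/collection1", "text_IKFENCHI", "some sample text")`; the response lists the tokens produced by each stage of the analyzer.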

    That completes the Solr server-side configuration.

II. Developing against Solr on the .NET platform:

   1. Download a Solr client component. I used EasyNet.Solr by Terry from this community, hosted on Microsoft's open-source site: http://easynet.codeplex.com/

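Terry has wrapped the Solr client very thoroughly, with many ready-made methods and parameter settings that we can use directly. We use EasyNet.Solr to create the index and then query it, as follows: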

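  (1) Download the EasyNet.Solr source and add it directly to your project, or build it into a DLL and add that as a project reference. Including the source is the better option, because you can then adjust it to fit your own needs.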

  (2) Create the index entity class, which describes the data we want to store in the index. For example, a product entity class:

  

using System;
using System.Collections.Generic;

namespace Seek.SearchIndex
{
    public partial class IndexProductModel
    {
        public IndexProductModel()
        {
        }

        #region  Properties
        public int ID { get; set; }
        public int ProductID { get; set; }
        public string ClassPath { get; set; }
        public int ClassID1 { get; set; }
        public int ClassID2 { get; set; }
        public int ClassID3 { get; set; }
        public string Title { get; set; }
        public string Model { get; set; }
        public string PriceRange { get; set; }
        public string AttributeValues { get; set; }
        public string ProductImages { get; set; }
        public int MemberID { get; set; }
        public System.DateTime CreateDate { get; set; }
        public System.DateTime LastEditDate { get; set; }
        public string FileName { get; set; }
        public string ProductType { get; set; }
        public string Summary { get; set; }
        public string Details { get; set; }
        public string RelatedKeywords { get; set; }
        public int MemberGrade { get; set; }
        #endregion
    }
}

     (3) Configure the schema on the Solr server so that it matches this index entity class. Go to D:\SorlServer\one\collection1\conf, open schema.xml, and add the following fields:

   

<field name="ID" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
   <field name="ProductID" type="int" indexed="true" stored="true"/>
   <!-- Fast highlighting settings: termVectors="true" termPositions="true"  termOffsets="true" -->
   <field name="Title" type="text_en_splitting" indexed="true" stored="true" termVectors="true" termPositions="true"  termOffsets="true"/>
   <field name="Model" type="text_en_splitting" indexed="true" stored="true" termVectors="true" termPositions="true"  termOffsets="true"/>
   <field name="ClassPath" type="string" indexed="true" stored="true"/>
   <field name="ClassID1" type="int" indexed="true" stored="true"/>
   <field name="ClassID2" type="int" indexed="true" stored="true"/>
   <field name="ClassID3" type="int" indexed="true" stored="true"/>
   <field name="PriceRange" type="string" indexed="true" stored="true"/>
   <field name="AttributeValues" type="string" indexed="true" stored="true"/>
   <field name="ProductImages" type="string" indexed="true" stored="true"/>
   <field name="MemberID" type="int" indexed="true" stored="true"/>
   <field name="CreateDate" type="date" indexed="true" stored="true"/>
   <field name="LastEditDate" type="date" indexed="true" stored="true"/>
   <field name="FileName" type="string" indexed="true" stored="true"/>
   <field name="ProductType" type="string" indexed="true" stored="true"/>
   <field name="Summary" type="string" indexed="true" stored="false"/>
   <field name="Details" type="string" indexed="true" stored="false"/>
   <field name="RelatedKeywords" type="string" indexed="true" stored="true"/>
   <field name="MemberType" type="string" indexed="true" stored="true"/>
   <field name="MemberGrade" type="int" indexed="true" stored="true"/>
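Because each field name in schema.xml has to match a property name on the entity class, a first draft of these lines can be generated with reflection. This is only a sketch under stated assumptions: `SampleProduct` is a hypothetical, trimmed-down stand-in for `IndexProductModel`, `SchemaSketch` is a hypothetical helper, and the type mapping (int to `int`, DateTime to `date`, everything else to `string`) is an assumption that fits most of the fields above. Fields such as Title and Model, which use `text_en_splitting`, and ID, which is declared as `string`, still need to be adjusted by hand.

```csharp
using System;
using System.Text;

// Hypothetical, trimmed-down stand-in for IndexProductModel.
public class SampleProduct
{
    public int ProductID { get; set; }
    public string Title { get; set; }
    public DateTime CreateDate { get; set; }
}

// Hypothetical helper that drafts <field> lines from a model type.
public static class SchemaSketch
{
    // Assumed mapping from CLR types to field type names in the example schema.
    public static string SolrType(Type t)
    {
        if (t == typeof(int)) return "int";
        if (t == typeof(DateTime)) return "date";
        return "string";
    }

    // Emits one <field> line per public property of the model type.
    public static string FieldsFor(Type modelType)
    {
        var sb = new StringBuilder();
        foreach (var p in modelType.GetProperties())
        {
            sb.AppendLine(string.Format(
                "<field name=\"{0}\" type=\"{1}\" indexed=\"true\" stored=\"true\"/>",
                p.Name, SolrType(p.PropertyType)));
        }
        return sb.ToString();
    }
}
```

Calling `SchemaSketch.FieldsFor(typeof(IndexProductModel))` in the real project would give a starting point that keeps names in sync with the class; attributes such as `required`, `multiValued`, and the term-vector settings are then edited manually.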

 

 

 

    (4) Create the index. It is best to write a client program that generates the index; here is the key code of my own indexer:

   

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Seek.SearchIndex;
using System.Data;
using System.Threading;
using System.Configuration;
using System.Reflection;
using EasyNet.Solr;
using EasyNet.Solr.Impl;
using EasyNet.Solr.Commons;
using System.Xml.Linq;
using EasyNet.Solr.Commons.Params;
using System.Threading.Tasks;

namespace Seek.SearchIndex
{
    /// <summary>
    /// Indexer
    /// </summary>
    public class Indexer
    {
        private readonly static OptimizeOptions optimizeOptions = new OptimizeOptions();
        private readonly static CommitOptions commitOptions = new CommitOptions() { SoftCommit = true };
        private readonly static ISolrResponseParser<NamedList, EasyNet.Solr.ResponseHeader> binaryResponseHeaderParser = new BinaryResponseHeaderParser();
        private readonly static IUpdateParametersConvert<NamedList> updateParametersConvert = new BinaryUpdateParametersConvert();
        private readonly static ISolrQueryConnection<NamedList> connection = new SolrQueryConnection<NamedList>() { ServerUrl = ConfigurationManager.AppSettings["SolrServer"] };
        private readonly static ISolrUpdateConnection<NamedList, NamedList> solrUpdateConnection = new SolrUpdateConnection<NamedList, NamedList>() { ServerUrl = ConfigurationManager.AppSettings["SolrServer"], ContentType = "application/javabin" };
        private readonly static ISolrUpdateOperations<NamedList> solr = new SolrUpdateOperations<NamedList, NamedList>(solrUpdateConnection, updateParametersConvert) { ResponseWriter = "javabin" };
        private readonly static ISolrQueryOperations<NamedList> solrQuery = new SolrQueryOperations<NamedList>(connection) { ResponseWriter = "javabin" };

        public enum State
        {
            /// <summary>
            /// Running
            /// </summary>
            Runing,
            /// <summary>
            /// Stopped
            /// </summary>
            Stop,
            /// <summary>
            /// Interrupted (paused)
            /// </summary>
            Break
        }
        /// <summary>
        /// Main window
        /// </summary>
        private Main form;
        /// <summary>
        /// Worker thread
        /// </summary>
        public Thread t;
        /// <summary>
        /// Current state
        /// </summary>
        public State state = State.Stop;
        /// <summary>
        /// Current index position
        /// </summary>
        private long currentIndex = 0;

        public long CurrentIndex
        {
            get { return currentIndex; }
            set { currentIndex = value; }
        }

        private int _startId = AppCongfig.StartId;

        public int StartId
        {
            get { return _startId; }
            set { _startId = value; }
        }

        /// <summary>
        /// Total number of products
        /// </summary>
        private int productsCount = 0;
        /// <summary>
        /// Start time
        /// </summary>
        private DateTime startTime = DateTime.Now;
        /// <summary>
        /// End time
        /// </summary>
        private DateTime endTime = DateTime.MinValue;
        private static object syncLock = new object();
        #region Singleton pattern
        private static Indexer instance = null;

        private Indexer(Main _form)
        {
            form = _form;
            productsCount = DataAccess.GetCount(0);       // product count
            form.fullerTsslMaxNum.Text = productsCount.ToString();
            form.fullerProgressBar.Minimum = 0;
            form.fullerProgressBar.Maximum = productsCount;
        }
        public static Indexer GetInstance(Main form)
        {
            if (instance == null)
            {
                lock (syncLock)
                {
                    if (instance == null)
                    {
                        instance = new Indexer(form);
                    }
                }
            }
            return instance;
        }
        #endregion

        /// <summary>
        /// Starts indexing on a worker thread
        /// </summary>
        public void Start()
        {
            ThreadStart ts = new ThreadStart(FullerRun);
            t = new Thread(ts);
            t.Start();
        }
        /// <summary>
        /// Stops indexing
        /// </summary>
        public void Stop()
        {
            state = State.Stop;
        }
        /// <summary>
        /// Interrupts (pauses) indexing
        /// </summary>
        public void Break()
        {
            state = State.Break;
        }


        /// <summary>
        /// Creates the index for one batch of rows (single-threaded path)
        /// </summary>
        public void InitIndex(object data)
        {
            var docs = new List<SolrInputDocument>();
            DataTable list = data as DataTable;
            foreach (DataRow pro in list.Rows)
            {
                var model = new SolrInputDocument();

                PropertyInfo[] properties = typeof(IndexProductModel).GetProperties(); // properties of the entity class
                foreach (PropertyInfo propertyInfo in properties) // copy each non-null column into the document
                {
                    object val = pro[propertyInfo.Name];
                    if (val != DBNull.Value)
                    {
                        model.Add(propertyInfo.Name, new SolrInputField(propertyInfo.Name, val));
                    }
                }
                docs.Add(model);

                StartId = Convert.ToInt32(pro["ID"]);
            }
            GetStartId();
            lock (syncLock)
            {
                if (currentIndex <= productsCount)
                {
                    form.fullerProgressBar.Value = (int)currentIndex;
                }
                form.fullerTsslCurrentNum.Text = currentIndex.ToString();
            }
            var result = solr.Update("/update", new UpdateOptions() {  Docs = docs });
        }

        /// <summary>
        /// Creates the index for one batch of rows (multi-threaded path)
        /// </summary>
        public void CreateIndexer(DataTable dt)
        {
            GetStartId();
            Parallel.ForEach<DataRow>(dt.AsEnumerable(), (row) =>
            {
                // Read the product's detailed attributes from the database row
                if (row != null)
                {
                    var docs = new List<SolrInputDocument>();
                    var model = new SolrInputDocument();

                    PropertyInfo[] properties = typeof(IndexProductModel).GetProperties(); // properties of the entity class
                    foreach (PropertyInfo propertyInfo in properties) // copy each non-null column into the document
                    {
                        object val = row[propertyInfo.Name];
                        if (val != DBNull.Value)
                        {
                            model.Add(propertyInfo.Name, new SolrInputField(propertyInfo.Name, val));
                        }
                    }
                    docs.Add(model);

                    StartId = Convert.ToInt32(row["ID"]);
                    var result = solr.Update("/update", new UpdateOptions() { Docs = docs });
                }
            });

            //GetStartId();
            lock (syncLock)
            {
                if (currentIndex <= productsCount)
                {
                    form.fullerProgressBar.Value = (int)currentIndex;
                }
                form.fullerTsslCurrentNum.Text = currentIndex.ToString();
            }
        }

        /// <summary>
        /// Runs a full index build
        /// </summary>
        public void FullerRun()
        {
            //GetStartId();
            //form.fullerTsslCurrentNum.Text = currentIndex.ToString();
            DataTable dt = DataAccess.GetNextProductsInfo(StartId);
            StartId = AppCongfig.StartId;
            if (state == State.Break)
            {
                this.SendMesasge("Full indexing resumed, start ID [" + StartId + "]...");
            }
            else
            {
                startTime = DateTime.Now;
                this.SendMesasge("Full indexing started, start ID [" + StartId + "]...");
            }
            state = State.Runing;
            form.btnInitIndex.Enabled = false;
            form.btnSuspend.Enabled = true;
            form.btnStop.Enabled = true;
      
            while (dt != null && dt.Rows.Count > 0 && state == State.Runing)
            {
                try
                {
                    InitIndex(dt); // single-threaded
                   // CreateIndexer(dt); // multi-threaded
                }
                catch (Exception ex)
                {
                    state = State.Stop;
                    form.btnInitIndex.Enabled = true;
                    form.btnSuspend.Enabled = false;
                    form.btnStop.Enabled = false;
                    GetStartId();
                    this.SendMesasge(ex.Message.ToString());
                }
                form.fullerTsslTimeSpan.Text = "Elapsed: " + GetTimeSpanShow(DateTime.Now - startTime) + ", estimated remaining: " + GetTimeSpanForecast();

                try
                {
                    dt = DataAccess.GetNextProductsInfo(StartId); // fetch the next batch of products
                }
                catch (Exception err)
                {
                    this.SendMesasge("Failed to fetch the next batch of products, start ID [" + StartId + "]: " + err.Message);
                }
            }
            if (state == State.Runing)
            {
                state = State.Stop;
                form.btnInitIndex.Enabled = true;
                form.btnSuspend.Enabled = false;
                form.btnStop.Enabled = false;
                AppCongfig.SetValue("StartId", StartId.ToString());
                this.SendMesasge("Full indexing completed, total indexed [" + currentIndex + "], last product ID " + StartId);
            }
            else if (state == State.Break)
            {
                GetStartId();
                state = State.Break;
                form.btnInitIndex.Enabled = true;
                form.btnSuspend.Enabled = false;
                form.btnStop.Enabled = false;
                AppCongfig.SetValue("StartId", StartId.ToString());
                this.SendMesasge("Full indexing paused, current index position [" + currentIndex + "], last product ID " + StartId);
            }
            else if (state == State.Stop)
            {
                GetStartId();
                state = State.Stop;
                this.SendMesasge("Full indexing stopped, documents indexed [" + currentIndex + "], last product ID " + StartId);
                form.btnInitIndex.Enabled = true;
                form.btnSuspend.Enabled = false;
                form.btnStop.Enabled = false;
                AppCongfig.SetValue("StartId", StartId.ToString());
                productsCount = DataAccess.GetCount(StartId);       // product count
                form.fullerTsslMaxNum.Text = productsCount.ToString();
                form.fullerProgressBar.Minimum = 0;
                form.fullerProgressBar.Maximum = productsCount;
            }
            endTime = DateTime.Now;
        }

        /// <summary>
        /// Builds index data; intended as the entry point for a worker thread
        /// </summary>
        /// <param name="threadDataParam"></param>
        public void MultiThreadCreateIndex(object threadDataParam)
        {
            InitIndex(threadDataParam);
        }

        /// <summary>
        /// Gets the largest indexed product ID (and the document count) from Solr
        /// </summary>
        private void GetStartId()
        {
            IDictionary<string, ICollection<string>> options = new Dictionary<string, ICollection<string>>();
            options[CommonParams.SORT] = new string[] { "ProductID DESC" };
            options[CommonParams.START] = new string[] { "0" };
            options[CommonParams.ROWS] = new string[] { "1" };
            options[HighlightParams.FIELDS] = new string[] { "ProductID" };
            options[CommonParams.Q] = new string[] { "*:*" };
            var result = solrQuery.Query("/select", null, options);
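            var solrDocumentList = (SolrDocumentList)result.Get("response");
            // Read NumFound only when the list exists, to avoid a NullReferenceException
            currentIndex = solrDocumentList != null ? solrDocumentList.NumFound : 0;
            if (solrDocumentList != null && solrDocumentList.Count() > 0)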
            {
                StartId = (int)solrDocumentList[0]["ProductID"];
                //AppCongfig.SetValue("StartId", solrDocumentList[0]["ProductID"].ToString());
            }
            else
            {
                StartId = 0;
                // AppCongfig.SetValue("StartId", "0");
            }
        }


        /// <summary>
        /// Optimizes the index
        /// </summary>
        public void Optimize()
        {
            this.SendMesasge("Optimizing the index, please wait...");
            var result = solr.Update("/update", new UpdateOptions() { OptimizeOptions = optimizeOptions });
            var header = binaryResponseHeaderParser.Parse(result);
            this.SendMesasge("Index optimization took " + header.QTime + " ms");
        }

        /// <summary>
        /// Sends a message to the UI
        /// </summary>
        /// <param name="message">The message text to display</param>
        protected void SendMesasge(string message)
        {
            form.fullerDgvMessage.Rows.Add(form.fullerDgvMessage.Rows.Count + 1, message, DateTime.Now.ToString());
        }
        /// <summary>
        /// Formats a time span for display
        /// </summary>
        /// <param name="ts">The time span</param>
        /// <returns>Text such as "1d 2h 3m 4s"; zero-valued units are omitted</returns>
        protected string GetTimeSpanShow(TimeSpan ts)
        {
            // Append a unit after each number; without units the digits run together
            string text = "";
            if (ts.Days > 0)
            {
                text += ts.Days + "d ";
            }
            if (ts.Hours > 0)
            {
                text += ts.Hours + "h ";
            }
            if (ts.Minutes > 0)
            {
                text += ts.Minutes + "m ";
            }
            if (ts.Seconds > 0)
            {
                text += ts.Seconds + "s";
            }
            return text.TrimEnd();
        }
        /// <summary>
        /// Estimates the remaining time
        /// </summary>
        /// <returns></returns>
        protected string GetTimeSpanForecast()
        {
            if (currentIndex != 0)
            {
                TimeSpan tsed = DateTime.Now - startTime;
                double d = ((tsed.TotalMilliseconds / currentIndex) * productsCount) - tsed.TotalMilliseconds;
                return GetTimeSpanShow(TimeSpan.FromMilliseconds(d));
            }
            return "";
        }
    }
}
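The core of FullerRun is a resume-friendly batching loop: fetch the rows that come after the last processed ID, index them, and repeat until a fetch returns nothing. The sketch below isolates that idea with in-memory data so it can be run without a database or a Solr server. `BatchSketch`, `NextBatch`, and `IndexAll` are hypothetical names, and `NextBatch` is only an assumed stand-in for what `DataAccess.GetNextProductsInfo` does; the original post does not show that method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of the "fetch rows after the last ID" loop.
public static class BatchSketch
{
    // Assumed stand-in for DataAccess.GetNextProductsInfo: returns up to
    // batchSize IDs greater than lastId, in ascending order.
    public static List<int> NextBatch(IEnumerable<int> allIds, int lastId, int batchSize)
    {
        return allIds.Where(id => id > lastId).OrderBy(id => id).Take(batchSize).ToList();
    }

    // Walks the source batch by batch. Returns the number of batches processed
    // and reports the last ID seen, which is where a later run would resume.
    public static int IndexAll(IEnumerable<int> allIds, int startId, int batchSize, out int lastId)
    {
        lastId = startId;
        int batches = 0;
        List<int> batch = NextBatch(allIds, lastId, batchSize);
        while (batch.Count > 0)
        {
            // A real indexer would send this batch to Solr here.
            lastId = batch[batch.Count - 1];
            batches++;
            batch = NextBatch(allIds, lastId, batchSize);
        }
        return batches;
    }
}
```

Persisting the last ID (the post does this with `AppCongfig.SetValue("StartId", ...)`) is what lets a stopped or paused run pick up where it left off instead of starting over.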

 

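    (5) Run the indexer to create the index. This is my indexer's interface:

[Screenshot of the indexer interface]

   It lets you track indexing progress at any time.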
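The "estimated remaining" figure shown during a run comes from a simple proportion, the same one used in GetTimeSpanForecast: remaining = elapsed / indexed × total − elapsed. Here is that arithmetic as a standalone sketch; `ProgressSketch` and `Forecast` are hypothetical names, and clamping negative results to zero is an addition that the original method does not make.

```csharp
using System;

// Hypothetical standalone version of the remaining-time estimate.
public static class ProgressSketch
{
    // remaining = elapsed / indexed * total - elapsed
    public static TimeSpan Forecast(TimeSpan elapsed, long indexed, long total)
    {
        if (indexed <= 0)
        {
            return TimeSpan.Zero; // no progress yet, so no estimate
        }
        double remainingMs = elapsed.TotalMilliseconds / indexed * total - elapsed.TotalMilliseconds;
        // Clamp at zero in case more documents were indexed than expected.
        return TimeSpan.FromMilliseconds(Math.Max(0, remainingMs));
    }
}
```

For example, if 1,000 of 5,000 products have been indexed after 10 seconds, the average is 10 ms per product, the projected total is 50 seconds, and the estimate for the remainder is 40 seconds.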

  (6) Once the index has been created, open the Solr admin page at http://localhost:8080/one/#/collection1/query to test it.
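The admin query page simply issues a request to the core's `/select` handler, so the same test can be run from code. The following is a minimal sketch using plain HTTP rather than EasyNet.Solr, assuming .NET 4.5 or later for `HttpClient`; `SelectSketch`, `BuildSelectUrl`, and `QueryAsync` are hypothetical names. The parameters `q`, `sort`, `rows`, and `wt` are standard Solr query parameters.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: runs a simple /select query over HTTP.
public static class SelectSketch
{
    // Builds a /select URL with URL-encoded q and sort values and a JSON response format.
    public static string BuildSelectUrl(string coreUrl, string q, string sort, int rows)
    {
        return coreUrl.TrimEnd('/') + "/select"
            + "?q=" + Uri.EscapeDataString(q)
            + "&sort=" + Uri.EscapeDataString(sort)
            + "&rows=" + rows
            + "&wt=json";
    }

    // Sends the query and returns the raw JSON response (needs a running server).
    public static async Task<string> QueryAsync(string coreUrl, string q, string sort, int rows)
    {
        using (var client = new HttpClient())
        {
            return await client.GetStringAsync(BuildSelectUrl(coreUrl, q, sort, rows));
        }
    }
}
```

With the setup from this post, `SelectSketch.QueryAsync("http://localhost:8080/one/collection1", "Title:phone", "ProductID desc", 1)` would return the matching document with the highest ProductID, much like the lookup that GetStartId performs through EasyNet.Solr.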

 

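That covers the groundwork for Solr: setting up the Solr server and generating the index from a client. A later post will cover client-side querying in detail. Coming next:

1. Full-text search, tokenizer configuration, and Google/Baidu-style keyword autocomplete

2. Facet queries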

 

 

 

 

    

