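I recently noticed that Solr has released a 4.0 ALPHA version, and its admin interface looks much better than the 3.x one. After comparing IKAnalyzer with mmseg and with Lucene's SmartChineseAnalyzer, StandardAnalyzer, and CJKAnalyzer, I found IKAnalyzer the most practical of the group. While configuring IKAnalyzer, I discovered that some of the interfaces it relies on have changed, so I patched it based on the errors Solr reported at startup. The result is this 4.0 build, which I have tested and confirmed to work.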
Below is the new directory structure of IKAnalyzer:
The IKAnalyzer 4.0 jar package ==> Download
After extracting the archive, put IKAnalyzer4.0.jar, IKAnalyzer.cfg, and stopword.dic into the lib directory under your Solr directory.
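Edit solrconfig.xml and add the following. The dir values are filesystem paths, resolved relative to the Solr core's instance directory, that point at the dist and contrib/analysis-extras/lucene-libs directories of the Solr distribution; the http://www.cnblogs.com/ prefix is not a valid lib path, so replace it with the relative path that matches your installation: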
<lib dir="http://www.cnblogs.com/dist/" regex="apache-solr-analysis-extras-\d.*\.jar" />
<lib dir="http://www.cnblogs.com/contrib/analysis-extras/lucene-libs" regex=".*\.jar" />
Edit schema.xml and add:
<!-- IKAnalyzer -->
<fieldType name="text_ik" class="solr.TextField">
  <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>
  <analyzer type="index">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" isMaxWordLength="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
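A fieldType on its own does not index anything; a field has to reference it by name before any text is analyzed with it. As a minimal sketch, assuming a field called title (a hypothetical name, not part of the original setup), the declaration inside the <fields> section of schema.xml could look like this:

```xml
<!-- Hypothetical field name "title"; the type attribute must match the fieldType name="text_ik" declared above -->
<field name="title" type="text_ik" indexed="true" stored="true"/>
```

Documents that supply a title value are then analyzed with the text_ik type, and queries against title go through the same type's query-time analysis.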
For comparison, here is the SmartChineseAnalyzer configuration as well:
<!-- Chinese -->
<fieldType name="text_zh-cn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.SmartChineseSentenceTokenizerFactory"/>
    <filter class="solr.SmartChineseWordTokenFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PositionFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="lang/stopwords_zh-cn.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>
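To compare how the two analyzers segment the same text, one option is to index identical content into two fields, one per type, using copyField. This is a sketch under assumptions: body_ik and body_zh are hypothetical field names, the <field> lines belong inside <fields>, and the <copyField> line sits at the top level of <schema>. The text_zh-cn type also depends on the analysis-extras jars that the <lib> directives in solrconfig.xml load.

```xml
<!-- Hypothetical fields: body_ik is analyzed by IKAnalyzer, body_zh by SmartChineseAnalyzer -->
<field name="body_ik" type="text_ik" indexed="true" stored="true"/>
<field name="body_zh" type="text_zh-cn" indexed="true" stored="false"/>

<!-- copyField copies the raw input of body_ik (before analysis) into body_zh,
     so both analyzers see exactly the same text -->
<copyField source="body_ik" dest="body_zh"/>
```

Running the same query against body_ik and body_zh then shows the difference in segmentation side by side, without having to post each document twice.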
If you spot any problems, please point them out so we can all learn together!