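Solr 5.3.1 does not handle Chinese word segmentation well out of the box, and mmseg4j's segmentation results were not satisfactory, so I decided to use IK for segmentation. However, after following http://www.superwu.cn/2015/05/08/2134/ and placing the jar I found through Google into the Solr directory, Solr immediately threw the following exception.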
SEVERE: Servlet.service() for servlet [default] in context with path [/solr] threw exception [Filter execution threw an exception] with root cause
java.lang.AbstractMethodError
    at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:179)
    at org.apache.solr.handler.AnalysisRequestHandlerBase.analyzeValue(AnalysisRequestHandlerBase.java:91)
    at org.apache.solr.handler.FieldAnalysisRequestHandler.analyzeValues(FieldAnalysisRequestHandler.java:221)
    at org.apache.solr.handler.FieldAnalysisRequestHandler.handleAnalysisRequest(FieldAnalysisRequestHandler.java:182)
    at org.apache.solr.handler.FieldAnalysisRequestHandler.doAnalysis(FieldAnalysisRequestHandler.java:102)
    at org.apache.solr.handler.AnalysisRequestHandlerBase.handleRequestBody(AnalysisRequestHandlerBase.java:63)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:669)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:462)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:423)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1079)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625)
    at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.doRun(AprEndpoint.java:2522)
    at org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:2511)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:745)
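At first I thought this was a configuration problem, but no configuration I tried made it work. Then I looked at the source and found that in the Lucene version shipped with Solr 5.3.1, the createComponents method of the Analyzer abstract class no longer takes its second parameter (the Reader). A jar built against the older two-argument signature never implements the new abstract method, which is what the AbstractMethodError at Analyzer.tokenStream indicates. Modifying the IK source is therefore unavoidable. For the changes, see http://iamyida.iteye.com/blog/2193513. Alternatively, you can take the already modified source and simply repackage it.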
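To check which createComponents signature a given analyzer class was compiled with, one option is to inspect its declared methods through reflection. The sketch below is only an illustration: OldStyleAnalyzer and NewStyleAnalyzer are hypothetical stand-ins (not the real IK or Lucene classes) so that it runs without Lucene on the classpath, and the helper name declaresOneArgCreateComponents is my own.

```java
import java.io.Reader;
import java.lang.reflect.Method;

/** Hypothetical stand-in for an analyzer compiled against the old two-argument API. */
class OldStyleAnalyzer {
    protected Object createComponents(String fieldName, Reader reader) {
        return null;
    }
}

/** Hypothetical stand-in for an analyzer compiled against the one-argument API. */
class NewStyleAnalyzer {
    protected Object createComponents(String fieldName) {
        return null;
    }
}

final class SignatureCheck {
    /** Returns true if the class itself declares createComponents(String). */
    static boolean declaresOneArgCreateComponents(Class<?> cls) {
        for (Method m : cls.getDeclaredMethods()) {
            Class<?>[] params = m.getParameterTypes();
            if (m.getName().equals("createComponents")
                    && params.length == 1
                    && params[0] == String.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(declaresOneArgCreateComponents(OldStyleAnalyzer.class)); // false
        System.out.println(declaresOneArgCreateComponents(NewStyleAnalyzer.class)); // true
    }
}
```

Against a real jar, the same helper could be given Class.forName("org.wltea.analyzer.lucene.IKAnalyzer") (a class name I am assuming from the package used in this post), with both the IK jar and the Lucene jars on the classpath.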
The main changes are in IKAnalyzer.java:
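/**
 * Overrides Analyzer.createComponents to build the tokenizer component.
 */
@Override
protected TokenStreamComponents createComponents(String text) {
    // Note: in Lucene 5.x this argument is the field name, not the content to analyze;
    // the content itself is supplied later through Tokenizer.setReader().
    Reader reader = new BufferedReader(new StringReader(text));
    Tokenizer _IKTokenizer = new IKTokenizer(reader, this.useSmart());
    return new TokenStreamComponents(_IKTokenizer);
}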
Add the following constructor to IKTokenizer.java:
public IKTokenizer(AttributeFactory factory, boolean useSmart) {
    super(factory);
    offsetAtt = addAttribute(OffsetAttribute.class);
    termAtt = addAttribute(CharTermAttribute.class);
    typeAtt = addAttribute(TypeAttribute.class);
    // No Reader parameter: Lucene binds the real input later via setReader()/reset()
    _IKImplement = new IKSegmenter(input, useSmart);
}
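The constructor above takes no Reader because, in the Lucene 5.x tokenizer lifecycle, the input is handed over with setReader() and only becomes active on reset(). The following is a deliberately simplified model of that lifecycle, not the real Lucene Tokenizer: MiniTokenizer, its fields, and the whitespace splitting are all my own stand-ins for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for a Lucene 5.x style tokenizer; not the real Lucene class. */
class MiniTokenizer {
    // Placeholder reader that fails if anything reads before setReader() and reset().
    private static final Reader UNSET = new Reader() {
        @Override
        public int read(char[] buf, int off, int len) {
            throw new IllegalStateException("setReader() and reset() must be called first");
        }

        @Override
        public void close() {
        }
    };

    private Reader input = UNSET;   // what the tokenizer currently reads from
    private Reader pending = UNSET; // handed over by setReader(), activated by reset()

    /** The constructor takes no Reader; the input is bound later. */
    MiniTokenizer() {
    }

    void setReader(Reader reader) {
        this.pending = reader;
    }

    void reset() {
        this.input = this.pending;
        this.pending = UNSET;
    }

    /** Splits the bound input on whitespace; a toy replacement for real segmentation. */
    List<String> tokens() throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[256];
        int n;
        while ((n = input.read(buf, 0, buf.length)) != -1) {
            text.append(buf, 0, n);
        }
        List<String> out = new ArrayList<>();
        for (String token : text.toString().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }
}

final class MiniTokenizerDemo {
    public static void main(String[] args) throws IOException {
        MiniTokenizer tokenizer = new MiniTokenizer();
        tokenizer.setReader(new StringReader("solr ik analyzer"));
        tokenizer.reset();
        System.out.println(tokenizer.tokens());
    }
}
```

The same instance can be reused for another input by calling setReader() and reset() again, which is why a Reader fixed at construction time no longer fits this model.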
The remaining changes are small and scattered; see the modified source files for details.
Next, create a new project (IK-Analyzer-extra in the attachment) and add a factory class, IKTokenizerFactory, which makes the setup easier to extend and maintain.
package org.wltea.analyzer.util;

import java.util.Map;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.util.TokenizerFactory;
import org.apache.lucene.util.AttributeFactory;
import org.wltea.analyzer.lucene.IKTokenizer;

public class IKTokenizerFactory extends TokenizerFactory {

    private boolean useSmart;

    public IKTokenizerFactory(Map<String, String> args) {
        super(args);
        // Read the useSmart attribute from schema.xml; defaults to false
        useSmart = getBoolean(args, "useSmart", false);
    }

    @Override
    public Tokenizer create(AttributeFactory attributeFactory) {
        Tokenizer tokenizer = new IKTokenizer(attributeFactory, useSmart);
        return tokenizer;
    }
}
Then add the following configuration to schema.xml:
<fieldType name="text_ik" class="solr.TextField">
    <!-- Tokenizer used at index time -->
    <analyzer type="index">
        <tokenizer class="org.wltea.analyzer.util.IKTokenizerFactory" useSmart="true"/>
    </analyzer>
    <!-- Tokenizer used at query time -->
    <analyzer type="query">
        <tokenizer class="org.wltea.analyzer.util.IKTokenizerFactory" useSmart="false"/>
    </analyzer>
</fieldType>
Finally, copy IK-Analyzer-5.3.1.jar and IK-Analyzer-extra-5.3.1.jar into the lib directory of the Solr web application.
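Once the jars are in place and Solr has been restarted, one way to check the text_ik field type is to send a sample string to the field analysis handler, the same handler that appears in the stack trace above. The sketch below only builds the request URL. It assumes the handler is reachable at /analysis/field under the core; the base URL http://localhost:8080/solr and the core name collection1 are placeholders of mine, not values from this setup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class FieldAnalysisUrl {
    /** Builds a field-analysis request URL for the given field type and sample text. */
    static String build(String solrBase, String core, String fieldType, String text)
            throws UnsupportedEncodingException {
        return solrBase + "/" + core + "/analysis/field"
                + "?analysis.fieldtype=" + URLEncoder.encode(fieldType, "UTF-8")
                + "&analysis.fieldvalue=" + URLEncoder.encode(text, "UTF-8")
                + "&wt=json";
    }

    public static void main(String[] args) throws Exception {
        // Base URL and core name are assumptions; adjust them to the actual deployment.
        // The sample text is a short Chinese phrase written with Unicode escapes.
        String sample = "\u4e2d\u6587\u5206\u8a5e";
        System.out.println(build("http://localhost:8080/solr", "collection1", "text_ik", sample));
    }
}
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should return the tokens produced by the index-time analyzer, or the same AbstractMethodError if an incompatible jar is still being loaded.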
One more note: the IK source code has moved to http://git.oschina.net/wltea/IK-Analyzer-2012FF/.
Project files:
http://pan.baidu.com/s/1skv1jCp
http://pan.baidu.com/s/1c1o0gI8
References:
http://iamyida.iteye.com/blog/2220474
http://iamyida.iteye.com/blog/2193513