/**
* CentOS 7.2 in a vm12 (VMware 12) virtual machine
* elasticsearch 5.2.2
*/
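When searching for products on Taobao, you may notice that typing Chinese characters, pinyin, or pinyin mixed with characters all bring up the results you want. I looked into it today: this is done with a pinyin analysis plugin.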
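1) Installing ik was covered in an earlier post, so I won't repeat it here.
2) On ES 2.4 the install is very simple and much like ik: at the end you just add the analyzer configuration to elasticsearch.yml, so I won't go into it either.
Original post: http://blog.csdn.net/hhl2046/article/details/53319637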
The analysis configuration from that post, for elasticsearch.yml:
index:
  analysis:
    analyzer:
      ik:
        alias: [news_analyzer_ik, ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_analyzer_pinyin:                            # analyzer name
        type: custom                                 # custom: a user-defined analyzer
        tokenizer: ik                                # the component that splits text into tokens: ik
        filter: [synonym_test_filter, pinyin_mcl]    # post-process the tokens: synonyms, then pinyin
    filter:
      synonym_test_filter:
        type: synonym_filter
        synonyms_path: synonym.txt
        dynamic_reload: true
        reload_interval: 10s
        expand: true
      pinyin_mcl:
        type: pinyin
        first_letter: none
        padding_char: ""
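To make the "tokenizer, then filters" chain of ik_analyzer_pinyin concrete, here is a toy Python model of a custom analyzer. Everything in it is a hypothetical stand-in: dictionary_tokenizer is not IK's real segmenter, the synonym pair 刘德华/华仔 and the pinyin table are made up, and the assumption that the pinyin filter replaces a token with its syllables joined by padding_char is a simplification. dynamic_reload and the other synonym options are not modeled.

```python
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def dictionary_tokenizer(vocab: List[str]) -> Tokenizer:
    """Hypothetical stand-in for ik: longest dictionary word first, else one character."""
    words = sorted(vocab, key=len, reverse=True)

    def tokenize(text: str) -> List[str]:
        tokens, i = [], 0
        while i < len(text):
            for word in words:
                if text.startswith(word, i):
                    tokens.append(word)
                    i += len(word)
                    break
            else:
                tokens.append(text[i])  # no dictionary word here: emit a single character
                i += 1
        return tokens

    return tokenize


def synonym_filter(synonyms: Dict[str, List[str]]) -> TokenFilter:
    """expand: true -> keep each token and append its synonyms after it."""
    def apply(tokens: List[str]) -> List[str]:
        out: List[str] = []
        for t in tokens:
            out.append(t)
            out.extend(synonyms.get(t, []))
        return out
    return apply


def pinyin_filter(table: Dict[str, str], padding_char: str = "") -> TokenFilter:
    """Assumed behavior: replace a token by its characters' pinyin joined with padding_char."""
    def apply(tokens: List[str]) -> List[str]:
        return [padding_char.join(table.get(ch, ch) for ch in t) for t in tokens]
    return apply


def analyze(text: str, tokenizer: Tokenizer, filters: List[TokenFilter]) -> List[str]:
    """A custom analyzer: run the tokenizer once, then each filter in the listed order."""
    tokens = tokenizer(text)
    for f in filters:
        tokens = f(tokens)
    return tokens


if __name__ == "__main__":
    ik_like = dictionary_tokenizer(["刘德华", "歌手"])
    chain = [
        synonym_filter({"刘德华": ["华仔"]}),
        pinyin_filter({"刘": "liu", "德": "de", "华": "hua", "仔": "zai", "歌": "ge", "手": "shou"}),
    ]
    print(analyze("刘德华歌手", ik_like, chain))  # ['liudehua', 'huazai', 'geshou']
```

The order in the filter list matters: with synonyms first, the synonym 华仔 is also converted to pinyin; if the pinyin filter ran first, the synonym lookup would only see "liudehua" and find nothing.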
ik: https://github.com/medcl/elasticsearch-analysis-ik
Pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin
Next, installing the pinyin analyzer on version 5.2.2:
1, Download the source and build it:
https://github.com/medcl/elasticsearch-analysis-pinyin
mvn package
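After a successful build, the package elasticsearch-analysis-pinyin-5.2.2.zip can be found under target/releases.
2, Put the packaged zip file in the {ES_HOME}/plugins/pinyin/ directory and unzip it there, so its files sit at the root of that directory. Then restart Elasticsearch so the plugin is loaded.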
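If you script the install, step 2 comes down to an unzip plus a sanity check. The sketch below uses only the Python standard library; install_plugin_zip is a hypothetical helper name. The check relies on the fact that every Elasticsearch plugin ships a plugin-descriptor.properties file, which has to end up directly in plugins/pinyin/ rather than in a subfolder.

```python
import zipfile
from pathlib import Path


def install_plugin_zip(zip_path: str, es_home: str, plugin_name: str = "pinyin") -> Path:
    """Unpack a plugin zip into {ES_HOME}/plugins/<plugin_name>/ and sanity-check the layout."""
    target = Path(es_home) / "plugins" / plugin_name
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    # The descriptor must sit at the root of the plugin directory, not one level down.
    if not (target / "plugin-descriptor.properties").is_file():
        raise RuntimeError(
            f"plugin-descriptor.properties not found in {target}; "
            "move the extracted files up into this directory"
        )
    return target
```

For example, install_plugin_zip("target/releases/elasticsearch-analysis-pinyin-5.2.2.zip", "/path/to/elasticsearch") with your own ES_HOME; Elasticsearch still has to be restarted afterwards.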
3, Test:
curl -XPUT http://localhost:9200/medcl/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin"
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "keep_separate_first_letter" : false,
                    "keep_full_pinyin" : true,
                    "keep_original" : true,
                    "limit_first_letter_length" : 16,
                    "lowercase" : true,
                    "remove_duplicated_term" : true
                }
            }
        }
    }
}'
Then analyze 刘德华 (URL-encoded in the text parameter):
curl "http://localhost:9200/medcl/_analyze?text=%e5%88%98%e5%be%b7%e5%8d%8e&analyzer=pinyin_analyzer"
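The text parameter in that URL is simply 刘德华 percent-encoded as UTF-8. A minimal sketch using only Python's standard library builds the same request URL; analyze_url is a hypothetical helper name. Python emits uppercase hex digits, which are equivalent to the lowercase ones used above.

```python
from urllib.parse import unquote, urlencode


def analyze_url(base: str, index: str, text: str, analyzer: str) -> str:
    """Build the GET _analyze URL with text and analyzer passed as query-string parameters."""
    query = urlencode({"text": text, "analyzer": analyzer})  # UTF-8 percent-encoding
    return f"{base}/{index}/_analyze?{query}"


print(unquote("%e5%88%98%e5%be%b7%e5%8d%8e"))  # 刘德华
print(analyze_url("http://localhost:9200", "medcl", "刘德华", "pinyin_analyzer"))
# http://localhost:9200/medcl/_analyze?text=%E5%88%98%E5%BE%B7%E5%8D%8E&analyzer=pinyin_analyzer
```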
The analysis result is:
{
    "tokens" : [
        { "token" : "liu", "start_offset" : 0, "end_offset" : 1, "type" : "word", "position" : 0 },
        { "token" : "de", "start_offset" : 1, "end_offset" : 2, "type" : "word", "position" : 1 },
        { "token" : "hua", "start_offset" : 2, "end_offset" : 3, "type" : "word", "position" : 2 },
        { "token" : "刘德华", "start_offset" : 0, "end_offset" : 3, "type" : "word", "position" : 3 },
        { "token" : "ldh", "start_offset" : 0, "end_offset" : 3, "type" : "word", "position" : 4 }
    ]
}
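To see how the tokenizer options map onto this output, the toy model below reproduces the five tokens, offsets and positions for 刘德华. It is an illustration, not the plugin's code: the three-entry PINYIN table is hypothetical, only characters in it are handled, and options such as lowercase and remove_duplicated_term are not modeled.

```python
from typing import Dict, List

# Hypothetical three-character table; the real plugin ships a full character-to-pinyin dictionary.
PINYIN: Dict[str, str] = {"刘": "liu", "德": "de", "华": "hua"}


def pinyin_tokens(text: str, keep_full_pinyin: bool = True, keep_original: bool = True,
                  keep_first_letter: bool = True, limit_first_letter_length: int = 16) -> List[dict]:
    """Toy model of which tokens each option contributes, in the order shown above."""
    tokens: List[dict] = []

    def emit(term: str, start: int, end: int) -> None:
        # Positions simply increase by one per emitted token, as in the response above.
        tokens.append({"token": term, "start_offset": start, "end_offset": end,
                       "type": "word", "position": len(tokens)})

    if keep_full_pinyin:
        for i, ch in enumerate(text):
            emit(PINYIN[ch], i, i + 1)  # one syllable per character
    if keep_original:
        emit(text, 0, len(text))  # the unmodified input
    if keep_first_letter:
        initials = "".join(PINYIN[ch][0] for ch in text)
        emit(initials[:limit_first_letter_length], 0, len(text))  # e.g. "ldh"
    return tokens


if __name__ == "__main__":
    print([t["token"] for t in pinyin_tokens("刘德华")])  # ['liu', 'de', 'hua', '刘德华', 'ldh']
```

Turning keep_original off in this model drops the 刘德华 token and shifts ldh to position 3.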
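4, Configure combined IK + pinyin analysis
Index settings (the medcl index already exists from step 3, so delete it first; otherwise the PUT is rejected because the index exists):
curl -XDELETE http://localhost:9200/medcl/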
curl -XPUT "http://localhost:9200/medcl/" -d'
{
    "index": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": ["my_pinyin", "word_delimiter"]
                }
            },
            "filter": {
                "my_pinyin": {
                    "type": "pinyin",
                    "first_letter": "prefix",
                    "padding_char": " "
                }
            }
        }
    }
}'
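The same settings request can also be sent from Python with only the standard library. This is a sketch assuming a node at localhost:9200, as in the curl commands; ik_pinyin_settings, json_request and send are hypothetical helper names, and the request is only sent when you call send.

```python
import json
import urllib.request

ES = "http://localhost:9200"  # assumption: local node, same as the curl examples


def ik_pinyin_settings() -> dict:
    """Same body as the curl -XPUT above."""
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": "ik_smart",
                        "filter": ["my_pinyin", "word_delimiter"],
                    }
                },
                "filter": {
                    "my_pinyin": {"type": "pinyin", "first_letter": "prefix", "padding_char": " "}
                },
            }
        }
    }


def json_request(method: str, url: str, body=None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8") if body is not None else None
    return urllib.request.Request(url, data=data, method=method,
                                  headers={"Content-Type": "application/json"})


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: send(json_request("PUT", ES + "/medcl/", ik_pinyin_settings())). urlopen raises urllib.error.HTTPError on error responses, for example when medcl already exists at PUT time; in that case send(json_request("DELETE", ES + "/medcl/")) first.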
Create the mapping:
curl -XPOST http://localhost:9200/medcl/folks/_mapping -d'
{
    "folks": {
        "properties": {
            "name": {
                "type": "keyword",
                "fields": {
                    "pinyin": {
                        "type": "text",
                        "store": "no",
                        "term_vector": "with_positions_offsets",
                        "analyzer": "ik_pinyin_analyzer",
                        "boost": 10
                    }
                }
            }
        }
    }
}'
Add test documents:
curl -XPOST http://localhost:9200/medcl/folks/andy -d'{"name":"刘德华"}'
curl -XPOST http://localhost:9200/medcl/folks/tina -d'{"name":"中华人民共和国国歌"}'
Testing the analysis:
Pinyin search:
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:liu"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:de"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:hua"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:ldh"
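Why do all four queries find andy, and why do they target name.pinyin rather than name? The toy inverted index below indexes each name twice, as the mapping does: once as a single keyword term under name, and once as analyzed terms under name.pinyin. The terms are assumptions, not output from a real ik_pinyin_analyzer run: andy gets the pinyin tokens liu, de, hua and ldh seen earlier, and tina's terms are a guess covering only its full pinyin syllables.

```python
from collections import defaultdict
from typing import Dict, List

# Document values, as indexed by the curl -XPOST commands above.
NAMES: Dict[str, str] = {"andy": "刘德华", "tina": "中华人民共和国国歌"}

# ASSUMED analyzer output for the name.pinyin sub-field (illustrative only).
ASSUMED_PINYIN_TERMS: Dict[str, List[str]] = {
    "andy": ["liu", "de", "hua", "ldh"],
    "tina": ["zhong", "hua", "ren", "min", "gong", "he", "guo", "ge"],
}


def build_index(names, pinyin_terms):
    """field -> term -> set of doc ids, mimicking a keyword field plus its pinyin sub-field."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, name in names.items():
        index["name"][name].add(doc_id)  # keyword: the whole value is one term
        for term in pinyin_terms[doc_id]:
            index["name.pinyin"][term].add(doc_id)  # text sub-field: analyzed terms
    return index


def search(index, field: str, term: str) -> List[str]:
    """Exact lookup of one already-analyzed term; returns matching doc ids, sorted."""
    return sorted(index[field].get(term, set()))


if __name__ == "__main__":
    idx = build_index(NAMES, ASSUMED_PINYIN_TERMS)
    for q in ["liu", "de", "hua", "ldh"]:
        print(q, search(idx, "name.pinyin", q))
```

In this model hua matches both documents, since both names contain 华, while a lookup on the keyword field name only succeeds with the complete value 刘德华.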
IK segmentation test:
curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
{
    "query": {
        "match": {
            "name.pinyin": "国歌"
        }
    },
    "highlight": {
        "fields": {
            "name.pinyin": {}
        }
    }
}'
IK + pinyin test:
curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
{
    "query": {
        "match": {
            "name.pinyin": "zhonghua"
        }
    },
    "highlight": {
        "fields": {
            "name.pinyin": {}
        }
    }
}'
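A match query runs "zhonghua" through the same ik_pinyin_analyzer before matching, so it can only find 中华人民共和国国歌 if the query ends up as terms that were indexed for that document, such as zhong and hua. This post does not show how the plugin splits such input; one common way to segment a run of letters into pinyin syllables is greedy longest-match against a syllable table. The sketch below shows that technique with a tiny hypothetical SYLLABLES set (a full table has about 400 syllables); split_pinyin is an illustrative name, not the plugin's API.

```python
from typing import List, Optional

# Hypothetical, tiny syllable table for illustration only.
SYLLABLES = {"zhong", "hua", "hu", "a", "ren", "min", "guo", "ge"}
MAX_LEN = max(len(s) for s in SYLLABLES)


def split_pinyin(text: str) -> Optional[List[str]]:
    """Greedy longest-match split of a latin string into syllables; None if a part has no match."""
    out: List[str] = []
    i = 0
    while i < len(text):
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in SYLLABLES:
                out.append(piece)
                i += length
                break
        else:
            return None  # nothing in the table starts here
    return out


if __name__ == "__main__":
    print(split_pinyin("zhonghua"))  # ['zhong', 'hua']
```

Greedy longest-match is simple but can pick the wrong split for ambiguous input: "xian" is one syllable but also xi + an, and the greedy rule always takes xian.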
References: http://blog.csdn.net/napoay/article/details/53907921
http://www.jianshu.com/p/653f7b33e63c
https://github.com/medcl/elasticsearch-analysis-pinyin
https://my.oschina.net/xiaohui249/blog/214505