/**
* CentOS 7.2 running under vm12
* elasticsearch 5.2.2
*/
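When searching for products on Taobao, you may notice that Chinese characters, pinyin, or a mix of pinyin and Chinese characters all return the results you want. I looked into it today, and this is implemented with a pinyin search plugin:
1), Installing ik was covered earlier, so it is not repeated here.
2), Installing the pinyin plugin on es 2.4 is very simple and much like ik: at the end, add the analyzer configuration to elasticsearch.yml. It is not covered further here.
Original blog post: http://blog.csdn.net/hhl2046/article/details/53319637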
index:
  analysis:
    analyzer:
      ik:
        alias: [news_analyzer_ik, ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_analyzer_pinyin:        # analyzer name
        type: custom             # "custom" means a user-defined analyzer
        tokenizer: ik            # component that splits the text into tokens: ik
        filter: [synonym_test_filter, pinyin_mcl]   # post-process the tokens: synonyms and pinyin
    filter:
      synonym_test_filter:
        type: synonym_filter
        synonyms_path: synonym.txt
        dynamic_reload: true
        reload_interval: 10s
        expand: true
      pinyin_mcl:
        type: pinyin
        first_letter: none
        padding_char: ""
ik: https://github.com/medcl/elasticsearch-analysis-ik
pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin
Next, installing the pinyin analysis plugin for version 5.2.2:
1, Download the source and build it
https://github.com/medcl/elasticsearch-analysis-pinyin
mvn package
After the build succeeds, elasticsearch-analysis-pinyin-5.2.2.zip can be found under target/releases
2, Put the built zip file in the {ES_HOME}/plugins/pinyin/ directory and extract it into the root of that directory
3, Test:
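Step 2 could also be scripted. The following is only a minimal sketch: the helper name install_plugin_zip and any paths passed to it are assumptions for illustration, not part of the plugin's own tooling. Elasticsearch loads plugins at startup, so the node has to be restarted afterwards.

```python
import zipfile
from pathlib import Path


def install_plugin_zip(zip_path, es_home, plugin_name="pinyin"):
    """Extract a plugin zip into {es_home}/plugins/{plugin_name}/.

    Returns the directory the archive was extracted into.
    """
    target = Path(es_home) / "plugins" / plugin_name
    # Create plugins/pinyin/ if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

A call might look like install_plugin_zip("target/releases/elasticsearch-analysis-pinyin-5.2.2.zip", "/path/to/es"), where the second argument is a placeholder for the actual {ES_HOME}.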
curl -XPUT http://localhost:9200/medcl/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "pinyin_analyzer" : {
                    "tokenizer" : "my_pinyin"
                }
            },
            "tokenizer" : {
                "my_pinyin" : {
                    "type" : "pinyin",
                    "keep_separate_first_letter" : false,
                    "keep_full_pinyin" : true,
                    "keep_original" : true,
                    "limit_first_letter_length" : 16,
                    "lowercase" : true,
                    "remove_duplicated_term" : true
                }
            }
        }
    }
}'
http://localhost:9200/medcl/_analyze?text=%e5%88%98%e5%be%b7%e5%8d%8e&analyzer=pinyin_analyzer
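The text parameter in the URL above is the percent-encoded UTF-8 form of the name being analyzed. Here is a minimal sketch of how such a URL could be decoded and rebuilt. analyze_url is a hypothetical helper written for this post, not part of any Elasticsearch client.

```python
from urllib.parse import unquote, urlencode


def analyze_url(base_url, index, text, analyzer):
    """Build an _analyze URL with the text percent-encoded as UTF-8."""
    query = urlencode({"text": text, "analyzer": analyzer})
    return f"{base_url}/{index}/_analyze?{query}"


# Decoding the query value used above recovers the name that is analyzed.
decoded = unquote("%e5%88%98%e5%be%b7%e5%8d%8e")

# Re-encoding it yields the same URL, apart from the case of the hex digits.
url = analyze_url("http://localhost:9200", "medcl", decoded, "pinyin_analyzer")
print(decoded)
print(url)
```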
The analysis result is:
{
    "tokens" : [
        {
            "token" : "liu",
            "start_offset" : 0,
            "end_offset" : 1,
            "type" : "word",
            "position" : 0
        },
        {
            "token" : "de",
            "start_offset" : 1,
            "end_offset" : 2,
            "type" : "word",
            "position" : 1
        },
        {
            "token" : "hua",
            "start_offset" : 2,
            "end_offset" : 3,
            "type" : "word",
            "position" : 2
        },
        {
            "token" : "劉德華",
            "start_offset" : 0,
            "end_offset" : 3,
            "type" : "word",
            "position" : 3
        },
        {
            "token" : "ldh",
            "start_offset" : 0,
            "end_offset" : 3,
            "type" : "word",
            "position" : 4
        }
    ]
}
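The response shows that a single name produces the pinyin of each syllable, the original text, and the first-letter abbreviation, which is why a search by any of these forms can match. As a small illustration, the tokens can be pulled out of such a response like this. token_strings is a hypothetical helper, and the sample is a trimmed copy of the result above.

```python
import json


def token_strings(analyze_response):
    """Return the token texts of an _analyze response, ordered by position."""
    tokens = json.loads(analyze_response)["tokens"]
    return [t["token"] for t in sorted(tokens, key=lambda t: t["position"])]


# Trimmed copy of the response above (only the fields used here).
sample = """
{"tokens": [
  {"token": "liu", "position": 0},
  {"token": "de", "position": 1},
  {"token": "hua", "position": 2},
  {"token": "劉德華", "position": 3},
  {"token": "ldh", "position": 4}
]}
"""

terms = token_strings(sample)
print(terms)
```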
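4, Configure the IK + pinyin analyzer
Index settings (if the medcl index created in step 3 still exists, delete it first, e.g. curl -XDELETE http://localhost:9200/medcl, because an index that already exists cannot be created again):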
curl -XPUT "http://localhost:9200/medcl/" -d'
{
    "index": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": ["my_pinyin", "word_delimiter"]
                }
            },
            "filter": {
                "my_pinyin": {
                    "type": "pinyin",
                    "first_letter": "prefix",
                    "padding_char": " "
                }
            }
        }
    }
}'
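The same settings can be expressed as a Python dictionary and wrapped in an HTTP request using only the standard library. This is a sketch, not an official client: json_request and IK_PINYIN_SETTINGS are names invented here, and the request is only prepared, not sent, so nothing runs against a node until urlopen is called.

```python
import json
import urllib.request

# Same analyzer definition as in the curl command above.
IK_PINYIN_SETTINGS = {
    "index": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": ["my_pinyin", "word_delimiter"],
                }
            },
            "filter": {
                "my_pinyin": {
                    "type": "pinyin",
                    "first_letter": "prefix",
                    "padding_char": " ",
                }
            },
        }
    }
}


def json_request(method, url, body):
    """Prepare (but do not send) an HTTP request carrying a JSON body."""
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


create_index = json_request("PUT", "http://localhost:9200/medcl/", IK_PINYIN_SETTINGS)
# To send it to a running node: urllib.request.urlopen(create_index)
```

Keeping the body as a dictionary makes it easy to check the analyzer name and filter order before anything is sent.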
Create the mapping:
curl -XPOST http://localhost:9200/medcl/folks/_mapping -d'
{
    "folks": {
        "properties": {
            "name": {
                "type": "keyword",
                "fields": {
                    "pinyin": {
                        "type": "text",
                        "store": "no",
                        "term_vector": "with_positions_offsets",
                        "analyzer": "ik_pinyin_analyzer",
                        "boost": 10
                    }
                }
            }
        }
    }
}'
Add test documents:
curl -XPOST http://localhost:9200/medcl/folks/andy -d'{"name":"劉德華"}'
curl -XPOST http://localhost:9200/medcl/folks/tina -d'{"name":"中華人民共和國國歌"}'
Test the analysis results:
Pinyin analysis:
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:liu"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:de"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:hua"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:ldh"
IK analysis test:
curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
{
    "query": {
        "match": {
            "name.pinyin": "國歌"
        }
    },
    "highlight": {
        "fields": {
            "name.pinyin": {}
        }
    }
}'
IK + pinyin test:
curl -XPOST "http://localhost:9200/medcl/_search?pretty" -d'
{
    "query": {
        "match": {
            "name.pinyin": "zhonghua"
        }
    },
    "highlight": {
        "fields": {
            "name.pinyin": {}
        }
    }
}'
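The two search bodies above differ only in the search text, so they can be generated from one small function. This is a sketch for illustration. match_with_highlight is a hypothetical helper, and it only builds the JSON body without contacting Elasticsearch.

```python
import json


def match_with_highlight(field, text):
    """Build a match query on `field` that also highlights hits in that field."""
    return {
        "query": {"match": {field: text}},
        "highlight": {"fields": {field: {}}},
    }


# One body for the Chinese query and one for the pinyin query.
for text in ("國歌", "zhonghua"):
    body = match_with_highlight("name.pinyin", text)
    print(json.dumps(body, ensure_ascii=False))
```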
References: http://blog.csdn.net/napoay/article/details/53907921
http://www.jianshu.com/p/653f7b33e63c
https://github.com/medcl/elasticsearch-analysis-pinyin
https://my.oschina.net/xiaohui249/blog/214505
