Elasticsearch pinyin search

Reposted article · 2017-04-11 10:07 · elasticsearch

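Many companies now use Elasticsearch (ES) for search, and several business units at my company do too. I mainly work on merchant search, which provides base support for those units. Last week I reworked the call center search. After adding a few fields and running a full sync, I found that searching by pinyin initials no longer returned any results. The cause turned out to be a changed dictionary URL, which had broken tokenization.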

I went back over the ES analysis plugins and the setup steps used for search. They are as follows:

1. Download and install the IK analysis plugin: https://github.com/medcl/elasticsearch-analysis-ik

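Edit IKAnalyzer.cfg.xml so that it loads your own remote extension dictionary. My dictionary URL became invalid after a data center migration, but nobody noticed for a long time because most merchant data had not been updated: the analyzed terms in the index are rebuilt only when a document is updated!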
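The remote dictionary setup can be sketched roughly as below. This is a minimal sketch under assumptions, not my actual configuration: the helper names `render_ik_config`, `has_change_headers` and `check_remote_dict` and the URL `http://dict.example.com/merchant_words.txt` are hypothetical. The `remote_ext_dict` entry key comes from the plugin's IKAnalyzer.cfg.xml, and the IK README says the remote endpoint should return a `Last-Modified` or `ETag` header so the plugin can detect dictionary changes.

```shell
#!/usr/bin/env bash

# Print an IKAnalyzer.cfg.xml that points IK at a remote extension dictionary.
# $1: dictionary URL (hypothetical; one word per line, UTF-8).
# The URL is inserted verbatim, so escape "&" as "&amp;" if it contains one.
render_ik_config() {
  local dict_url="$1"
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <entry key="remote_ext_dict">${dict_url}</entry>
</properties>
EOF
}

# Read HTTP response headers on stdin. Succeed only if the first line
# reports status 200 and a Last-Modified or ETag header is present.
has_change_headers() {
  tr -d '\r' | awk '
    NR == 1 && $0 !~ /^HTTP\/[0-9.]+ 200/ { bad = 1 }
    tolower($0) ~ /^(last-modified|etag):/ { found = 1 }
    END { exit !(found && !bad) }'
}

# Send a HEAD request to the dictionary URL and validate the headers.
# A failed request yields no headers, so the check fails as well.
check_remote_dict() {
  curl -sS -I --max-time 5 "$1" | has_change_headers
}
```

Possible usage: `render_ik_config 'http://dict.example.com/merchant_words.txt' > IKAnalyzer.cfg.xml`, then run `check_remote_dict 'http://dict.example.com/merchant_words.txt' || echo "remote dictionary unreachable" >&2` from a scheduled job. A check along these lines would probably have flagged the dead URL right after the migration instead of weeks later.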

2. Download and install the pinyin plugin: https://github.com/medcl/elasticsearch-analysis-pinyin

Create the index:

curl -XPUT 'http://127.0.0.1:9200/demo/' -d '{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "ik_smart_pinyin": {
            "tokenizer": "ik_smart",
            "filter": [
              "my_pinyin",
              "lowercase",
              "word_delimiter"
            ]
          },
          "ik_max_word_pinyin": {
            "tokenizer": "ik_max_word",
            "filter": [
              "my_pinyin",
              "lowercase",
              "word_delimiter"
            ]
          }
        },
        "tokenizer": {
          "ik_smart": {
            "type": "ik_smart",
            "use_smart": "true"
          },
          "ik_max_word": {
            "type": "ik_max_word",
            "use_smart": "false"
          }
        },
        "filter": {
          "my_pinyin": {
            "type": "pinyin",
            "first_letter": "all"
          }
        }
      }
    }
  }
}'
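Defining the analyzers does nothing by itself; a field still has to be mapped to one of them. The following is only a sketch of how that might look, assuming a 2.x-style mapping where analyzed text fields use the type `string` (on 5.x and later this would be `text`). The type name `merchant`, the field name `name`, and the helpers `mapping_body` and `put_mapping` are hypothetical and not taken from my real index.

```shell
#!/usr/bin/env bash

# Base URL of the cluster; defaults to the address used in this post.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"

# Print a mapping body that analyzes one field with the given analyzer.
# $1: field name (hypothetical), $2: analyzer name from the index settings.
# Assumes 2.x-style "string" fields; use "text" on 5.x and later.
mapping_body() {
  printf '{"properties":{"%s":{"type":"string","analyzer":"%s"}}}\n' "$1" "$2"
}

# PUT the mapping for one document type in the demo index.
# $1: type name (hypothetical), $2: field name, $3: analyzer name.
put_mapping() {
  curl -sS -XPUT "$ES_URL/demo/_mapping/$1" \
    -H 'Content-Type: application/json' \
    -d "$(mapping_body "$2" "$3")"
}
```

Possible usage: `put_mapping merchant name ik_max_word_pinyin`. Choosing `ik_max_word_pinyin` for indexing is a judgment call here: it produces more tokens per document, at the cost of a larger index.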

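Test the analyzer against the demo index, where it is defined:

curl -G 'http://127.0.0.1:9200/demo/_analyze' --data-urlencode 'analyzer=ik_smart_pinyin' --data-urlencode 'text=望湘園'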

{
    "tokens": [
        {
            "token": "wang",
            "start_offset": 0,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 0
        },
        {
            "token": "xiang",
            "start_offset": 0,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 1
        },
        {
            "token": "yuan",
            "start_offset": 0,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "wxy",
            "start_offset": 0,
            "end_offset": 3,
            "type": "CN_WORD",
            "position": 3
        }
    ]
}

The token "wxy" is the first-letter abbreviation, built from the initials of wang, xiang and yuan.

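Since my outage came down to the first-letter token silently disappearing, a small smoke check on the `_analyze` response might be worth having. The sketch below is an assumption-laden illustration: `list_tokens` and `has_token` are hypothetical helpers, and the JSON is parsed with `grep` and `sed` instead of a real JSON parser, so it only handles simple tokens without embedded quotes.

```shell
#!/usr/bin/env bash

# Read an _analyze JSON response on stdin and print one token per line.
# Text-based extraction (no JSON parser); fine for plain tokens like "wxy".
list_tokens() {
  grep -o '"token"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/^"token"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Succeed if the response on stdin contains exactly the token given in $1.
has_token() {
  list_tokens | grep -Fx -- "$1" >/dev/null
}
```

Possible usage, assuming the analyzer is resolved against the demo index: `curl -sS -G 'http://127.0.0.1:9200/demo/_analyze' --data-urlencode 'analyzer=ik_smart_pinyin' --data-urlencode 'text=望湘園' | has_token wxy || echo "first-letter token missing" >&2`.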