Building an ES Search Service from Scratch (4): Pinyin Search

Reposted article. 2019-03-08 16:56

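1. Introduction

The previous article covered synonym search in ES, which made our search more capable, but that is still not enough. In practice, users may expect a search for "fanqie" to also return results containing "番茄" (tomato). That requires pinyin search, and this article shows how to implement it.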

2. Installing the ES Pinyin Plugin

2.1 About the pinyin plugin

GitHub: https://github.com/medcl/elasticsearch-analysis-pinyin

2.2 Installation steps

① Go to the ES bin directory

$ cd /usr/local/elasticsearch/bin/

② Install the pinyin plugin with the elasticsearch-plugin command (the plugin version must match the ES version; v5.5.3 is used here)

$ ./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v5.5.3/elasticsearch-analysis-pinyin-5.5.3.zip

③ After a successful install, an analysis-pinyin folder appears in the plugins directory
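
Step ③ can also be checked from a script. The following is a minimal sketch, assuming the install root /usr/local/elasticsearch from step ① and the analysis-pinyin folder name from step ③; the helper name plugin_installed is hypothetical and not part of ES.

```python
from pathlib import Path


def plugin_installed(es_home: str, plugin_dir: str = "analysis-pinyin") -> bool:
    """Return True if <es_home>/plugins/<plugin_dir> exists and is a directory."""
    return (Path(es_home) / "plugins" / plugin_dir).is_dir()


if __name__ == "__main__":
    # /usr/local/elasticsearch is the install root assumed from step 1.
    print(plugin_installed("/usr/local/elasticsearch"))
```

This only confirms that the folder is present. ES loads plugins at startup, so the node still needs a restart before the pinyin filter becomes available.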

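3. Custom Analyzer

To use the pinyin plugin, the index must be created from a custom template, and that template must define a custom analyzer.

3.1 Configuration

① In the yb_knowledge.json template created in the previous article, edit the "settings" section and add a custom analyzer to it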

"analysis": {
    "filter": {
        ... rest omitted ...
        "pinyin_filter":{
            "type": "pinyin",
            "keep_first_letter": true,
            "keep_separate_first_letter": false,
            "keep_full_pinyin": true,
            "keep_joined_full_pinyin": true,
            "none_chinese_pinyin_tokenize": false,
            "remove_duplicated_term": true,
            "keep_original": true,
            "limit_first_letter_length": 50,
            "lowercase": true
        }
    },
    "analyzer": {
        ... rest omitted ...
        "ik_synonym_pinyin": {
            "type": "custom",
            "tokenizer": "ik_smart",
            "filter": ["synonym_filter","pinyin_filter"]
        }
    }
}

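Notes on the custom analyzer:

First, declare a new token filter, pinyin_filter, whose type is pinyin (the pinyin plugin). The remaining fields are described in the GitHub project README.
Next, declare a new analyzer, ik_synonym_pinyin. Its type is custom; its tokenizer is ik_smart, i.e. the ik_smart mode of the IK analyzer; and filter lists the token filters to apply, in order. Several can be used: here, the synonym_filter from the previous article followed by the pinyin_filter defined above.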
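
To make the filter chain concrete, here is a toy model in Python. It is not the plugin's implementation: the pinyin table and synonym group are hard-coded, the function names are hypothetical, and it assumes the tokenizer emits "番茄" as a single term. It only mimics four of the flags set above (keep_full_pinyin, keep_original, keep_joined_full_pinyin, keep_first_letter), and the order of its output is arbitrary.

```python
# Toy pinyin table for illustration only; the real plugin ships its own dictionary.
PINYIN = {
    "番": "fan", "茄": "qie",
    "西": "xi", "紅": "hong", "柿": "shi",
    "聖": "sheng", "女": "nv", "果": "guo",
}

# Toy synonym group, standing in for the synonym file from the previous article.
SYNONYMS = [["番茄", "西紅柿", "聖女果"]]


def synonym_filter(term):
    """Expand a term into its whole synonym group, or return it unchanged."""
    for group in SYNONYMS:
        if term in group:
            return list(group)
    return [term]


def pinyin_filter(term, keep_first_letter=True, keep_full_pinyin=True,
                  keep_joined_full_pinyin=True, keep_original=True):
    """Expand one Chinese term into the variants selected by the flags."""
    syllables = [PINYIN[ch] for ch in term]
    out = []
    if keep_full_pinyin:
        out.extend(syllables)                         # one token per syllable
    if keep_original:
        out.append(term)                              # the original term
    if keep_joined_full_pinyin:
        out.append("".join(syllables))                # syllables joined together
    if keep_first_letter:
        out.append("".join(s[0] for s in syllables))  # first letters joined
    # dict.fromkeys drops duplicates while preserving order
    return list(dict.fromkeys(out))


def analyze(terms, filters):
    """Pass every term through each filter in order, as a custom analyzer chains filters."""
    for f in filters:
        terms = [expanded for t in terms for expanded in f(t)]
    return terms


if __name__ == "__main__":
    print(analyze(["番茄"], [synonym_filter, pinyin_filter]))
```

The order of the filters matters in this model: synonyms are expanded first, so every synonym also receives pinyin variants.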

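② At the same time, edit properties under "mappings": add a fields parameter to the two searchable fields, knowledgeTitle and knowledgeContent. The fields parameter indexes the same field in more than one way, turning a simple mapping into a multi-field mapping. Here it defines a sub-field named pinyin that uses the ik_synonym_pinyin analyzer defined above.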

"mappings": {
    "knowledge": {
        ... rest omitted ...
        "properties": {
            ... rest omitted ...
            "knowledgeTitle": {
                "type": "text",
                "analyzer": "ik_synonym_max",
                "fields": {
                    "pinyin": {
                        "type": "text",
                        "analyzer": "ik_synonym_pinyin"
                    }
                }
            },
            "knowledgeContent": {
                "type": "text",
                "analyzer": "ik_synonym_max",
                "fields": {
                    "pinyin": {
                        "type": "text",
                        "analyzer": "ik_synonym_pinyin"
                    }
                }
            }
        }
    }
}
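
A sub-field is addressed in queries as parent.subfield, for example knowledgeTitle.pinyin. The sketch below builds one possible search body that queries both the main fields and their pinyin sub-fields with a multi_match query. It is an assumption about how the mapping might be queried, not the query used in this series, and build_search_body is a hypothetical helper name.

```python
import json


def build_search_body(keyword: str) -> dict:
    """Build a multi_match query over the main fields and their pinyin sub-fields."""
    return {
        "query": {
            "multi_match": {
                "query": keyword,
                "fields": [
                    "knowledgeTitle",
                    "knowledgeTitle.pinyin",
                    "knowledgeContent",
                    "knowledgeContent.pinyin",
                ],
            }
        }
    }


if __name__ == "__main__":
    # The printed JSON could be sent as the body of a _search request on yb_knowledge.
    print(json.dumps(build_search_body("fanqie"), ensure_ascii=False, indent=2))
```

Keeping pinyin in a separate sub-field leaves the original field's analysis untouched, so a query can choose whether pinyin matching takes part.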

③ Finally, delete the previously created yb_knowledge index and restart Logstash, so that the index is recreated from the updated template

Note: once the index has been rebuilt, the _analyze API can be used to check the tokenization result

curl -XGET 'http://localhost:9200/yb_knowledge/_analyze' -H 'Content-Type: application/json' -d '
{
    "analyzer": "ik_synonym_pinyin",
    "text": "番茄"
}'

Note: with the synonyms 番茄, 西紅柿 and 聖女果 already configured, the tokenization result is as follows

{
    "tokens": [
        {
            "token": "fan",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 0
        },
        {
            "token": "番茄",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 0
        },
        {
            "token": "fanqie",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 0
        },
        {
            "token": "fq",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 0
        },
        {
            "token": "qie",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 1
        },
        {
            "token": "xi",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 2
        },
        {
            "token": "hong",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 3
        },
        {
            "token": "shi",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 4
        },
        {
            "token": "西紅柿",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 4
        },
        {
            "token": "xihongshi",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 4
        },
        {
            "token": "xhs",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 4
        },
        {
            "token": "sheng",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 5
        },
        {
            "token": "nv",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 6
        },
        {
            "token": "guo",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 7
        },
        {
            "token": "聖女果",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 7
        },
        {
            "token": "shengnvguo",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 7
        },
        {
            "token": "sng",
            "start_offset": 0,
            "end_offset": 2,
            "type": "SYNONYM",
            "position": 7
        }
    ]
}
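
A response like this is easier to read when the tokens are grouped by position. The following is a small sketch of that grouping; the helper name tokens_by_position is hypothetical, and the sample holds only the first five tokens of the response above.

```python
from collections import defaultdict


def tokens_by_position(response: dict) -> dict:
    """Group the token strings of an _analyze response by their position."""
    grouped = defaultdict(list)
    for tok in response["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    return dict(grouped)


# First five tokens of the response shown above (offsets and type omitted).
sample = {
    "tokens": [
        {"token": "fan", "position": 0},
        {"token": "番茄", "position": 0},
        {"token": "fanqie", "position": 0},
        {"token": "fq", "position": 0},
        {"token": "qie", "position": 1},
    ]
}

if __name__ == "__main__":
    for position, tokens in sorted(tokens_by_position(sample).items()):
        print(position, tokens)
```

The token "fanqie" is among those indexed for "番茄", which is why a search for "fanqie" can find documents containing "番茄".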

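4. Conclusion

That completes pinyin search. The last two articles dealt with ES plugins and the custom Logstash template, with no Java code involved. The next article shows how to highlight search results through the Java API.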
