ElasticSearch (26): Modifying an Analyzer and Building Your Own Custom Analyzer

Reposted article, 2019-05-22 23:57, ElasticSearch

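1. The default analyzer

The standard analyzer

standard tokenizer: splits text on word boundaries
standard token filter: does nothing (it was removed in Elasticsearch 7.0)
lowercase token filter: converts all letters to lowercase
stop token filter (disabled by default): removes stop words such as a, the, and it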
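
The chain described above can be approximated in a few lines of Python. This is only a rough sketch, not how Elasticsearch is implemented: the function names are made up for illustration, and the regular expression is a crude stand-in for the real standard tokenizer, which follows Unicode word-boundary rules.

```python
import re

def standard_tokenize(text):
    # Crude stand-in for the standard tokenizer: keep runs of word characters.
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    # lowercase token filter: convert every token to lowercase.
    return [t.lower() for t in tokens]

def stop_filter(tokens, stopwords):
    # stop token filter: drop tokens that appear in the stop word set.
    return [t for t in tokens if t not in stopwords]

def standard_analyze(text, stopwords=frozenset()):
    # An empty stop word set mirrors the default, where the stop filter is disabled.
    tokens = standard_tokenize(text)
    tokens = lowercase_filter(tokens)
    return stop_filter(tokens, stopwords)

print(standard_analyze("The Quick Brown-Fox"))
# ['the', 'quick', 'brown', 'fox']
```

With the default empty stop word set, "the" is kept; passing a set such as {"the"} removes it.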

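2. Changing the analyzer settings

Define an analyzer named es_std, based on the standard analyzer, with the English stop words token filter enabled. The two _analyze requests that follow run the same sentence through the default standard analyzer and through es_std so the results can be compared.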

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "es_std": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}

GET /my_index/_analyze
{
  "analyzer": "standard", 
  "text": "a dog is in the house"
}

GET /my_index/_analyze
{
  "analyzer": "es_std",
  "text":"a dog is in the house"
}
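
To see what difference to expect between the two requests above, here is a small Python approximation. It is a sketch under assumptions: the word list is assumed to match Lucene's default English stop word set (the one "_english_" refers to), and the regex tokenizer is only a simplification of the standard tokenizer.

```python
import re

# Assumed copy of Lucene's default English stop word set ("_english_").
ENGLISH_STOPWORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or "
    "such that the their then there these they this to was will with".split()
)

def analyze(text, stopwords=frozenset()):
    # Tokenize on word characters, lowercase, then drop stop words.
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    return [t for t in tokens if t not in stopwords]

text = "a dog is in the house"
print(analyze(text))                     # like "standard": nothing removed
# ['a', 'dog', 'is', 'in', 'the', 'house']
print(analyze(text, ENGLISH_STOPWORDS))  # like "es_std": stop words removed
# ['dog', 'house']
```

If the real es_std analyzer behaves as configured, only dog and house should come back from the second request.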

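3. Building your own custom analyzer

The custom analyzer below does three things:

1. Converts the & character to "and"

2. Removes certain stop words

3. Converts all letters to lowercase

If my_index still exists from the previous section, delete it first (DELETE /my_index), because creating an index that already exists fails.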

PUT /my_index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "&_to_and": {
          "type": "mapping",
          "mappings": ["&=> and"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["the", "a"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip", "&_to_and"],
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stopwords"]
        }
      }
    }
  }
}

GET /my_index/_analyze
{
  "text": "tom&jerry are a friend in the house, <a>, HAHA!!",
  "analyzer": "my_analyzer"
}
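
The order of operations in my_analyzer (character filters first, then the tokenizer, then the token filters) can be mimicked with the following Python sketch. All function names here are hypothetical, and the regex-based HTML stripping and tokenizing are simplifications of what Elasticsearch actually does, so treat the output as an approximation of the real response.

```python
import re

def html_strip(text):
    # Rough stand-in for the html_strip char filter: drop anything tag-like.
    return re.sub(r"<[^>]+>", "", text)

def map_chars(text, mappings):
    # Rough stand-in for a "mapping" char filter: plain string replacement.
    for src, dst in mappings.items():
        text = text.replace(src, dst)
    return text

def my_analyzer(text):
    # 1. char filters, in the order listed in the analyzer definition
    text = html_strip(text)
    text = map_chars(text, {"&": "and"})
    # 2. tokenizer (simplified)
    tokens = re.findall(r"\w+", text)
    # 3. token filters: lowercase, then the custom stop word list
    tokens = [t.lower() for t in tokens]
    return [t for t in tokens if t not in {"the", "a"}]

print(my_analyzer("tom&jerry are a friend in the house, <a>, HAHA!!"))
# ['tomandjerry', 'are', 'friend', 'in', 'house', 'haha']
```

Note that "in" and "are" survive because the custom stop list contains only "the" and "a".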

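Apply the custom analyzer to a field through the mapping. The my_type path segment applies to versions that use mapping types; on Elasticsearch 7.x and later, where mapping types are deprecated or removed, the path is /my_index/_mapping.

PUT /my_index/_mapping/my_type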
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "my_analyzer"
    }
  }
}
