Elasticsearch: the phrase suggester



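    The phrase suggester works like the term suggester, except that it offers suggestions for the whole input text rather than for individual terms.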
    Prepare the data (the requests below use the pre-7.x syntax, with a mapping type named doc):

    PUT s4
    {
      "mappings": {
        "doc": {
          "properties": {
            "title": {
              "type": "text",
              "analyzer": "standard"
            }
          }
        }
      }
    }
    
    PUT s4/doc/1
    {
      "title": "Lucene is cool"
    }
    
    PUT s4/doc/2
    {
      "title": "Elasticsearch builds on top of lucene"
    }
    
    PUT s4/doc/3
    {
      "title": "Elasticsearch rocks"
    }
    
    PUT s4/doc/4
    {
      "title": "Elastic is the company behind ELK stack"
    }
    
    PUT s4/doc/5
    {
      "title": "elk rocks"
    }
    
    PUT s4/doc/6
    {
      "title": "elasticsearch is rock solid"
    }
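
    The six PUT requests above could also be sent in a single round trip through the _bulk endpoint. The following is a minimal sketch in Python, assuming the same type-based layout as above. The helper name build_bulk_payload is hypothetical, and the code only builds the newline-delimited body; it does not contact a cluster.

```python
import json

# Titles taken from the six documents indexed above.
TITLES = [
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "elk rocks",
    "elasticsearch is rock solid",
]


def build_bulk_payload(titles, index="s4", doc_type="doc"):
    """Return an NDJSON string: one action line plus one source line per document."""
    lines = []
    for doc_id, title in enumerate(titles, start=1):
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
        lines.append(json.dumps({"title": title}))
    # A bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_bulk_payload(TITLES)
```

    The resulting string would be sent as the body of a POST to _bulk with the Content-Type application/x-ndjson.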
    

    Now let's see how the phrase suggester makes its suggestions:

    GET s4/doc/_search
    {
      "suggest": {
        "my_s4": {
          "text": "lucne and elasticsear rock",
          "phrase": {
            "field": "title"
          }
        }
      }
    }
    

    text is the input text, which contains misspellings, and the suggester type is now phrase. Here is the result:

    {
      "took" : 6,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 0,
        "max_score" : 0.0,
        "hits" : [ ]
      },
      "suggest" : {
        "my_s4" : [
          {
            "text" : "lucne and elasticsear rock",
            "offset" : 0,
            "length" : 26,
            "options" : [
              {
                "text" : "lucne and elasticsearch rocks",
                "score" : 0.12709484
              },
              {
                "text" : "lucne and elasticsearch rock",
                "score" : 0.10422645
              },
              {
                "text" : "lucne and elasticsear rocks",
                "score" : 0.10036137
              }
            ]
          }
        ]
      }
    }
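
    On the client side, a response like the one above can be unpacked with a few lines of code. This is a minimal sketch in Python, assuming the response has already been decoded into a dict; the function name best_suggestions is hypothetical.

```python
def best_suggestions(response, name):
    """Return (text, score) pairs for the named suggester, best score first."""
    pairs = []
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            pairs.append((option["text"], option["score"]))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# The "suggest" part of the response shown above.
response = {
    "suggest": {
        "my_s4": [
            {
                "text": "lucne and elasticsear rock",
                "offset": 0,
                "length": 26,
                "options": [
                    {"text": "lucne and elasticsearch rocks", "score": 0.12709484},
                    {"text": "lucne and elasticsearch rock", "score": 0.10422645},
                    {"text": "lucne and elasticsear rocks", "score": 0.10036137},
                ],
            }
        ]
    }
}

top = best_suggestions(response, "my_s4")
print(top[0][0])  # lucne and elasticsearch rocks
```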
    

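    As you can see, options directly returns a list of candidate phrases. The suggestion for lucene is poor (lucne stays uncorrected in all three options), but the corrections for elasticsearch and rock are quite good. In addition, we can use highlighting to show the user which of the original terms were corrected.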

    GET s4/doc/_search
    {
      "suggest": {
        "my_s4": {
          "text": "lucne and elasticsear rock",
          "phrase": {
            "field": "title",
            "highlight":{
              "pre_tag":"<em>",
              "post_tag":"</em>"
            }
          }
        }
      }
    }
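
    The response to this request is not shown here. Per the Elasticsearch reference, each option then carries a highlighted field next to text, with the changed terms wrapped in the configured tags. The sketch below, in Python, pulls those terms out again; the function name corrected_terms is hypothetical, and the sample string is an assumption made up for illustration, not captured output.

```python
import re


def corrected_terms(highlighted, pre_tag="<em>", post_tag="</em>"):
    """Return the substrings wrapped in pre_tag/post_tag, in order of appearance."""
    pattern = re.escape(pre_tag) + r"(.*?)" + re.escape(post_tag)
    return re.findall(pattern, highlighted)


# Hypothetical "highlighted" value; a real response may group the tags differently.
sample = "lucne and <em>elasticsearch</em> <em>rocks</em>"
terms = corrected_terms(sample)
```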
    

    The tags are not limited to plain <em>; any markup can be used for the highlighting:

    GET s4/doc/_search
    {
      "suggest": {
        "my_s4": {
          "text": "lucne and elasticsear rock",
          "phrase": {
            "field": "title",
            "highlight":{
              "pre_tag":"<b id='d1' class='t1' style='color:red;font-size:18px;'>",
              "post_tag":"</b>"
            }
          }
        }
      }
    }
    

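    Note that highlighting of suggester results differs slightly from highlighting of query results. For example, the custom tag options here are pre_tag and post_tag (singular), not pre_tags and post_tags as in the search highlighter: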

    GET s4/doc/_search
    {
      "query": {
        "match": {
          "title": "rock"
        }
      },
      "highlight": {
        "pre_tags": "<b style='color:red'>",
        "post_tags": "</b>",
        "fields": {
          "title": {}
        }
      }
    }
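
    The difference in key names is easy to get wrong when requests are built programmatically. As a minimal sketch in Python, the two hypothetical helpers below build both request bodies side by side. The search highlighter tags are passed as lists here, which Elasticsearch accepts as well as the single strings used above.

```python
def phrase_suggest_body(name, text, field, pre_tag, post_tag):
    """Body for a phrase suggester with highlighting (singular tag keys)."""
    return {
        "suggest": {
            name: {
                "text": text,
                "phrase": {
                    "field": field,
                    "highlight": {"pre_tag": pre_tag, "post_tag": post_tag},
                },
            }
        }
    }


def match_highlight_body(field, query, pre_tag, post_tag):
    """Body for a match query with the search highlighter (plural tag keys)."""
    return {
        "query": {"match": {field: query}},
        "highlight": {
            "pre_tags": [pre_tag],
            "post_tags": [post_tag],
            "fields": {field: {}},
        },
    }


suggest_body = phrase_suggest_body(
    "my_s4", "lucne and elasticsear rock", "title", "<em>", "</em>"
)
search_body = match_highlight_body("title", "rock", "<b style='color:red'>", "</b>")
```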
    

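    The phrase suggester builds on the term suggester and additionally takes the relationships among terms into account, such as whether they occur together in the indexed text, how close they are to each other, and how frequent they are.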
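
    To make the idea of scoring terms in context concrete, here is a toy sketch in Python. It is not Elasticsearch's implementation. It assumes a bigram model with stupid-backoff smoothing (which the Elasticsearch reference names as the default smoothing model, with a discount of 0.4), uses whitespace splitting in place of the standard analyzer, and scores only candidate phrases that are given to it. The names tokenize and phrase_score are hypothetical.

```python
from collections import Counter

# Titles of the six documents indexed above.
TITLES = [
    "Lucene is cool",
    "Elasticsearch builds on top of lucene",
    "Elasticsearch rocks",
    "Elastic is the company behind ELK stack",
    "elk rocks",
    "elasticsearch is rock solid",
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for the real analyzer)."""
    return text.lower().split()


unigrams = Counter()
bigrams = Counter()
for title in TITLES:
    tokens = tokenize(title)
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

TOTAL = sum(unigrams.values())


def phrase_score(phrase, discount=0.4):
    """Score a phrase with a bigram model; back off to unigrams for unseen pairs."""
    tokens = tokenize(phrase)
    if not tokens:
        return 0.0
    score = unigrams[tokens[0]] / TOTAL
    for prev, cur in zip(tokens, tokens[1:]):
        if bigrams[(prev, cur)] > 0:
            # Seen pair: relative frequency of cur after prev.
            score *= bigrams[(prev, cur)] / unigrams[prev]
        else:
            # Unseen pair: discounted unigram frequency of cur.
            score *= discount * unigrams[cur] / TOTAL
    return score


seen = phrase_score("elasticsearch rocks")
unseen = phrase_score("elasticsearch rock")
```

    Because "elasticsearch rocks" occurs as a pair in the indexed titles while "elasticsearch rock" does not, the first phrase scores higher in this toy model, which matches the ordering of the first two options in the response shown earlier.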


    See also: [phrase suggester](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html). Corrections are welcome. That's all.

