Elasticsearch Complex Search (Sorting, Pagination, Highlighting, Fuzzy Matching, Exact Matching)


If you are not familiar with basic Elasticsearch usage, see the previous article, "Basic Operations on Elasticsearch Indices and Documents".

Before querying, you can use the Bulk API to insert the sample documents in a single request (data source).
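Here is a minimal sketch of preparing that bulk request in Python. It assumes only the three sample documents whose contents appear in the results below. The full data set also includes a student in 廣州, but that document's fields are not shown, so it is left out. `build_bulk_body` is a hypothetical helper. It emits the newline-delimited JSON that the `_bulk` endpoint expects: one action line followed by one source line per document.

```python
import json

# Assumption: only the three sample documents visible in this article's results.
SAMPLE_DOCS = {
    "1": {"name": "山西太原-張三", "age": "23", "address": {"city": "太原", "province": "山西"}},
    "2": {"name": "山西長治-李四", "age": "24", "address": {"city": "長治", "province": "山西"}},
    "3": {"name": "山西呂梁-王五", "age": "25", "address": {"city": "呂梁", "province": "山西"}},
}


def build_bulk_body(index, documents):
    """Hypothetical helper: build an NDJSON body for POST _bulk."""
    lines = []
    for doc_id, source in documents.items():
        # Action line: which index and document ID to write.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, keeping Chinese characters readable.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body("student", SAMPLE_DOCS)
print(body)
```

You can paste the printed text under a `POST _bulk` line in the Kibana console. Alternatively, send it with an HTTP client using `Content-Type: application/x-ndjson`.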

Querying data

match query

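match analyzes its input. The query text is first run through the field's analyzer, and the resulting terms are then looked up in the index.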

GET /student/_search
{
  "query": {
    "match": {
      "name": "山西"
    }
  }
}

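The same search can also be written as a URI search in query-string syntax (the double quotes make it a phrase search):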

GET /student/_search?q=name:"山西"

The result shows three students whose names contain "山西":

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.7133499,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.7133499,
        "_source" : {
          "name" : "山西太原-張三",
          "age" : "23",
          "address" : {
            "city" : "太原",
            "province" : "山西"
          }
        }
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.7133499,
        "_source" : {
          "name" : "山西長治-李四",
          "age" : "24",
          "address" : {
            "city" : "長治",
            "province" : "山西"
          }
        }
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.7133499,
        "_source" : {
          "name" : "山西呂梁-王五",
          "age" : "25",
          "address" : {
            "city" : "呂梁",
            "province" : "山西"
          }
        }
      }
    ]
  }
}

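Explanation

query: the query definition.

match: the match query and its conditions.

name: the field to search, with the text to look for.

hits --> total

  • value: 3 documents matched.
  • relation: eq, meaning value is an exact count (gte would mean it is only a lower bound).

max_score: the highest relevance score among the hits.

hits --> hits: the matching documents themselves, each with its index, ID, score, and source.

Each hit's _score shows how relevant it is, and _source holds the original document, so you can check which results best match what you expected.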
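The following Python sketch reads these fields from a response. It assumes the response has already been parsed into a dict. The copy below is trimmed from the response above to just the fields used. `summarize` is a hypothetical helper and is not part of any client library.

```python
# Trimmed copy of the response above: only the fields read below are kept.
response = {
    "hits": {
        "total": {"value": 3, "relation": "eq"},
        "max_score": 0.7133499,
        "hits": [
            {"_id": "1", "_score": 0.7133499, "_source": {"name": "山西太原-張三"}},
            {"_id": "2", "_score": 0.7133499, "_source": {"name": "山西長治-李四"}},
            {"_id": "3", "_score": 0.7133499, "_source": {"name": "山西呂梁-王五"}},
        ],
    }
}


def summarize(resp):
    """Hypothetical helper: return (total, is_exact, rows), rows sorted by score."""
    total = resp["hits"]["total"]
    # "eq" means total["value"] is exact; "gte" would mean it is a lower bound.
    is_exact = total["relation"] == "eq"
    rows = [(h["_id"], h["_score"], h["_source"]["name"]) for h in resp["hits"]["hits"]]
    # Highest score first; Python's sort is stable, so ties keep their order.
    rows.sort(key=lambda row: row[1], reverse=True)
    return total["value"], is_exact, rows


count, exact, rows = summarize(response)
print(count, exact)
for doc_id, score, name in rows:
    print(doc_id, score, name)
```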

By default, match combines the query terms with OR, so the following two queries return the same results:

GET student/_search
{
    "query": {
        "match": {
            "name": {
                "query": "山西長治",
                "operator": "or"
            }
        }
    }
}
GET student/_search
{
    "query": {
        "match": {
            "name": "山西長治"
        }
    }
}

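Because operator defaults to OR, the query above returns every document that contains at least one of the characters in 山西長治. This assumes the default standard analyzer, which splits Chinese text into single characters: 山, 西, 長, 治.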

Set minimum_should_match to require a minimum number of matching terms, for example:

GET student/_search
{
    "query": {
        "match": {
            "name": {
                "query": "山西長治",
                "operator": "or",
                "minimum_should_match": 3
            }
        }
    }
}

Now only documents containing at least three of the four characters 山, 西, 長, 治 are returned.

With operator set to and, every term must match, so only one document is returned:

GET student/_search
{
  "query": {
    "match": {
      "name": {
        "query": "山西長治",
        "operator": "and"
      }
    }
  }
}
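The following toy in-memory model in Python makes the OR, minimum_should_match, and AND behavior concrete. It assumes the name field uses the standard analyzer, which emits each Chinese character as its own token. The regex tokenizer is only a rough stand-in for that analyzer. `toy_match` is hypothetical, minimum_should_match is modeled only as an integer, and scoring is ignored entirely.

```python
import re


def toy_tokenize(text):
    # Rough stand-in for the standard analyzer on Chinese text:
    # one token per CJK ideograph; punctuation such as "-" is dropped.
    return set(re.findall(r"[\u4e00-\u9fff]", text))


def toy_match(query, field_value, operator="or", minimum_should_match=None):
    """Hypothetical model of whether a match query would hit field_value."""
    query_terms = toy_tokenize(query)
    matched = len(query_terms & toy_tokenize(field_value))
    if operator == "and":
        # AND: every query term must appear in the field.
        return matched == len(query_terms)
    # OR: at least one term, or at least minimum_should_match terms if given.
    required = 1 if minimum_should_match is None else minimum_should_match
    return matched >= required


names = {"1": "山西太原-張三", "2": "山西長治-李四", "3": "山西呂梁-王五"}
for operator, msm in [("or", None), ("or", 3), ("and", None)]:
    hits = [doc_id for doc_id, name in names.items()
            if toy_match("山西長治", name, operator, msm)]
    print(operator, msm, hits)
# or None ['1', '2', '3']
# or 3 ['2']
# and None ['2']
```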

Ids query

Fetch multiple documents by ID in one query:

GET student/_search
{
  "query": {
    "ids": {
      "values": ["1", "2", "3"]
    }
  }
}

The query above returns the documents with IDs 1, 2, and 3.

multi_match

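multi_match builds on the match query and lets you search several fields at once.

The searches above each targeted a single field. Often, though, you don't know which field contains the keywords. In that case, use multi_match: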

GET student/_search
{
    "query": {
        "multi_match": {
            "query": "山西長治",
            "fields": [
                "name",
                "address.city^3",
                "address.province"
            ],
            "type": "best_fields"
        }
    }
}

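This searches name, address.city, and address.province. The address.city^3 boost multiplies the score of matches in the city field by three. With type best_fields, a document's score comes from its best-matching field. The result: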

{
    "took" : 0,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 3,
            "relation" : "eq"
        },
        "max_score" : 7.223837,
        "hits" : [
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "2",
                "_score" : 7.223837,
                "_source" : {
                    "name" : "山西長治-李四",
                    "age" : "24",
                    "address" : {
                        "city" : "長治",
                        "province" : "山西"
                    }
                }
            },
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 0.7133499,
                "_source" : {
                    "name" : "山西太原-張三",
                    "age" : "23",
                    "address" : {
                        "city" : "太原",
                        "province" : "山西"
                    }
                }
            },
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "3",
                "_score" : 0.7133499,
                "_source" : {
                    "name" : "山西呂梁-王五",
                    "age" : "25",
                    "address" : {
                        "city" : "呂梁",
                        "province" : "山西"
                    }
                }
            }
        ]
    }
}
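The following toy Python model shows how best_fields combines per-field scores, and why the ^3 boost on address.city pushes 山西長治-李四 to the top. It makes several assumptions. The scores here are just counts of matching single-character tokens, whereas Elasticsearch actually uses BM25, so the numbers differ from the _score values above. `best_fields_score` is hypothetical. tie_breaker defaults to 0, as it does for best_fields.

```python
import re


def tokens(text):
    # Toy stand-in for the standard analyzer: one token per CJK ideograph.
    return set(re.findall(r"[\u4e00-\u9fff]", text))


def field_score(query, value):
    # Toy relevance (NOT BM25): number of query tokens present in the field.
    return len(tokens(query) & tokens(value))


def best_fields_score(query, doc, fields, tie_breaker=0.0):
    """Hypothetical model: best boosted field score + tie_breaker * the rest."""
    scores = []
    for spec in fields:
        path, _, boost = spec.partition("^")  # "address.city^3" -> path, "3"
        value = doc
        for key in path.split("."):           # follow dotted paths into objects
            value = value[key]
        scores.append(field_score(query, value) * float(boost or 1))
    best = max(scores)
    return best + tie_breaker * (sum(scores) - best)


fields = ["name", "address.city^3", "address.province"]
docs = {
    "1": {"name": "山西太原-張三", "address": {"city": "太原", "province": "山西"}},
    "2": {"name": "山西長治-李四", "address": {"city": "長治", "province": "山西"}},
    "3": {"name": "山西呂梁-王五", "address": {"city": "呂梁", "province": "山西"}},
}
for doc_id, doc in docs.items():
    print(doc_id, best_fields_score("山西長治", doc, fields))
# 1 2.0
# 2 6.0
# 3 2.0
```

In this model, document 2 wins because its boosted city match (2 tokens x 3) beats its name match (4 tokens x 1). Documents 1 and 3 tie, which mirrors the ordering in the response above.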

Prefix query

Returns documents whose specified field contains a term that starts with the given prefix.

GET student/_search
{
    "query": {
        "prefix": {
            "address.city": {
                "value": "呂"
            }
        }
    }
}

This finds documents whose city starts with 呂:

{
    "took" : 2,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 1,
            "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "3",
                "_score" : 1.0,
                "_source" : {
                    "name" : "山西呂梁-王五",
                    "age" : "25",
                    "address" : {
                        "city" : "呂梁",
                        "province" : "山西"
                    }
                }
            }
        ]
    }
}
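prefix is a term-level query. The prefix itself is not analyzed; it is compared against each indexed term. The following toy Python model makes several assumptions. address.city is a text field analyzed into single characters, plus a keyword subfield holding the whole value, which is what default dynamic mapping produces. `prefix_match` is hypothetical. The model shows why a multi-character prefix can miss on the text field but hit on the keyword subfield.

```python
import re


def indexed_terms(value, keyword=False):
    # keyword subfield: the whole value is a single term.
    # text field (toy standard analyzer): one term per CJK ideograph.
    return {value} if keyword else set(re.findall(r"[\u4e00-\u9fff]", value))


def prefix_match(prefix, value, keyword=False):
    """Hypothetical model: does any indexed term start with the unanalyzed prefix?"""
    return any(term.startswith(prefix) for term in indexed_terms(value, keyword))


cities = {"1": "太原", "2": "長治", "3": "呂梁"}
print([i for i, c in cities.items() if prefix_match("呂", c)])                  # ['3']
print([i for i, c in cities.items() if prefix_match("呂梁", c)])                # []
print([i for i, c in cities.items() if prefix_match("呂梁", c, keyword=True)])  # ['3']
```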

Term query

term looks for an exact term in the given field. The search value is not analyzed, so you must supply the exact value to get correct results.

GET /student/_search
{
    "query": {
        "term": {
            "name.keyword": "山西太原-張三"
        }
    }
}

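Here the query uses name.keyword, the keyword subfield that default dynamic mapping adds to string fields. Because it stores the whole unanalyzed value, "山西太原-張三" is matched exactly: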

{
    "took" : 0,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 1,
            "relation" : "eq"
        },
        "max_score" : 1.2039728,
        "hits" : [
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 1.2039728,
                "_source" : {
                    "name" : "山西太原-張三",
                    "age" : "23",
                    "address" : {
                        "city" : "太原",
                        "province" : "山西"
                    }
                }
            }
        ]
    }
}
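The following toy Python model shows why the example uses name.keyword rather than name: a term query compares its value, unanalyzed, with the indexed terms. It assumes name is a text field with a keyword subfield storing the whole string, and that the text field is split into single-character tokens by the standard analyzer. The helper names are hypothetical.

```python
import re


def text_field_terms(value):
    # Toy standard analyzer: one term per CJK ideograph ("-" is dropped).
    return set(re.findall(r"[\u4e00-\u9fff]", value))


def keyword_field_terms(value):
    # keyword fields index the entire value as one term.
    return {value}


def term_query(term, terms_in_index):
    """Hypothetical model of a term query: exact, unanalyzed comparison."""
    return term in terms_in_index


name = "山西太原-張三"
print(term_query("山西太原-張三", keyword_field_terms(name)))  # True  (name.keyword)
print(term_query("山西太原-張三", text_field_terms(name)))     # False (name)
print(term_query("張", text_field_terms(name)))                # True  (single token)
```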

Terms query

To match any of several exact values, use terms. It is similar to IN in SQL.

GET student/_search
{
    "query": {
        "terms": {
            "address.city.keyword": [
                "長治",
                "廣州"
            ]
        }
    }
}

The query above returns every document whose address.city.keyword is exactly 長治 or 廣州.
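The following sketch makes the SQL analogy concrete. It runs the equivalent IN query against an in-memory SQLite table using Python's built-in sqlite3 module. The student table and its columns are hypothetical. Only the three sample documents whose contents appear above are inserted; the 廣州 student's fields are not shown, so it is omitted.

```python
import sqlite3

# Hypothetical table mirroring the sample documents shown in this article.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO student (id, name, city) VALUES (?, ?, ?)",
    [
        ("1", "山西太原-張三", "太原"),
        ("2", "山西長治-李四", "長治"),
        ("3", "山西呂梁-王五", "呂梁"),
    ],
)

# SQL counterpart of the terms query on address.city.keyword:
# exact equality with any value in the list.
rows = conn.execute(
    "SELECT id, name FROM student WHERE city IN (?, ?) ORDER BY id",
    ("長治", "廣州"),
).fetchall()
print(rows)  # [('2', '山西長治-李四')]
conn.close()
```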

Compound queries

A compound query combines the individual queries above into more complex ones.

The general form is:

POST _search
{
    "query": {
        "bool" : {
            "must" : {
                "term" : { "user" : "kimchy" }
            },
            "filter": {
                "term" : { "tag" : "tech" }
            },
            "must_not" : {
                "range" : {
                    "age" : { "gte" : 10, "lte" : 20 }
                }
            },
            "should" : [
                { "term" : { "tag" : "wow" } },
                { "term" : { "tag" : "elasticsearch" } }
            ],
            "minimum_should_match" : 1,
            "boost" : 1.0
        }
    }
}

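A bool query is built from must, filter, must_not, and should clauses. minimum_should_match sets how many (or what percentage) of the should clauses a document must match. If the bool query includes at least one should clause and no must or filter clauses, the default is 1; otherwise, the default is 0. Clauses in must and should contribute to the relevance score. filter and must_not only include or exclude documents and do not affect the score.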
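The following toy Python model implements the matching rules just described, including the default for minimum_should_match. Each clause is modeled as a plain predicate function. `bool_matches` and the predicates are hypothetical. Scoring is not modeled; only whether a document matches.

```python
def bool_matches(doc, must=(), filter_=(), must_not=(), should=(),
                 minimum_should_match=None):
    """Hypothetical model: does doc satisfy a bool query? Clauses are predicates."""
    if minimum_should_match is None:
        # Default: 1 if there is a should clause and no must/filter clause, else 0.
        minimum_should_match = 1 if should and not must and not filter_ else 0
    return (
        all(clause(doc) for clause in must)
        and all(clause(doc) for clause in filter_)
        and not any(clause(doc) for clause in must_not)
        and sum(clause(doc) for clause in should) >= minimum_should_match
    )


def in_shanxi(doc):
    return doc["province"] == "山西"


def named_lisi(doc):
    return "李四" in doc["name"]


docs = [
    {"id": "1", "name": "山西太原-張三", "province": "山西"},
    {"id": "2", "name": "山西長治-李四", "province": "山西"},
    {"id": "3", "name": "山西呂梁-王五", "province": "山西"},
]
# must + should: should is optional (default minimum 0), so all three match.
print([d["id"] for d in docs if bool_matches(d, must=[in_shanxi], should=[named_lisi])])
# should only: the default minimum becomes 1, so only 李四 matches.
print([d["id"] for d in docs if bool_matches(d, should=[named_lisi])])
```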

must

must works like AND in SQL: every clause must match.

Use a compound query to find documents whose city is 長治 and whose age is 24:

GET student/_search
{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "address.city": "長治"
                    }
                },
                {
                    "match": {
                        "age": "24"
                    }
                }
            ]
        }
    }
}

must_not

Find all documents whose province is not 山西. Only one result remains, the student in 廣州:

GET student/_search
{
    "query": {
        "bool": {
            "must_not": [
                {
                    "match": {
                        "address.province": "山西"
                    }
                }
            ]
        }
    }
}

filter

Use filter to keep documents whose age is between 24 and 25, inclusive:

GET student/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "age": {
              "gte": 24,
              "lte": 25
            }
          }
        }
      ]
    }
  }
}
  • gt: greater than
  • gte: greater than or equal to
  • lt: less than
  • lte: less than or equal to
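The following toy Python model implements the four bounds; `in_range` is hypothetical. It assumes age is compared as a number. Suppose instead that age was dynamically mapped from JSON strings such as "24", as in the _source shown earlier. Then the field is text/keyword, and Elasticsearch compares terms as strings. The last line illustrates that case.

```python
def in_range(value, gt=None, gte=None, lt=None, lte=None):
    """Hypothetical model of range bounds: every bound that is given must hold."""
    return (
        (gt is None or value > gt)
        and (gte is None or value >= gte)
        and (lt is None or value < lt)
        and (lte is None or value <= lte)
    )


ages = {"1": 23, "2": 24, "3": 25}
print([i for i, a in ages.items() if in_range(a, gte=24, lte=25)])  # ['2', '3']
print([i for i, a in ages.items() if in_range(a, gt=24, lte=25)])   # ['3']
# String comparison is lexicographic: "240" falls between "24" and "25".
print(in_range("240", gte="24", lte="25"))                          # True
```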

should

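should means "or", similar to OR in SQL.

Find documents whose province is 山西. If name also contains 李四, the document is more relevant and ranks higher. Because this query also has a must clause, the should clause only boosts the score and does not exclude anything.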

GET student/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "address.province": "山西"
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "name": "李四"
          }
        }
      ]
    }
  }
}

The results show the document named 山西長治-李四 ranked first:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 3.1212955,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 3.1212955,
        "_source" : {
          "name" : "山西長治-李四",
          "age" : "24",
          "address" : {
            "city" : "長治",
            "province" : "山西"
          }
        }
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.7133499,
        "_source" : {
          "name" : "山西太原-張三",
          "age" : "23",
          "address" : {
            "city" : "太原",
            "province" : "山西"
          }
        }
      },
      {
        "_index" : "student",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.7133499,
        "_source" : {
          "name" : "山西呂梁-王五",
          "age" : "25",
          "address" : {
            "city" : "呂梁",
            "province" : "山西"
          }
        }
      }
    ]
  }
}

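Wildcard query

wildcard matches indexed terms against a pattern, similar to LIKE in SQL: * matches zero or more characters and ? matches exactly one. Avoid starting a pattern with * or ? where possible, because it forces more terms to be checked and slows the search.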

GET student/_search
{
    "query": {
        "wildcard": {
            "name": {
                "value": "*王"
            }
        }
    }
}

Result:

{
    "took" : 0,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 1,
            "relation" : "eq"
        },
        "max_score" : 1.0,
        "hits" : [
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "3",
                "_score" : 1.0,
                "_source" : {
                    "name" : "山西呂梁-王五",
                    "age" : "25",
                    "address" : {
                        "city" : "呂梁",
                        "province" : "山西"
                    }
                }
            }
        ]
    }
}
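The following toy Python model shows wildcard matching; `wildcard_match` is hypothetical and ignores backslash escaping. Like prefix, wildcard is a term-level query, so the pattern is matched against whole indexed terms. Assume name is analyzed into single characters. Then "*王" matches the term 王 on the text field. It would not match the whole keyword value 山西呂梁-王五, which ends in 五.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern: * = zero or more chars, ? = exactly one."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, terms_in_index):
    """Hypothetical model: does the pattern match any whole indexed term?"""
    regex = wildcard_to_regex(pattern)
    return any(regex.fullmatch(term) for term in terms_in_index)


name = "山西呂梁-王五"
text_terms = set(re.findall(r"[\u4e00-\u9fff]", name))  # toy standard analyzer
print(wildcard_match("*王", text_terms))   # True: matches the term 王
print(wildcard_match("*王", {name}))       # False: keyword value ends in 五
print(wildcard_match("*王*", {name}))      # True
```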

Pagination and sorting

Find documents whose province is 山西, sort them by age in descending order, and return one page of results:

GET student/_search
{
    "query": {
        "match": {
            "address.province": "山西"
        }
    },
    "sort": [
        {
            "age.keyword": {
                "order": "desc"
            }
        }
    ],
    "from": 2,
    "size": 2
}

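from: the number of hits to skip, counted from 0. It is an offset, not a page number. For page n with size hits per page, use from = (n - 1) * size, so from: 2 with size: 2 returns the second page.

size: the number of hits per page.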
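The following Python sketch shows the from/size arithmetic, plus a toy model of sorting and then slicing. `page_to_from` and `toy_search_page` are hypothetical. The model uses the three 山西 students, so from: 2 with size: 2 leaves only the third hit. It also shows a caveat of sorting on age.keyword. keyword values sort as strings, which works for two-digit ages but breaks numeric order in general.

```python
def page_to_from(page, size):
    """Hypothetical helper: 1-based page number -> from offset."""
    return (page - 1) * size


def toy_search_page(docs, from_, size, key):
    """Toy model: sort descending by key, then skip from_ hits and take size."""
    ordered = sorted(docs, key=key, reverse=True)
    return ordered[from_:from_ + size]


shanxi = [
    {"id": "1", "age": "23"},
    {"id": "2", "age": "24"},
    {"id": "3", "age": "25"},
]
# The article's request uses from=2, size=2, i.e. page 2.
print(page_to_from(2, 2))  # 2
page = toy_search_page(shanxi, from_=2, size=2, key=lambda d: d["age"])
print([d["id"] for d in page])  # ['1']: only the third hit is left

# keyword values sort as strings, so numeric order is not guaranteed.
print(sorted(["9", "10", "23"], reverse=True))  # ['9', '23', '10']
```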

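Highlighting

Use highlight to mark matched text in the fields you list. Use pre_tags and post_tags to change the tags placed before and after each highlighted fragment.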

GET student/_search
{
    "query": {
        "match": {
            "name": "張三"
        }
    },
    "highlight": {
        "pre_tags": "<br>", 
        "post_tags": "</br>", 
        "fields": {
            "name": {}
        }
    }
}

Result:

{
    "took" : 0,
    "timed_out" : false,
    "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
    },
    "hits" : {
        "total" : {
            "value" : 1,
            "relation" : "eq"
        },
        "max_score" : 2.4079456,
        "hits" : [
            {
                "_index" : "student",
                "_type" : "_doc",
                "_id" : "1",
                "_score" : 2.4079456,
                "_source" : {
                    "name" : "山西太原-張三",
                    "age" : 23,
                    "address" : {
                        "city" : "太原",
                        "province" : "山西"
                    }
                },
                "highlight" : {
                    "name" : [
                        "山西太原-<br>張</br><br>三</br>"
                    ]
                }
            }
        ]
    }
}
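The highlighted output wraps 張 and 三 separately because the standard analyzer indexes each Chinese character as its own term. The following toy Python model reproduces the string above under that assumption. `toy_highlight` is hypothetical and does not reflect how Elasticsearch's highlighters work internally. If pre_tags and post_tags are omitted, Elasticsearch wraps matches in <em> and </em>; the toy's defaults mimic that.

```python
import re


def toy_highlight(text, query, pre_tag="<em>", post_tag="</em>"):
    """Hypothetical model: wrap each character of text that is a query token."""
    # Toy standard analyzer: one token per CJK ideograph.
    query_tokens = set(re.findall(r"[\u4e00-\u9fff]", query))
    return "".join(
        f"{pre_tag}{ch}{post_tag}" if ch in query_tokens else ch for ch in text
    )


print(toy_highlight("山西太原-張三", "張三", "<br>", "</br>"))
# 山西太原-<br>張</br><br>三</br>
print(toy_highlight("山西太原-張三", "張三"))
# 山西太原-<em>張</em><em>三</em>
```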

