Elasticsearch: top_hits aggregation

Reposted article, 2019-12-24 17:07, ELK Stack

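The top_hits metric aggregator keeps track of the most relevant documents being aggregated. It is intended to be used as a sub-aggregator, so that the top matching documents can be aggregated per bucket.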

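The top_hits aggregator can effectively be used to group a result set by certain fields via a bucket aggregator. One or more bucket aggregators determine by which properties the result set is sliced.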

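Options:

from - The offset from the first result you want to fetch.
size - The maximum number of top matching hits to return per bucket. By default the top three matching hits are returned.
sort - How the top matching hits should be sorted. By default the hits are sorted by the score of the main query.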
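
As a minimal sketch of how these three options fit into a request body, the Python helper below builds a top_hits clause as a plain dictionary. The function name build_top_hits and its parameter names are assumptions made for illustration and are not part of any client library; only the from, size, and sort keys come from the options above.

```python
import json


def build_top_hits(from_=0, size=3, sort=None):
    """Return a top_hits clause as a dict (hypothetical helper)."""
    clause = {"from": from_, "size": size}
    if sort is not None:
        # Without a sort, hits are ordered by the score of the main query.
        clause["sort"] = sort
    return {"top_hits": clause}


if __name__ == "__main__":
    # Skip the first hit in each bucket, then return the next two,
    # ordered by bytes from largest to smallest.
    clause = build_top_hits(from_=1, size=2, sort=[{"bytes": {"order": "desc"}}])
    print(json.dumps(clause, indent=2))
```

The resulting dictionary can be placed under a named sub-aggregation inside a bucket aggregation's "aggs" section.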

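Let's use an example to show how this works.

Preparing the data:

We use the official Sample web logs data set that ships with Kibana as our index.

Then we load the index.

Our data is now loaded.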

Top hits aggregation

First, let's run a simple terms aggregation on the host field:

GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "hosts": {
      "terms": {
        "field": "host.keyword",
        "size": 2
      }
    }
  }
}

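The search above asks for two buckets (the size is set to 2 here only to keep the illustration short). The buckets are keyed on the values of host. The result is: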

"aggregations" : {
    "hosts" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 2807,
      "buckets" : [
        {
          "key" : "artifacts.elastic.co",
          "doc_count" : 6488
        },
        {
          "key" : "www.elastic.co",
          "doc_count" : 4779
        }
      ]
    }
  }
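
To make the fields in this response concrete, here is a small Python sketch that reads the buckets and adds up the counts. The response dictionary is copied from the output above; summarize is a hypothetical helper name. sum_other_doc_count counts the documents that fall into buckets not returned, so, assuming each document carries a single host value, adding it to the returned doc_count values gives the number of documents the aggregation covered.

```python
# Aggregation part of the response shown above.
response = {
    "aggregations": {
        "hosts": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 2807,
            "buckets": [
                {"key": "artifacts.elastic.co", "doc_count": 6488},
                {"key": "www.elastic.co", "doc_count": 4779},
            ],
        }
    }
}


def summarize(resp, agg_name="hosts"):
    """Return (per-bucket counts, total documents covered) for a terms aggregation."""
    agg = resp["aggregations"][agg_name]
    counts = {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
    # Documents in the returned buckets plus documents in all other buckets.
    total = sum(counts.values()) + agg["sum_other_doc_count"]
    return counts, total


if __name__ == "__main__":
    counts, total = summarize(response)
    print(counts)
    print(total)  # 6488 + 4779 + 2807 = 14074
```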

Now the requirement is this: for each of these buckets, we want the top few documents sorted the way we choose, as in the following search:

GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "hosts": {
      "terms": {
        "field": "host.keyword",
        "size": 2
      },
      "aggs": {
        "most_bytes": {
          "top_hits": {
            "sort": [
              {
                "bytes": {
                  "order": "desc"
                }
              }
            ],
            "_source": {
              "includes": [
                "bytes",
                "hosts",
                "ip",
                "clientip"
              ]
            },
            "size": 2
          }
        }
      }
    }
  }
}

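The request above nests a top_hits sub-aggregation under the hosts terms aggregation. (Strictly speaking, top_hits is a metric aggregation used as a sub-aggregation, not a pipeline aggregation in the Elasticsearch sense.) For each bucket, the hits are sorted by bytes in descending order, and only two documents are returned per bucket: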

  "aggregations" : {
    "hosts" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 2807,
      "buckets" : [
        {
          "key" : "artifacts.elastic.co",
          "doc_count" : 6488,
          "most_bytes" : {
            "hits" : {
              "total" : {
                "value" : 6488,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "kibana_sample_data_logs",
                  "_type" : "_doc",
                  "_id" : "dnNIHm8BjrINWI3xXlRc",
                  "_score" : null,
                  "_source" : {
                    "bytes" : 19929,
                    "ip" : "127.155.255.9",
                    "clientip" : "127.155.255.9"
                  },
                  "sort" : [
                    19929
                  ]
                },
                {
                  "_index" : "kibana_sample_data_logs",
                  "_type" : "_doc",
                  "_id" : "OXNIHm8BjrINWI3xX1td",
                  "_score" : null,
                  "_source" : {
                    "bytes" : 19904,
                    "ip" : "100.177.58.231",
                    "clientip" : "100.177.58.231"
                  },
                  "sort" : [
                    19904
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "www.elastic.co",
          "doc_count" : 4779,
          "most_bytes" : {
            "hits" : {
              "total" : {
                "value" : 4779,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "kibana_sample_data_logs",
                  "_type" : "_doc",
                  "_id" : "4nNIHm8BjrINWI3xYWQl",
                  "_score" : null,
                  "_source" : {
                    "bytes" : 19986,
                    "ip" : "233.204.30.48",
                    "clientip" : "233.204.30.48"
                  },
                  "sort" : [
                    19986
                  ]
                },
                {
                  "_index" : "kibana_sample_data_logs",
                  "_type" : "_doc",
                  "_id" : "wnNIHm8BjrINWI3xW0Rj",
                  "_score" : null,
                  "_source" : {
                    "bytes" : 19956,
                    "ip" : "129.237.102.30",
                    "clientip" : "129.237.102.30"
                  },
                  "sort" : [
                    19956
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }

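The response shows that the two hosts, artifacts.elastic.co and www.elastic.co, each return two documents, sorted by bytes in descending order. Note that _source contains only bytes, ip, and clientip: the includes list names hosts, which matches no field in this index (the field is called host), so nothing is returned for it.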
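
Client code usually has to walk this nested structure to get at the documents. The following Python sketch shows one way to do that, under the assumption that the response has already been parsed into a dictionary. The sample data is an abbreviated copy of the response above, and top_hits_per_bucket is a hypothetical helper name.

```python
# Abbreviated copy of the response above: only the fields the helper reads.
response = {
    "aggregations": {
        "hosts": {
            "buckets": [
                {
                    "key": "artifacts.elastic.co",
                    "doc_count": 6488,
                    "most_bytes": {"hits": {"hits": [
                        {"_source": {"bytes": 19929, "clientip": "127.155.255.9"}},
                        {"_source": {"bytes": 19904, "clientip": "100.177.58.231"}},
                    ]}},
                },
                {
                    "key": "www.elastic.co",
                    "doc_count": 4779,
                    "most_bytes": {"hits": {"hits": [
                        {"_source": {"bytes": 19986, "clientip": "233.204.30.48"}},
                        {"_source": {"bytes": 19956, "clientip": "129.237.102.30"}},
                    ]}},
                },
            ]
        }
    }
}


def top_hits_per_bucket(resp, agg_name="hosts", sub_agg="most_bytes"):
    """Map each bucket key to the list of _source documents from its top_hits."""
    result = {}
    for bucket in resp["aggregations"][agg_name]["buckets"]:
        # The documents sit two levels down: <sub_agg> -> hits -> hits.
        hits = bucket[sub_agg]["hits"]["hits"]
        result[bucket["key"]] = [hit["_source"] for hit in hits]
    return result


if __name__ == "__main__":
    for host, docs in top_hits_per_bucket(response).items():
        for doc in docs:
            print(host, doc["bytes"], doc["clientip"])
```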

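Attentive readers may notice that this resembles the field collapsing I introduced earlier. The difference is that field collapsing returns one result per group, namely the top document under the sort order we request. We can, of course, also return several more results per group inside inner_hits.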
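
For comparison, the sketch below builds a field collapsing request body that is roughly analogous to the top_hits example. It is an illustration, not a drop-in equivalent: build_collapse_search and its parameters are hypothetical names, and the inner_hits name most_bytes is chosen only to mirror the aggregation above. The collapse, inner_hits, and sort keys are standard parts of the search request body.

```python
import json


def build_collapse_search(field, sort_field, groups=2, inner_size=2):
    """Build a search body that collapses on `field` (hypothetical helper)."""
    sort = [{sort_field: {"order": "desc"}}]
    return {
        "size": groups,  # number of collapsed groups to return
        "query": {"match_all": {}},
        "collapse": {
            "field": field,
            # inner_hits returns additional documents for each group.
            "inner_hits": {"name": "most_bytes", "size": inner_size, "sort": sort},
        },
        # The top-level sort picks the document that represents each group.
        "sort": sort,
        "_source": ["bytes", "host", "clientip"],
    }


if __name__ == "__main__":
    body = build_collapse_search("host.keyword", "bytes")
    print(json.dumps(body, indent=2))
```

One design difference to keep in mind: collapsed groups are ordered by the sort value of their top document, whereas the terms aggregation orders buckets by document count by default, so the two requests need not return the same hosts.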

References:
[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
