Elasticsearch Aggregations: The Histogram Aggregation

Reposted article, originally published 2015-11-10 22:28. Tags: histogram, Elasticsearch, aggregation, ES

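Elasticsearch supports histogram aggregations. A histogram aggregation automatically creates buckets over a numeric field, scans all documents, and places each document into the corresponding bucket. The numeric value can come either from a field in the document or from a script.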

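Bucketing rule

For example, suppose there is a price field that holds the price of a product, and you want one bucket for every interval of 5, counting how many documents (products) fall into each interval.

A product priced at 32 is placed in the bucket with key 30. The key is computed as follows: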

rem = value % interval
if (rem < 0) {
    rem += interval
}
bucket_key = value - rem

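This formula determines which bucket a document belongs to.

There is a catch, though. The formula operates on integers, so a floating-point field value is first cast to an integer and only then passed through the formula. That works for positive numbers but can give the wrong result for negative ones. For example, with an interval of 2, the value -4.5 is cast to -4 and lands in the -4 bucket (-4 <= value < -2), although it belongs in the -6 bucket (-6 <= value < -4).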
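
The rounding rule and the negative-value pitfall can be reproduced with a small Python sketch. It is an illustration, not Elasticsearch code: the function names are made up, `java_rem` imitates a remainder that keeps the sign of the dividend (as the pseudocode above assumes), and `bucket_key_floor` is one possible way to avoid the problem by flooring instead of truncating.

```python
import math


def java_rem(value: int, interval: int) -> int:
    """Remainder that keeps the sign of the dividend (interval must be > 0)."""
    rem = abs(value) % interval
    return -rem if value < 0 else rem


def bucket_key(value: int, interval: int) -> int:
    """The integer rounding formula from the text."""
    rem = java_rem(value, interval)
    if rem < 0:
        rem += interval
    return value - rem


def bucket_key_truncating(value: float, interval: int) -> int:
    """Cast to an integer first; int() truncates toward zero, so -4.5 becomes -4."""
    return bucket_key(int(value), interval)


def bucket_key_floor(value: float, interval: int) -> int:
    """Hypothetical alternative: floor the quotient so negatives round down."""
    return math.floor(value / interval) * interval


print(bucket_key(32, 5))               # price 32, interval 5
print(bucket_key_truncating(-4.5, 2))  # the cast puts -4.5 in the -4 bucket
print(bucket_key_floor(-4.5, 2))       # flooring puts it in the -6 bucket
```

For positive values the truncating and flooring variants agree; they only diverge for negative values that are not whole numbers.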

Filtering with min_doc_count

The aggregation DSL looks like this:

{
    "aggs" : {
        "prices" : {
            "histogram" : {
                "field" : "price",
                "interval" : 50
            }
        }
    }
}

The response is:

{
    "aggregations": {
        "prices" : {
            "buckets": [
                {
                    "key": 0,
                    "doc_count": 2
                },
                {
                    "key": 50,
                    "doc_count": 4
                },
                {
                    "key": 100,
                    "doc_count": 0
                },
                {
                    "key": 150,
                    "doc_count": 3
                }
            ]
        }
    }
}

In this response the 100-150 range contains no documents, yet its bucket is still returned with a doc_count of 0. To leave out empty buckets, set min_doc_count:

{
    "aggs" : {
        "prices" : {
            "histogram" : {
                "field" : "price",
                "interval" : 50,
                "min_doc_count" : 1
            }
        }
    }
}

The response then no longer contains buckets with a count of 0:

{
    "aggregations": {
        "prices" : {
            "buckets": [
                {
                    "key": 0,
                    "doc_count": 2
                },
                {
                    "key": 50,
                    "doc_count": 4
                },
                {
                    "key": 150,
                    "doc_count": 3
                }
            ]
        }
    }
}
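
The effect of min_doc_count can be mimicked in a few lines of Python. This is only a sketch of the filtering behaviour, not Elasticsearch code: the `histogram` helper and the sample prices are made up for illustration, with the prices chosen so that the counts match the responses above.

```python
from collections import Counter


def histogram(values, interval, min_doc_count=0):
    """Toy model of a histogram aggregation over integer values (hypothetical helper)."""
    # Count documents per bucket key; // floors, so negative values work too.
    counts = Counter((v // interval) * interval for v in values)
    if not counts:
        return []
    # Walk every key between the smallest and largest non-empty bucket,
    # so that gaps show up as buckets with doc_count 0.
    keys = range(min(counts), max(counts) + interval, interval)
    buckets = [{"key": k, "doc_count": counts[k]} for k in keys]
    # Drop buckets that hold fewer documents than min_doc_count.
    return [b for b in buckets if b["doc_count"] >= min_doc_count]


# Made-up prices: 2 in [0, 50), 4 in [50, 100), none in [100, 150), 3 in [150, 200).
prices = [10, 45, 60, 70, 80, 99, 150, 160, 199]

with_empty = histogram(prices, 50)
without_empty = histogram(prices, 50, min_doc_count=1)
print(with_empty)
print(without_empty)
```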

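extended_bounds: specifying minimum and maximum bounds

By default, the range of buckets returned by a histogram aggregation is determined by the data. With the price field and an interval of 5, for instance, the 0 bucket is not returned if no product costs between 0 and 5; if the cheapest product costs 11, the first bucket is 10.
You can force a minimum and a maximum with extended_bounds, but this requires min_doc_count to be 0; otherwise the empty buckets introduced by the bounds are not returned.

Also note that extended_bounds does not filter buckets. If extended_bounds.min is greater than the smallest value in the documents, the buckets still start from the smallest document value (the same holds for extended_bounds.max).
For example, if extended_bounds.min and max are set to 40 and 50 but the documents contain values below 40, the buckets are still defined by the data in the documents.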
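
A rough Python sketch of this behaviour follows. It is a toy model under stated assumptions, not Elasticsearch code: the `histogram` helper and the sample prices are invented, min_doc_count is treated as fixed at 0, and `request_body` only shows the shape such a request could take, written as a Python dict.

```python
from collections import Counter


def histogram(values, interval, extended_bounds=None):
    """Toy model with min_doc_count fixed at 0; expects a non-empty list of integers."""
    counts = Counter((v // interval) * interval for v in values)
    lo, hi = min(counts), max(counts)
    if extended_bounds is not None:
        # Bounds can only widen the range; they never cut off buckets that hold data.
        lo = min(lo, (extended_bounds["min"] // interval) * interval)
        hi = max(hi, (extended_bounds["max"] // interval) * interval)
    return [{"key": k, "doc_count": counts[k]} for k in range(lo, hi + interval, interval)]


# Shape of a request that forces buckets from 0 up to 300 (values are illustrative).
request_body = {
    "aggs": {
        "prices": {
            "histogram": {
                "field": "price",
                "interval": 50,
                "min_doc_count": 0,
                "extended_bounds": {"min": 0, "max": 300},
            }
        }
    }
}

prices = [10, 45, 60, 70, 80, 99, 150, 160, 199]  # made-up sample data

narrow = histogram(prices, 50, {"min": 40, "max": 50})  # bounds inside the data range
wide = histogram(prices, 50, {"min": 0, "max": 300})    # bounds beyond the data range
print([b["key"] for b in narrow])
print([b["key"] for b in wide])
```

With the narrow bounds the bucket keys are the same as without any bounds; with the wide bounds, empty buckets are appended up to the requested maximum.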

Ordering with order

Ordering works much as it does in other aggregations. Buckets can be sorted by their key:

{
    "aggs" : {
        "prices" : {
            "histogram" : {
                "field" : "price",
                "interval" : 50,
                "order" : { "_key" : "desc" }
            }
        }
    }
}

or by document count:

{
    "aggs" : {
        "prices" : {
            "histogram" : {
                "field" : "price",
                "interval" : 50,
                "order" : { "_count" : "asc" }
            }
        }
    }
}

or by a metric computed in a sub-aggregation:

{
    "aggs" : {
        "prices" : {
            "histogram" : {
                "field" : "price",
                "interval" : 50,
                "order" : { "price_stats.min" : "asc" } 
            },
            "aggs" : {
                "price_stats" : { "stats" : { "field" : "price" } }
            }
        }
    }
}
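
To make the three orderings concrete, here is a small Python sketch that sorts an already-returned bucket list on the client side. It only illustrates what each order setting means; the `price_stats.min` values are made up, and in practice Elasticsearch applies the ordering itself.

```python
# Buckets as they might come back, with a made-up price_stats.min per bucket.
buckets = [
    {"key": 0, "doc_count": 2, "price_stats": {"min": 10}},
    {"key": 50, "doc_count": 4, "price_stats": {"min": 60}},
    {"key": 150, "doc_count": 3, "price_stats": {"min": 150}},
]

# "order": {"_key": "desc"}
by_key_desc = sorted(buckets, key=lambda b: b["key"], reverse=True)

# "order": {"_count": "asc"}
by_count_asc = sorted(buckets, key=lambda b: b["doc_count"])

# "order": {"price_stats.min": "asc"}
by_min_asc = sorted(buckets, key=lambda b: b["price_stats"]["min"])

print([b["key"] for b in by_key_desc])
print([b["key"] for b in by_count_asc])
print([b["key"] for b in by_min_asc])
```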

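Response format with keyed

By default the buckets are returned as an array, as shown above. To have them returned as an object keyed by bucket key instead, set keyed to true: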

{
    "aggs" : {
        "prices" : {
            "histogram" : {
                "field" : "price",
                "interval" : 50,
                "keyed" : true
            }
        }
    }
}

The response then looks like this:

{
    "aggregations": {
        "prices": {
            "buckets": {
                "0": {
                    "key": 0,
                    "doc_count": 2
                },
                "50": {
                    "key": 50,
                    "doc_count": 4
                },
                "150": {
                    "key": 150,
                    "doc_count": 3
                }
            }
        }
    }
}
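
The practical difference between the two shapes is how a client looks up a bucket. The following Python sketch, using the bucket values from the responses above, converts the array form into the keyed form; the variable names are illustrative only.

```python
# Array-style buckets, as returned without "keyed".
array_buckets = [
    {"key": 0, "doc_count": 2},
    {"key": 50, "doc_count": 4},
    {"key": 150, "doc_count": 3},
]

# Array form: finding one bucket means scanning the list.
bucket_50 = next(b for b in array_buckets if b["key"] == 50)

# Keyed form: the same buckets indexed by their key as a string,
# mirroring the keyed response shown above.
keyed_buckets = {str(b["key"]): b for b in array_buckets}

print(bucket_50["doc_count"])
print(keyed_buckets["50"]["doc_count"])
```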

Missing values

The value to use for documents that lack the field is set with the missing parameter:

{
    "aggs" : {
        "quantity" : {
             "histogram" : {
                 "field" : "quantity",
                 "interval": 10,
                 "missing": 0 
             }
         }
    }
}
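
The effect of missing can be sketched in Python as follows. This is a toy model, not Elasticsearch code: the documents and the `quantity_histogram` helper are made up, and the sketch assumes that documents without the field are skipped when missing is not set.

```python
from collections import Counter


def quantity_histogram(docs, interval, missing=None):
    """Count documents per bucket key of the 'quantity' field (hypothetical helper)."""
    counts = Counter()
    for doc in docs:
        value = doc.get("quantity", missing)
        if value is None:
            # No value and no "missing" default: the document is skipped.
            continue
        counts[(value // interval) * interval] += 1
    return dict(counts)


docs = [
    {"name": "a", "quantity": 3},
    {"name": "b", "quantity": 12},
    {"name": "c"},  # no quantity field
]

ignored = quantity_histogram(docs, 10)             # document "c" is left out
defaulted = quantity_histogram(docs, 10, missing=0)  # document "c" counts as quantity 0
print(ignored)
print(defaulted)
```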
