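Metric aggregations mainly operate on numeric fields and are comparable to aggregate functions such as sum, avg, max, and min in a relational database.
1. avg (average)
Computes the average of a field (price in the example below). The corresponding Java example: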
    @Resource
    private RestHighLevelClient client;

    @Test
    public void testAvgAggregation() {
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("items");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            // Average of "price"; documents without the field are counted as 0.
            AggregationBuilder avg = AggregationBuilders.avg("avg-price").field("price").missing(0);
            sourceBuilder.aggregation(avg);
            // size(0): return only the aggregation result, no hits.
            sourceBuilder.size(0);
            sourceBuilder.query(QueryBuilders.termQuery("category", "一級"));
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            // Serialized with fastjson so the output matches the JSON shown below.
            System.out.println(JSONObject.toJSONString(result));
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            try {
                client.close();
            } catch (Exception e) {
                log.error(e.getMessage());
            }
        }
    }
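Here missing(0) sets the value to use for documents that lack the averaged field. In this example such documents take part in the calculation as 0.
The response looks like this: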
{ "aggregations": { "asMap": { "avg-price": { "fragment": true, "name": "avg-price", "type": "avg", "value": 484.9945, "valueAsString": "484.9945" } }, "fragment": true }, "clusters": { "fragment": true, "skipped": 0, "successful": 0, "total": 0 }, "failedShards": 0, "fragment": false, "hits": { "fragment": true, "hits": [], "maxScore": 0, "totalHits": 2 }, "numReducePhases": 1, "profileResults": {}, "shardFailures": [], "skippedShards": 0, "successfulShards": 5, "timedOut": false, "took": { "days": 0, "daysFrac": 2.3148148148148148e-8, "hours": 0, "hoursFrac": 5.555555555555555e-7, "micros": 2000, "microsFrac": 2000, "millis": 2, "millisFrac": 2, "minutes": 0, "minutesFrac": 0.000033333333333333335, "nanos": 2000000, "seconds": 0, "secondsFrac": 0.002, "stringRep": "2ms" }, "totalShards": 5 }
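To make the effect of missing(0) concrete, here is a minimal plain-Java sketch of the arithmetic. It does not call Elasticsearch; the class name AvgWithMissing and the use of null to stand for a document without the field are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.List;

final class AvgWithMissing {
    // Averages the prices. A null entry stands for a document that has no
    // "price" field (an assumption of this sketch); it is replaced by
    // missingValue, which mirrors what missing(0) asks for.
    static double average(List<Double> prices, double missingValue) {
        if (prices.isEmpty()) {
            return Double.NaN; // no documents matched, so there is no average
        }
        double sum = 0.0;
        for (Double price : prices) {
            sum += (price != null) ? price : missingValue;
        }
        return sum / prices.size();
    }

    public static void main(String[] args) {
        List<Double> prices = Arrays.asList(100.0, null, 200.0);
        // (100 + 0 + 200) / 3
        System.out.println(average(prices, 0.0));
    }
}
```

Note that the substituted value also increases the document count in the denominator, so a missing value of 0 pulls the average down.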
2. Weighted Avg Aggregation
The weighted average is computed as ∑(value * weight) / ∑(weight).
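The formula can be sketched in plain Java as follows. This only illustrates the arithmetic and is not how Elasticsearch implements the aggregation; the class and method names are hypothetical.

```java
final class WeightedAverage {
    // Computes sum(value * weight) / sum(weight) over paired arrays.
    static double weightedAvg(double[] values, double[] weights) {
        if (values.length != weights.length) {
            throw new IllegalArgumentException("values and weights must have the same length");
        }
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < values.length; i++) {
            weightedSum += values[i] * weights[i];
            weightSum += weights[i];
        }
        // With no documents or all-zero weights this is 0/0, i.e. NaN.
        return weightedSum / weightSum;
    }

    public static void main(String[] args) {
        // (10 * 1 + 20 * 3) / (1 + 3) = 17.5
        System.out.println(weightedAvg(new double[] {10, 20}, new double[] {1, 3}));
    }
}
```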
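Parameters supported by the weighted average (weighted_avg) aggregation:
- value: configuration of the field or script that supplies the values, i.e. which field is averaged. It supports the following sub-parameters:
  - field: the name of the field to average.
  - missing: the value to use when a matching document does not have the value field.
- weight: the object that defines the weights. Its optional properties are:
  - field: the field the weight is read from.
  - missing: the weight to use for a document that lacks the weight field.
- format: formatting for the numeric result.
- value_type: specifies the type of the value, for example ValueType.DATE or ValueType.IP.
The example reads each document's weight from its weight field. The Java example: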
    @Test
    public void test_weight_avg_aggregation() {
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("items");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            WeightedAvgAggregationBuilder avg = AggregationBuilders.weightedAvg("avg-aggregation")
                    // The values being averaged come from "price".
                    .value(new MultiValuesSourceFieldConfig.Builder()
                            .setFieldName("price")
                            .build())
                    // The weights come from "weight", as described above
                    // (the original snippet used "price" here as well).
                    .weight(new MultiValuesSourceFieldConfig.Builder()
                            .setFieldName("weight")
                            .build());
            sourceBuilder.aggregation(avg);
            sourceBuilder.size(0);
            sourceBuilder.query(QueryBuilders.termQuery("category", "一級"));
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(JSONObject.toJSONString(result));
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            try {
                client.close();
            } catch (Exception e) {
                log.error(e.getMessage());
            }
        }
    }
3. Cardinality Aggregation
The cardinality aggregation deduplicates the values first and then counts them, similar to COUNT(DISTINCT ...) in a relational database.
Example:
    @Test
    public void test_Cardinality_Aggregation() {
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("poems");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            // Approximate number of distinct values of "author".
            AggregationBuilder aggregationBuild = AggregationBuilders.cardinality("author_count").field("author");
            sourceBuilder.aggregation(aggregationBuild);
            sourceBuilder.size(0);
            sourceBuilder.query(QueryBuilders.termQuery("dynasty", "唐"));
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(JSONObject.toJSONString(result));
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            try {
                client.close();
            } catch (Exception e) {
                log.error(e.getMessage());
            }
        }
    }
The request above has roughly the same effect as the SQL statement SELECT COUNT(DISTINCT author) FROM poems WHERE dynasty = '唐';
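For comparison, an exact distinct count is easy to sketch in plain Java with a set. The class name and sample data below are made up for illustration. Keep in mind that this is the exact computation, whereas the cardinality aggregation returns an approximate count (it is based on the HyperLogLog++ algorithm), trading accuracy for bounded memory.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DistinctCount {
    // Exact COUNT(DISTINCT author): null entries stand for documents
    // without the field (an assumption of this sketch) and are skipped.
    static long countDistinct(List<String> authors) {
        Set<String> seen = new HashSet<>();
        for (String author : authors) {
            if (author != null) {
                seen.add(author);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("Li Bai", "Du Fu", "Li Bai", "Wang Wei");
        System.out.println(countDistinct(authors)); // prints 3
    }
}
```

The set grows with the number of distinct values, which is exactly the memory cost an approximate algorithm avoids.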
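Its core parameters are:
- precision_threshold: controls accuracy. Below this count, results are expected to be close to exact. Above it, counts may become fuzzier (less accurate). The maximum supported value is 40000; thresholds above that have the same effect as 40000. The default is 3000.
The value 11 returned by the example above is exact. If the request is rewritten with a lower precision_threshold, the result becomes inaccurate: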
{ "aggregations": { "asMap": { "author_count": { "fragment": true, "name": "author_count", "type": "cardinality", "value": 6, "valueAsString": "6.0" } }, "fragment": true }, "clusters": { "fragment": true, "skipped": 0, "successful": 0, "total": 0 }, "failedShards": 0, "fragment": false, "hits": { "fragment": true, "hits": [], "maxScore": 0, "totalHits": 15 }, "numReducePhases": 1, "profileResults": {}, "shardFailures": [], "skippedShards": 0, "successfulShards": 5, "timedOut": false, "took": { "days": 0, "daysFrac": 4.2824074074074075e-7, "hours": 0, "hoursFrac": 0.000010277777777777777, "micros": 37000, "microsFrac": 37000, "millis": 37, "millisFrac": 37, "minutes": 0, "minutesFrac": 0.0006166666666666666, "nanos": 37000000, "seconds": 0, "secondsFrac": 0.037, "stringRep": "37ms" }, "totalShards": 5 }
The response looks like this:
{ "aggregations": { "asMap": { "author_count": { "fragment": true, "name": "author_count", "type": "cardinality", "value": 12, "valueAsString": "12.0" } }, "fragment": true }, "clusters": { "fragment": true, "skipped": 0, "successful": 0, "total": 0 }, "failedShards": 0, "fragment": false, "hits": { "fragment": true, "hits": [], "maxScore": 0, "totalHits": 22 }, "numReducePhases": 1, "profileResults": {}, "shardFailures": [], "skippedShards": 0, "successfulShards": 5, "timedOut": false, "took": { "days": 0, "daysFrac": 2.5462962962962963e-7, "hours": 0, "hoursFrac": 0.000006111111111111111, "micros": 22000, "microsFrac": 22000, "millis": 22, "millisFrac": 22, "minutes": 0, "minutesFrac": 0.00036666666666666667, "nanos": 22000000, "seconds": 0, "secondsFrac": 0.022, "stringRep": "22ms" }, "totalShards": 5 }
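- Pre-computed hashes: when running a cardinality aggregation on a string field, a good practice is to index a hash of the string ahead of time and aggregate on the hash instead, which improves efficiency.
- Missing value: the missing parameter defines how documents without a value are handled. By default they are ignored, but they can also be treated as if they had the value given by missing.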
4. Extended Stats Aggregation
An extended version of the stats aggregation. Example:
    @Test
    public void test_Extended_Stats_Aggregation() {
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("items");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            // count, min, max, avg, sum plus sum of squares, variance and standard deviation of "price".
            AggregationBuilder aggregationBuild = AggregationBuilders.extendedStats("extended_stats").field("price");
            sourceBuilder.aggregation(aggregationBuild);
            sourceBuilder.size(0);
            // sourceBuilder.query(
            //         QueryBuilders.termQuery("sellerId", 24)
            // );
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(JSONObject.toJSONString(result));
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            try {
                client.close();
            } catch (Exception e) {
                log.error(e.getMessage());
            }
        }
    }
The result is:
{ "aggregations": { "asMap": { "extended_stats": { "avg": 281.94725, "avgAsString": "281.94725", "count": 4, "fragment": true, "max": 880.999, "maxAsString": "880.999", "min": 10.9, "minAsString": "10.9", "name": "extended_stats", "stdDeviation": 349.2133556190077, "stdDeviationAsString": "349.2133556190077", "sum": 1127.789, "sumAsString": "1127.789", "sumOfSquares": 805776.8781010001, "sumOfSquaresAsString": "805776.8781010001", "type": "extended_stats", "variance": 121949.96774268753, "varianceAsString": "121949.96774268753" } }, "fragment": true }, "clusters": { "fragment": true, "skipped": 0, "successful": 0, "total": 0 }, "failedShards": 0, "fragment": false, "hits": { "fragment": true, "hits": [], "maxScore": 0, "totalHits": 4 }, "numReducePhases": 1, "profileResults": {}, "shardFailures": [], "skippedShards": 0, "successfulShards": 5, "timedOut": false, "took": { "days": 0, "daysFrac": 3.8194444444444445e-7, "hours": 0, "hoursFrac": 0.000009166666666666666, "micros": 33000, "microsFrac": 33000, "millis": 33, "millisFrac": 33, "minutes": 0, "minutesFrac": 0.00055, "nanos": 33000000, "seconds": 0, "secondsFrac": 0.033, "stringRep": "33ms" }, "totalShards": 5 }
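The fields in this result are related by simple formulas. The numbers above are consistent with a population variance, variance = sumOfSquares / count - avg², and stdDeviation = √variance. The plain-Java sketch below recomputes the same set of statistics from raw values; the class name and sample data are hypothetical, and this is an illustration of the arithmetic rather than Elasticsearch's implementation.

```java
final class ExtendedStats {
    final long count;
    final double min;
    final double max;
    final double sum;
    final double sumOfSquares;
    final double avg;
    final double variance;     // population variance
    final double stdDeviation;

    ExtendedStats(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        double s = 0.0;
        double sq = 0.0;
        for (double v : values) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
            s += v;
            sq += v * v;
        }
        count = values.length;
        min = lo;
        max = hi;
        sum = s;
        sumOfSquares = sq;
        avg = s / count;
        // Clamp at 0 so rounding error can never yield a negative variance.
        variance = Math.max(0.0, sq / count - avg * avg);
        stdDeviation = Math.sqrt(variance);
    }

    public static void main(String[] args) {
        ExtendedStats stats = new ExtendedStats(new double[] {2, 4, 4, 4, 5, 5, 7, 9});
        System.out.println("avg=" + stats.avg + " variance=" + stats.variance
                + " stdDeviation=" + stats.stdDeviation);
    }
}
```

For the sample data the average is 5, the variance is 4 and the standard deviation is 2.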
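5. max Aggregation
Computes the maximum value. Usage is analogous to the avg aggregation, so it is not repeated here.
6. min Aggregation
Computes the minimum value. Usage is analogous to the avg aggregation, so it is not repeated here.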
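7. Percentiles Aggregation
Percentile calculation is another approximate metric provided by ES. It shows the value observed at a given percentage of the data: for example, the 95th percentile is the value that is greater than 95% of the observed values. Percentile aggregations are commonly used to find outliers, and they suit analyses that reason about the data in terms of a statistical normal distribution.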
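As a rough illustration of what a percentile is, the sketch below computes an exact percentile with the nearest-rank method in plain Java. This is an assumption chosen for simplicity: ES computes percentiles approximately and may interpolate, so its results on the same data can differ. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class NearestRankPercentile {
    // Returns the smallest value such that at least p percent of the
    // observations are less than or equal to it (nearest-rank method).
    static double percentile(double[] values, double p) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        if (p < 0 || p > 100) {
            throw new IllegalArgumentException("p must be between 0 and 100");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        int index = Math.max(rank, 1) - 1; // p = 0 maps to the minimum
        return sorted[index];
    }

    public static void main(String[] args) {
        double[] prices = {50, 10, 40, 20, 30};
        System.out.println(percentile(prices, 50));  // prints 30.0
        System.out.println(percentile(prices, 100)); // prints 50.0
    }
}
```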
Official documentation: https://www.elastic.co/guide/cn/elasticsearch/guide/current/percentiles.html
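8. HDR Histogram
The HDR Histogram (High Dynamic Range Histogram) is an alternative implementation that is useful when calculating percentiles for latency measurements, because it can be faster than the t-digest implementation at the cost of a larger memory footprint. It maintains a fixed worst-case percentage error, specified as a number of significant digits. For example, if the recorded values range from 1 microsecond to 1 hour (3,600,000,000 microseconds) and the histogram is set to 3 significant digits, it maintains a resolution of 1 microsecond for values up to 1 millisecond and of 3.6 seconds (or better) for the maximum tracked value (1 hour).
- hdr: the histogram parameters are specified through the hdr property.
  - number_of_significant_value_digits: the resolution of values in the histogram, expressed as a number of significant digits.
Note: the HDR histogram only supports positive values and fails if it is passed a negative value. It is also not a good idea to use HDRHistogram when the range of values is unknown, because this can lead to high memory usage.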
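The resolution figures in the example above follow from a simple bound: with N significant digits, the worst-case resolution at a value v is roughly v / 10^N. The plain-Java sketch below only evaluates that bound; it does not reproduce how HDRHistogram buckets values internally, and the class and method names are hypothetical.

```java
final class HdrResolution {
    // Worst-case resolution (in the same unit as value) when values are
    // tracked with the given number of significant digits.
    static long worstCaseResolution(long value, int significantDigits) {
        if (value < 0 || significantDigits < 0) {
            throw new IllegalArgumentException("value and significantDigits must be non-negative");
        }
        long scale = 1;
        for (int i = 0; i < significantDigits; i++) {
            scale *= 10;
        }
        return value / scale;
    }

    public static void main(String[] args) {
        // All values are in microseconds.
        System.out.println(worstCaseResolution(1_000L, 3));         // 1 ms   -> 1 microsecond
        System.out.println(worstCaseResolution(3_600_000_000L, 3)); // 1 hour -> 3,600,000 microseconds (3.6 s)
    }
}
```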
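Missing value
- The missing parameter defines how documents without a value are handled. By default they are ignored, but they can also be treated as if they had a value.
A Java example using the HDR method: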
    @Test
    public void test_Percentiles_Aggregation() {
        try {
            SearchRequest searchRequest = new SearchRequest();
            searchRequest.indices("items");
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            AggregationBuilder aggregationBuild = AggregationBuilders.percentiles("percentiles")
                    .field("price")
                    // Percentiles to report.
                    .percentiles(75, 90, 99.9)
                    // compression is a t-digest setting; it is not needed for the HDR method.
                    .compression(100)
                    .method(PercentilesMethod.HDR)
                    // HDR resolution: 3 significant digits.
                    .numberOfSignificantValueDigits(3);
            sourceBuilder.aggregation(aggregationBuild);
            sourceBuilder.size(0);
            // sourceBuilder.query(
            //         QueryBuilders.termQuery("sellerId", 24)
            // );
            searchRequest.source(sourceBuilder);
            SearchResponse result = client.search(searchRequest, RequestOptions.DEFAULT);
            System.out.println(JSONObject.toJSONString(result));
        } catch (Throwable e) {
            e.printStackTrace();
        } finally {
            try {
                client.close();
            } catch (Exception e) {
                log.error(e.getMessage());
            }
        }
    }
Reference blog: https://blog.csdn.net/prestigeding/article/details/88373092