Aggregations in Detail, Part 4 (Pipeline Aggregations)

This article is a repost. 2016-05-19 22:33. ElasticSearch

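Overview

Pipeline aggregations operate on the output of other aggregations (buckets, or particular metrics of those buckets) rather than directly on documents.

Their purpose is to add useful information to the aggregation output.

Pipeline aggregations fall broadly into two families:

parent

The input of a parent pipeline aggregation is the output of its parent aggregation, which it processes further. It generally does not create new buckets; instead it enriches the parent aggregation's existing buckets with new values.

sibling

The input of a sibling pipeline aggregation is the output of a sibling aggregation. It computes a new aggregation at the same level as that sibling.

Pipeline aggregations use the buckets_path parameter to point at the metric they should operate on. buckets_path has its own syntax.

Pipeline aggregations cannot contain sub-aggregations, but some types can be chained (for example, taking the derivative of a derivative).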

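buckets_path syntax

1. Aggregation separator ==> ">", which expresses a parent/child relationship between aggregations, e.g. "my_bucket>my_stats.avg"

2. Metric separator ==> ".", which selects a specific metric of an aggregation

3. Aggregation name ==> <name of the aggregation>, which refers to an aggregation by name

4. Metric ==> <name of the metric>, which refers to a metric by name

5. Full path ==> agg_name[> agg_name]*[. metrics], which combines the elements above into a complete path

6. Special value ==> "_count", which uses the bucket's document count as the input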
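To make the grammar concrete, here is a minimal Python sketch that splits a buckets_path string into its aggregation names and optional metric. The helper name parse_buckets_path is hypothetical and is not part of Elasticsearch; the sketch also accepts a bracketed metric name such as my_percentile[99.9], which is the form used when a name contains a dot.

```python
import re


def parse_buckets_path(path):
    """Split a buckets_path string into (aggregation names, metric or None).

    Hypothetical helper for illustration only; Elasticsearch parses
    buckets_path internally and this does not reproduce its code.
    """
    # The ">" separator walks from one aggregation to the next.
    parts = [p.strip() for p in path.split(">")]
    *parents, last = parts

    # Bracket form, e.g. my_percentile[99.9]: the metric name may contain dots.
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", last)
    if match:
        return parents + [match.group(1)], match.group(2)

    # Dot form, e.g. my_stats.avg: the metric follows the first ".".
    name, dot, metric = last.partition(".")
    return parents + [name], (metric if dot else None)


print(parse_buckets_path("my_bucket>my_stats.avg"))
print(parse_buckets_path("sales_per_month>sales"))
```

A path without a metric part, such as "sales_per_month>sales", yields None as the metric, which corresponds to using the single value of a metric aggregation.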

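Special cases

1. The name of the aggregation or metric being targeted contains a dot. Wrap the metric name in square brackets:

"buckets_path": "my_percentile[99.9]"

2. The input contains empty buckets (buckets with no documents)

Use the gap_policy parameter. The accepted values are skip (ignore the empty bucket) and insert_zeros (treat its value as 0)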
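The difference between the two gap policies can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch code: the function name apply_gap_policy is hypothetical, and an empty bucket is represented here by None.

```python
def apply_gap_policy(values, gap_policy="skip"):
    """Prepare a series of bucket values according to a gap policy.

    Simplified illustration: None stands for an empty bucket.
    """
    if gap_policy == "skip":
        # Behave as if the empty bucket did not exist.
        return [v for v in values if v is not None]
    if gap_policy == "insert_zeros":
        # Replace the missing value with zero and keep the bucket.
        return [0 if v is None else v for v in values]
    raise ValueError("unknown gap_policy: " + repr(gap_policy))


series = [550, None, 375]
print(apply_gap_policy(series, "skip"))
print(apply_gap_policy(series, "insert_zeros"))
```

The choice matters for any downstream calculation: with skip an average is taken over two values, while with insert_zeros it is taken over three.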

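Avg Bucket Aggregation (sibling)

Avg bucket aggregation: takes a metric from a sibling aggregation and computes the mean of that metric across all of its buckets.

The sibling aggregation must be a multi-bucket aggregation.

The metric must be numeric.

Parameters

buckets_path: path to the metric to average
gap_policy: how to handle empty buckets (skip/insert_zeros)
format: output format for the result value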

{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                }
            }
        },
        "avg_monthly_sales": {
            "avg_bucket": {             // average of the monthly sales totals over all months
                "buckets_path": "sales_per_month>sales" 
            }
        }
    }
}
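As a rough illustration of what avg_bucket computes, the following Python sketch resolves a two-part buckets_path against a response-shaped dictionary and averages the values. The function name avg_bucket and the dictionary layout are assumptions made for the example; the three monthly totals (550, 60, 375) are the ones that appear in this article's sample output for the same request shape.

```python
def avg_bucket(aggregations, buckets_path):
    """Average one metric over all buckets of a sibling aggregation.

    Illustrative only: supports a path of the form "agg_name>metric_name".
    """
    agg_name, metric_name = [p.strip() for p in buckets_path.split(">")]
    buckets = aggregations[agg_name]["buckets"]
    values = [bucket[metric_name]["value"] for bucket in buckets]
    return sum(values) / len(values)


aggregations = {
    "sales_per_month": {
        "buckets": [
            {"key_as_string": "2015/01/01 00:00:00", "sales": {"value": 550}},
            {"key_as_string": "2015/02/01 00:00:00", "sales": {"value": 60}},
            {"key_as_string": "2015/03/01 00:00:00", "sales": {"value": 375}},
        ]
    }
}

monthly_average = avg_bucket(aggregations, "sales_per_month>sales")
print(monthly_average)
```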

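Derivative Aggregation (parent)

Derivative aggregation: takes a metric from its parent aggregation (which must be a histogram or date_histogram) and computes the derivative of that metric from bucket to bucket.

The metric must be numeric.

The enclosing histogram aggregation must have min_doc_count set to 0.

Parameters

buckets_path: path to the metric to differentiate
gap_policy: how to handle empty buckets (skip/insert_zeros)
format: output format for the result value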

{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                },
                "sales_deriv": {       // derivative of each month's sales total
                    "derivative": {
                        "buckets_path": "sales"  // same level, so the metric name is used directly
                    }
                }
            }
        }
    }
}
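For a histogram with evenly spaced buckets, a first derivative amounts to the difference between each bucket's value and the previous one. The Python sketch below illustrates that idea on the monthly totals used elsewhere in this article (550, 60, 375). The function name derivative is hypothetical, and the sketch ignores gap policies and units.

```python
def derivative(values):
    """First difference of a series of bucket values.

    The first bucket has no predecessor, so it gets None.
    """
    if not values:
        return []
    result = [None]
    for previous, current in zip(values, values[1:]):
        result.append(current - previous)
    return result


monthly_sales = [550, 60, 375]
print(derivative(monthly_sales))  # [None, -490, 315]
```

Applying the same function to its own output (after dropping the leading None) gives a second derivative, which is the chaining mentioned in the overview.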

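Max Bucket Aggregation (sibling)

Max bucket aggregation: takes a metric from a sibling aggregation and outputs the bucket with the largest value of that metric.

The metric must be numeric.

The sibling aggregation must be a multi-bucket aggregation.

Parameters

buckets_path: path to the metric whose maximum is wanted
gap_policy: how to handle empty buckets (skip/insert_zeros)
format: output format for the result value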

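Min Bucket Aggregation (sibling)

Min bucket aggregation: takes a metric from a sibling aggregation and outputs the bucket with the smallest value of that metric.

The metric must be numeric.

The sibling aggregation must be a multi-bucket aggregation.

Parameters

buckets_path: path to the metric whose minimum is wanted
gap_policy: how to handle empty buckets (skip/insert_zeros)
format: output format for the result value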

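Sum Bucket Aggregation (sibling)

Sum bucket aggregation: takes a metric from a sibling aggregation and sums that metric across all buckets.

The metric must be numeric.

The sibling aggregation must be a multi-bucket aggregation.

Parameters

buckets_path: path to the metric to sum
gap_policy: how to handle empty buckets (skip/insert_zeros)
format: output format for the result value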

{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                }
            }
        },
        "max_monthly_sales": {        // the bucket of sibling sales_per_month with the largest monthly sales total
            "max_bucket": {
                "buckets_path": "sales_per_month>sales" 
            }
        },
        "min_monthly_sales": {         // the bucket of sibling sales_per_month with the smallest monthly sales total
            "min_bucket": {
                "buckets_path": "sales_per_month>sales" 
            }
        },
        "sum_monthly_sales": {         // the sum of the monthly sales totals over all buckets of sibling sales_per_month
            "sum_bucket": {
                "buckets_path": "sales_per_month>sales" 
            }
        }
    }
}
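The three sibling aggregations in this request can be modelled with a short Python sketch. It assumes, as an illustration, that max_bucket and min_bucket report both the extreme value and the key of every bucket holding it, while sum_bucket reports a single value. The helper name extreme_bucket is hypothetical, and the data are the monthly totals from this article's sample output.

```python
def extreme_bucket(buckets, pick):
    """Return the extreme value plus the keys of every bucket that holds it.

    `buckets` is a list of (key, value) pairs; `pick` is max or min.
    """
    value = pick(v for _, v in buckets)
    keys = [k for k, v in buckets if v == value]
    return {"keys": keys, "value": value}


buckets = [
    ("2015/01/01 00:00:00", 550),
    ("2015/02/01 00:00:00", 60),
    ("2015/03/01 00:00:00", 375),
]

max_monthly_sales = extreme_bucket(buckets, max)
min_monthly_sales = extreme_bucket(buckets, min)
sum_monthly_sales = {"value": sum(v for _, v in buckets)}

print(max_monthly_sales)
print(min_monthly_sales)
print(sum_monthly_sales)
```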

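Stats Bucket Aggregation (sibling)

Stats bucket aggregation: takes a metric from a sibling aggregation and computes statistics over the buckets (how many buckets there are, and the maximum, minimum, and so on of the metric across all buckets).

The metric must be numeric.

The sibling aggregation must be a multi-bucket aggregation.

Parameters

buckets_path: path to the metric to compute statistics for
gap_policy: how to handle empty buckets (skip/insert_zeros)
format: output format for the result values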

{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                }
            }
        },
        "stats_monthly_sales": {               // statistics over the per-bucket values (monthly sales totals) of the sibling aggregation
            "stats_bucket": {
                "buckets_path": "sales_per_month>sales"
            }
        }
    }
}
// Output
{
   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60
               }
            },
            {
               "key_as_string": "2015/03/01 00:00:00",
               "key": 1425168000000,
               "doc_count": 2,
               "sales": {
                  "value": 375
               }
            }
         ]
      },
      "stats_monthly_sales": {        // note: these statistics are computed over the buckets, not the documents
         "count": 3,
         "min": 60,
         "max": 550,
         "avg": 328.333333333,
         "sum": 985
      }
   }
}

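Extended Stats Bucket Aggregation (sibling)

Extended stats bucket aggregation: takes a metric from a sibling aggregation and computes a set of statistics over the buckets (it reports more values than the plain stats bucket aggregation).

The metric must be numeric.

The sibling aggregation must be a multi-bucket aggregation.

Parameters

buckets_path: path to the metric to compute statistics for
gap_policy: how to handle empty buckets (skip/insert_zeros)
format: output format for the result values
sigma: the number of standard deviations above and below the mean to report as bounds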
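The extra values, and the role of sigma, can be illustrated with a short Python sketch over the monthly totals used in this article (550, 60, 375). The function name extended_stats_bucket and the result keys are chosen for the example, and the sketch assumes population variance (dividing by the number of buckets), which is an assumption about the definition rather than something this article states.

```python
import math


def extended_stats_bucket(values, sigma=2.0):
    """Basic and extended statistics over a list of bucket values.

    Assumes population variance (divide by n); illustrative only.
    """
    count = len(values)
    total = sum(values)
    avg = total / count
    variance = sum((v - avg) ** 2 for v in values) / count
    std_deviation = math.sqrt(variance)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum(v * v for v in values),
        "variance": variance,
        "std_deviation": std_deviation,
        # sigma controls how many standard deviations the bounds span.
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


stats = extended_stats_bucket([550, 60, 375], sigma=2.0)
for name, value in stats.items():
    print(name, value)
```

With sigma set to 2 the bounds sit two standard deviations either side of the mean; a larger sigma widens them.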

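Percentiles Bucket Aggregation (sibling)

Percentiles bucket aggregation: takes a metric from a sibling aggregation and computes percentiles of that metric across all buckets.

The metric must be numeric.

The sibling aggregation must be a multi-bucket aggregation.

The percentiles are computed exactly (unlike the Percentiles Metric aggregation, which is approximate), so the calculation may consume a large amount of memory.

Parameters

buckets_path: path to the metric to compute percentiles for
gap_policy: how to handle empty buckets (skip/insert_zeros)
format: output format for the result values
percents: the list of percentiles to compute (as an array)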
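An exact percentile calculation has to keep every bucket value in memory and sort it, which is where the memory cost comes from. The Python sketch below shows one such exact method, the classic nearest-rank rule. The choice of nearest-rank is an assumption made for illustration; the selection rule Elasticsearch actually applies may differ in detail. The function name percentiles_bucket is hypothetical.

```python
import math


def percentiles_bucket(values, percents):
    """Exact percentiles by the nearest-rank rule (no interpolation).

    Keeps and sorts the full list of values, so memory grows with the
    number of buckets.
    """
    ordered = sorted(values)
    result = {}
    for p in percents:
        # Nearest-rank: the smallest rank covering at least p percent of values.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        result[p] = ordered[rank - 1]
    return result


print(percentiles_bucket([550, 60, 375], [25, 50, 75]))
```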

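Moving Average Aggregation (parent)

Moving average aggregation: given an ordered series of data, computes the average of the values that fall inside the current window.

For example, with a window size of 5 over the data 1 to 10, the first few window averages are: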

(1 + 2 + 3 + 4 + 5) / 5 = 3
(2 + 3 + 4 + 5 + 6) / 5 = 4
(3 + 4 + 5 + 6 + 7) / 5 = 5
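The arithmetic above can be reproduced with a short Python sketch of a simple (unweighted) moving average. The function name simple_moving_average is hypothetical, and the sketch only returns the averages of full windows; which bucket each average is attached to in a real response is not modelled here.

```python
def simple_moving_average(values, window):
    """Average of every full window of `window` consecutive values."""
    averages = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        averages.append(sum(chunk) / window)
    return averages


data = list(range(1, 11))  # the values 1 to 10
print(simple_moving_average(data, 5))  # [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```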

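Parameters

buckets_path: path to the metric to average
gap_policy: how to handle empty buckets (skip/insert_zeros)
window: the window size
model: the moving average model to use
minimize: whether the model's parameters should be tuned automatically by minimization (for models that support it)
settings: model-specific settings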

{
    "the_movavg":{
        "moving_avg":{
            "buckets_path": "the_sum",
            "window" : 30,
            "model" : "simple"
        }
    }
}

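Cumulative Sum Aggregation (parent)

Cumulative sum aggregation: takes a metric from its parent aggregation (which must be a histogram or date_histogram) and, for each bucket, computes the running total of that metric over the bucket itself and all buckets before it.

The metric must be numeric.

The enclosing histogram aggregation must have min_doc_count set to 0.

Parameters

buckets_path: path to the metric to accumulate
format: output format for the result value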

{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "sales": {
                    "sum": {
                        "field": "price"
                    }
                },
                "cumulative_sales": {
                    "cumulative_sum": {
                        "buckets_path": "sales" 
                    }
                }
            }
        }
    }
}
// Output
{
   "aggregations": {
      "sales_per_month": {
         "buckets": [
            {
               "key_as_string": "2015/01/01 00:00:00",
               "key": 1420070400000,
               "doc_count": 3,
               "sales": {
                  "value": 550
               },
               "cumulative_sales": {
                  "value": 550                // cumulative sales = 550
               }
            },
            {
               "key_as_string": "2015/02/01 00:00:00",
               "key": 1422748800000,
               "doc_count": 2,
               "sales": {
                  "value": 60
               },
               "cumulative_sales": {
                  "value": 610               // cumulative sales = 550 + 60
               }
            }
            // ...
         ]
      }
   }
}
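The running total is easy to reproduce in Python with itertools.accumulate. The sketch below uses the monthly totals that appear in this article's sample outputs (550, 60, 375); the variable names are chosen for the example.

```python
from itertools import accumulate

monthly_sales = [550, 60, 375]

# Each entry is the sum of the current value and all values before it.
cumulative_sales = list(accumulate(monthly_sales))
print(cumulative_sales)  # [550, 610, 985]
```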

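Bucket Script Aggregation (parent)

Bucket script aggregation: takes one or more metrics from its parent aggregation and runs a script over them for each bucket.

The parent aggregation must be a multi-bucket aggregation.

The metrics must be numeric.

The script must return a numeric result.

Parameters

script: the script to run; it can be inline, a file script, or any other form described in the Scripting documentation
buckets_path: a map from script variable names to the paths of the metrics they refer to
gap_policy: how to handle empty buckets (skip/insert_zeros)
format: output format for the result value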

{
    "aggs" : {
        "sales_per_month" : {
            "date_histogram" : {
                "field" : "date",
                "interval" : "month"
            },
            "aggs": {
                "total_sales": {
                    "sum": {
                        "field": "price"
                    }
                },
                "t-shirts": {
                  "filter": {
                    "term": {
                      "type": "t-shirt"
                    }
                  },
                  "aggs": {
                    "sales": {
                      "sum": {
                        "field": "price"
                      }
                    }
                  }
                },
                "t-shirt-percentage": {
                    "bucket_script": {
                        "buckets_path": {                    // the script works on two metrics
                          "tShirtSales": "t-shirts>sales",
                          "totalSales": "total_sales"
                        },
                        "script": "tShirtSales / totalSales * 100"
                    }
                }
            }
        }
    }
}
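The mechanics of this request can be sketched in Python: for each bucket, every variable in buckets_path is resolved to a number and the script is evaluated with those variables. In the sketch a Python callable stands in for the script, the helper names resolve and bucket_script are hypothetical, and the t-shirt sales figure of 200 is an invented sample value.

```python
def resolve(bucket, path):
    """Follow an "agg>agg" path inside one bucket and return its value."""
    node = bucket
    for name in path.split(">"):
        node = node[name]
    return node["value"]


def bucket_script(buckets, buckets_path, script):
    """Evaluate `script` once per bucket with the resolved variables."""
    results = []
    for bucket in buckets:
        variables = {name: resolve(bucket, path)
                     for name, path in buckets_path.items()}
        results.append(script(**variables))
    return results


# One month's bucket; the value 200 is a made-up sample figure.
buckets = [
    {"total_sales": {"value": 550},
     "t-shirts": {"doc_count": 2, "sales": {"value": 200}}},
]

percentages = bucket_script(
    buckets,
    {"tShirtSales": "t-shirts>sales", "totalSales": "total_sales"},
    lambda tShirtSales, totalSales: tShirtSales / totalSales * 100,
)
print(percentages)
```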

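Bucket Selector Aggregation (parent)

Bucket selector aggregation: takes one or more metrics from its parent aggregation, runs a script over them, and uses the result to decide which of the parent's buckets to keep. All other buckets are discarded.

The parent aggregation must be a multi-bucket aggregation.

The metrics must be numeric.

The script must return a boolean. If the script is written in the expression language, a numeric return value is allowed; 0.0 is then treated as false and any other value as true.

Parameters

script: the script to run; it can be inline, a file script, or any other form described in the Scripting documentation
buckets_path: a map from script variable names to the paths of the metrics they refer to
gap_policy: how to handle empty buckets (skip/insert_zeros)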

{
    "bucket_selector": {
        "buckets_path": {
            "my_var1": "the_sum", 
            "my_var2": "the_value_count"
        },
        "script": "my_var1 > my_var2"    // true keeps the bucket; false discards it
    }
}
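A bucket selector behaves like a filter over the parent's buckets. The Python sketch below models that with a callable in place of the script. The function name bucket_selector and the two sample buckets are made up for illustration, and only same-level metric names are supported as paths.

```python
def bucket_selector(buckets, buckets_path, script):
    """Keep only the buckets for which `script` returns a truthy value."""
    kept = []
    for bucket in buckets:
        variables = {name: bucket[path]["value"]
                     for name, path in buckets_path.items()}
        if script(**variables):
            kept.append(bucket)
    return kept


# Hypothetical buckets carrying the two metrics used in the request above.
buckets = [
    {"key": "a", "the_sum": {"value": 10}, "the_value_count": {"value": 3}},
    {"key": "b", "the_sum": {"value": 2}, "the_value_count": {"value": 5}},
]

selected = bucket_selector(
    buckets,
    {"my_var1": "the_sum", "my_var2": "the_value_count"},
    lambda my_var1, my_var2: my_var1 > my_var2,
)
print([bucket["key"] for bucket in selected])  # ['a']
```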

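Serial Differencing Aggregation (parent)

Serial differencing aggregation: takes a metric from its parent aggregation (which must be a histogram or date_histogram) and subtracts from each value the value a fixed number of buckets earlier: f(x) = f(x_t) - f(x_{t-n}), where n is the lag.

The parent aggregation must be a multi-bucket aggregation.

Parameters

lag: the lag (for example, lag=7 subtracts from each bucket's value the value of the bucket 7 positions before it)
buckets_path: path to the metric to difference
gap_policy: how to handle empty buckets (skip/insert_zeros)
format: output format for the result value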

{
   "aggs": {
      "my_date_histo": {                  
         "date_histogram": {
            "field": "timestamp",
            "interval": "day"
         },
         "aggs": {
            "the_sum": {
               "sum": {
                  "field": "lemmings"     
               }
            },
            "thirtieth_difference": {
               "serial_diff": {
                  "buckets_path": "the_sum",
                  "lag" : 30                        // lag of 30 buckets, i.e. 30 days at a daily interval
               }
            }
         }
      }
   }
}
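The differencing formula can be written directly in Python. In this sketch the function name serial_diff mirrors the aggregation name but is otherwise hypothetical, the sample series is invented, and buckets that have no value lag positions earlier are given None.

```python
def serial_diff(values, lag=1):
    """Subtract from each value the value `lag` positions earlier.

    The first `lag` positions have nothing to subtract, so they get None.
    """
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [
        None if t < lag else values[t] - values[t - lag]
        for t in range(len(values))
    ]


daily_totals = [1, 3, 6, 10, 15]  # invented sample series
print(serial_diff(daily_totals, lag=2))  # [None, None, 5, 7, 9]
```

With a lag of 1 this reduces to the same first difference that a derivative over evenly spaced buckets produces.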
