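Elasticsearch (ES) provides powerful aggregation analysis. Broken down by how they operate, aggregations fall into four main types, as shown in the table below:
Aggregation type | Description |
---|---|
Bucket Aggregation | A set of documents that satisfy a given condition |
Metric Aggregation | Mathematical calculations that compute statistics over document fields |
Pipeline Aggregation | A second round of aggregation over the results of other aggregations |
Matrix Aggregation | Operates on multiple fields and produces a result matrix |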
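In my view this breakdown is mostly theoretical. In practice we do not pick one aggregation type per scenario; we are serving a business requirement, and a single implementation usually combines several aggregation types.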
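I. The four aggregation types
1.1 Bucket
Bucketing groups data that shares some characteristic into one class and then counts the members of each class, for example: male and female, employees in the same job at a company, or high-end, mid-range, and low-end products. A bucket can be split further into sub-buckets, for example: men aged 0 to 20, men aged 21 to 50, and men over 50; men and women in the same job; or high-end products with good, neutral, and bad reviews.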
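To make the idea of buckets and sub-buckets concrete, here is a minimal pure-Python sketch. It only imitates the grouping logic; it is not how ES implements it, and the records and the age_band helper are made up for illustration.

```python
from collections import Counter

# Hypothetical records for illustration only (not the article's test data).
people = [
    {"gender": "male", "age": 18},
    {"gender": "male", "age": 35},
    {"gender": "female", "age": 42},
    {"gender": "male", "age": 63},
    {"gender": "female", "age": 19},
]


def age_band(age):
    """Map an age to one of the bands used in the text (hypothetical helper)."""
    if age <= 20:
        return "0-20"
    if age <= 50:
        return "21-50"
    return "over 50"


# First level: one bucket per gender, holding a document count.
by_gender = Counter(p["gender"] for p in people)

# Second level: split each gender bucket again by age band.
by_gender_and_age = Counter((p["gender"], age_band(p["age"])) for p in people)

print(dict(by_gender))
print(dict(by_gender_and_age))
```

Each key of the second counter plays the role of a sub-bucket nested inside a first-level bucket.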

1.2 Metric
A metric aggregation computes statistics over data that shares some characteristic, such as the average, maximum, or minimum.
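As a rough illustration, a metric reduces the documents of one bucket to a number. The sketch below uses made-up salaries and plain Python; the five keys mirror the fields returned by the ES stats aggregation, but the code itself is only an analogy.

```python
# Hypothetical salaries of the documents that fell into one bucket.
salaries = [8000, 12000, 22000]

# Each metric reduces the bucket's documents to a single number.
metrics = {
    "count": len(salaries),
    "min": min(salaries),
    "max": max(salaries),
    "avg": sum(salaries) / len(salaries),
    "sum": sum(salaries),
}
print(metrics)
```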
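1.3 Pipeline
A pipeline aggregation works like a pipe in Linux, where the output of one command becomes the input of the next: the result of one aggregation serves as the data source for the next aggregation.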
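The two-stage flow can be sketched in plain Python as follows. The rows and variable names are invented for illustration; the point is only that the second stage never looks at the raw rows, just at the first stage's output.

```python
# Hypothetical (group, value) pairs for illustration.
rows = [("a", 10), ("a", 30), ("b", 5), ("b", 15), ("c", 40)]

# Stage 1: bucket by group and compute an average per bucket.
groups = {}
for key, value in rows:
    groups.setdefault(key, []).append(value)
avg_per_group = {key: sum(values) / len(values) for key, values in groups.items()}

# Stage 2 (the pipeline step): consume the per-bucket averages, not the raw rows.
highest_key = max(avg_per_group, key=avg_per_group.get)
highest_avg = avg_per_group[highest_key]

print(avg_per_group)
print(highest_key, highest_avg)
```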
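1.4 Matrix
A matrix aggregation operates on several fields at once and returns its results as a matrix. The matrix_stats aggregation, for example, reports per-field statistics together with the covariance and correlation between each pair of fields. Computing the average, maximum, and minimum of a single field within a bucket in one request is a multi-value metric aggregation (stats) rather than a matrix aggregation.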
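A result matrix has one row and one column per field. The sketch below builds a covariance matrix and a correlation matrix for two made-up fields in plain Python. The helper names are hypothetical, and the use of the sample covariance (dividing by n - 1) is an assumption of this sketch, not a statement about what ES computes internally.

```python
from math import sqrt

# Hypothetical (age, salary) columns for illustration.
ages = [20, 30, 40, 50]
salaries = [4000, 6000, 8000, 10000]


def covariance(xs, ys):
    """Sample covariance of two equally long sequences (divides by n - 1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation, derived from the covariances."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


fields = {"age": ages, "sal": salaries}

# One row per field, one column per field: the result is a matrix.
cov_matrix = {a: {b: covariance(fields[a], fields[b]) for b in fields} for a in fields}
corr_matrix = {a: {b: correlation(fields[a], fields[b]) for b in fields} for a in fields}

print(cov_matrix)
print(corr_matrix)
```

Because the made-up salaries grow linearly with age, the off-diagonal correlation comes out at 1 up to floating-point rounding.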
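II. Worked examples
The examples below do not strictly follow the four aggregation types one by one. We first insert a batch of test data into ES, and before inserting it we define the mapping.
2.1 Defining the mapping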
PUT employee
{
  "mappings": {
    "properties": {
      "id": { "type": "integer" },
      "name": { "type": "keyword" },
      "job": { "type": "keyword" },
      "age": { "type": "integer" },
      "gender": { "type": "keyword" }
    }
  }
}
2.2 Inserting the data
PUT employee/_bulk
{"index": {"_id": 1}}
{"id": 1, "name": "Bob", "job": "java", "age": 21, "sal": 8000, "gender": "male"}
{"index": {"_id": 2}}
{"id": 2, "name": "Rod", "job": "html", "age": 31, "sal": 18000, "gender": "female"}
{"index": {"_id": 3}}
{"id": 3, "name": "Gaving", "job": "java", "age": 24, "sal": 12000, "gender": "male"}
{"index": {"_id": 4}}
{"id": 4, "name": "King", "job": "dba", "age": 26, "sal": 15000, "gender": "female"}
{"index": {"_id": 5}}
{"id": 5, "name": "Jonhson", "job": "dba", "age": 29, "sal": 16000, "gender": "male"}
{"index": {"_id": 6}}
{"id": 6, "name": "Douge", "job": "java", "age": 41, "sal": 20000, "gender": "female"}
{"index": {"_id": 7}}
{"id": 7, "name": "cutting", "job": "dba", "age": 27, "sal": 7000, "gender": "male"}
{"index": {"_id": 8}}
{"id": 8, "name": "Bona", "job": "html", "age": 22, "sal": 14000, "gender": "female"}
{"index": {"_id": 9}}
{"id": 9, "name": "Shyon", "job": "dba", "age": 20, "sal": 19000, "gender": "female"}
{"index": {"_id": 10}}
{"id": 10, "name": "James", "job": "html", "age": 18, "sal": 22000, "gender": "male"}
{"index": {"_id": 11}}
{"id": 11, "name": "Golsling", "job": "java", "age": 32, "sal": 23000, "gender": "female"}
{"index": {"_id": 12}}
{"id": 12, "name": "Lily", "job": "java", "age": 24, "sal": 2000, "gender": "male"}
{"index": {"_id": 13}}
{"id": 13, "name": "Jack", "job": "html", "age": 23, "sal": 3000, "gender": "female"}
{"index": {"_id": 14}}
{"id": 14, "name": "Rose", "job": "java", "age": 36, "sal": 6000, "gender": "female"}
{"index": {"_id": 15}}
{"id": 15, "name": "Will", "job": "dba", "age": 38, "sal": 4500, "gender": "male"}
{"index": {"_id": 16}}
{"id": 16, "name": "smith", "job": "java", "age": 32, "sal": 23000, "gender": "male"}
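About the data: each document is an employee record. name is the employee's name, job is the job type, age is the age, sal is the salary, and gender is the gender. The mapping in 2.1 does not declare sal, so ES maps it dynamically as a numeric field (long) when the first document is indexed.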
2.3 Aggregation queries
Query the bucket information for each job type (the number of employees per job)
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_category_count": {
      "terms": { "field": "job" }
    }
  }
}

Query the number of distinct job types
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_category_num": {
      "cardinality": { "field": "job" }
    }
  }
}
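The difference between the two results is easy to see in plain Python: a terms aggregation returns one count per distinct value, while cardinality returns a single number. The sketch below reproduces both on the job values of the 16 test documents. It is only an analogy; in ES the cardinality aggregation is an approximate count, which on a data set this small should agree with the exact figure.

```python
from collections import Counter

# The job field of the 16 test documents, in _id order.
jobs = ["java", "html", "java", "dba", "dba", "java", "dba", "html",
        "dba", "html", "java", "java", "html", "java", "dba", "java"]

# terms-style result: one bucket per distinct value, with its document count.
per_job = Counter(jobs)

# cardinality-style result: a single number, the count of distinct values.
distinct_jobs = len(set(jobs))

print(per_job.most_common())  # [('java', 7), ('dba', 5), ('html', 4)]
print(distinct_jobs)          # 3
```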

Query the number of employees in each job type, along with the oldest employee in each job type.
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_analysis": {
      "terms": { "field": "job" },
      "aggs": {
        "age_top_1": {
          "top_hits": {
            "size": 1,
            "sort": [ { "age": { "order": "desc" } } ]
          }
        }
      }
    }
  }
}
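To check what this query should return, the same logic can be replayed over the test data in plain Python: bucket by job, then keep the single hit with the greatest age. This is a sketch of the expected result, not of how top_hits works internally.

```python
# (name, job, age) for the 16 test documents.
employees = [
    ("Bob", "java", 21), ("Rod", "html", 31), ("Gaving", "java", 24),
    ("King", "dba", 26), ("Jonhson", "dba", 29), ("Douge", "java", 41),
    ("cutting", "dba", 27), ("Bona", "html", 22), ("Shyon", "dba", 20),
    ("James", "html", 18), ("Golsling", "java", 32), ("Lily", "java", 24),
    ("Jack", "html", 23), ("Rose", "java", 36), ("Will", "dba", 38),
    ("smith", "java", 32),
]

# Outer step: bucket by job. Inner step: keep the top hit by age, descending.
oldest_per_job = {}
for name, job, age in employees:
    current = oldest_per_job.get(job)
    if current is None or age > current[1]:
        oldest_per_job[job] = (name, age)

print(oldest_per_job)
# {'java': ('Douge', 41), 'html': ('Rod', 31), 'dba': ('Will', 38)}
```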

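Query the number of employees whose salary falls in 0 to 5000, 5001 to 8000, 8001 to 12000, 12001 to 18000, and 18001 or more. In a range aggregation from is inclusive and to is exclusive, so each upper bound is written one above the inclusive limit; otherwise salaries of exactly 8000, 12000, or 18000 would fall into no bucket.
GET employee/_search
{
  "size": 0,
  "aggs": {
    "sal_range_info": {
      "range": {
        "field": "sal",
        "ranges": [
          { "to": 5001 },
          { "from": 5001, "to": 8001 },
          { "from": 8001, "to": 12001 },
          { "from": 12001, "to": 18001 },
          { "from": 18001 }
        ]
      }
    }
  }
}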
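The range aggregation treats every bucket as from <= value < to. The plain-Python sketch below applies that rule to the test salaries, once with contiguous bounds and once with bounds that leave gaps at 8000, 12000, and 18000. range_counts is a hypothetical helper written for this illustration, not an ES API.

```python
# Salaries of the 16 test documents, in _id order.
salaries = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
            19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]


def range_counts(values, bounds):
    """Count values per bucket, treating each bucket as low <= value < high."""
    counts = []
    for low, high in bounds:
        counts.append(sum(
            1 for v in values
            if (low is None or v >= low) and (high is None or v < high)
        ))
    return counts


# (from, to) pairs; None means unbounded. Upper bounds sit one above the
# inclusive limit so that integer salaries such as 8000 still land in a bucket.
contiguous = [(None, 5001), (5001, 8001), (8001, 12001), (12001, 18001), (18001, None)]

# Bounds written as if "to" were inclusive: 5000, 8000, 12000, 18000 match nothing.
gapped = [(None, 5000), (5001, 8000), (8001, 12000), (12001, 18000), (18001, None)]

print(range_counts(salaries, contiguous))  # [3, 3, 1, 4, 5]
print(range_counts(salaries, gapped))      # [3, 2, 0, 3, 5]
```

With the gapped bounds only 13 of the 16 employees are counted, because three salaries sit exactly on an excluded upper bound.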

Using intervals of 5000, query the number of employees whose salary falls in each interval
GET employee/_search
{
  "size": 0,
  "aggs": {
    "sal_histogram": {
      "histogram": {
        "field": "sal",
        "interval": 5000,
        "extended_bounds": { "min": 0, "max": 25000 }
      }
    }
  }
}
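A histogram bucket key is the value rounded down to a multiple of the interval. The sketch below applies that rule to the test salaries in plain Python. The final dictionary imitates extended_bounds by listing every key from 0 to 25000, under the assumption that the empty bucket at 25000 is emitted as well; bucket_key is a hypothetical helper name.

```python
from collections import Counter
from math import floor

# Salaries of the 16 test documents, in _id order.
salaries = [8000, 18000, 12000, 15000, 16000, 20000, 7000, 14000,
            19000, 22000, 23000, 2000, 3000, 6000, 4500, 23000]
interval = 5000


def bucket_key(value, interval):
    """Lower edge of the fixed-width bucket that contains value."""
    return floor(value / interval) * interval


histogram = Counter(bucket_key(s, interval) for s in salaries)

# Imitate extended_bounds: list every bucket from 0 to 25000, even empty ones
# (assumption: the bucket whose key equals the max bound is included).
buckets = {key: histogram.get(key, 0) for key in range(0, 25000 + 1, interval)}

print(buckets)
# {0: 3, 5000: 3, 10000: 2, 15000: 4, 20000: 4, 25000: 0}
```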

Query the number of employees in each job type, together with salary statistics for each job type
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_and_salary_info": {
      "terms": { "field": "job" },
      "aggs": {
        "sal_info": {
          "stats": { "field": "sal" }
        }
      }
    }
  }
}

Query the number of male and female employees within each job type, together with salary statistics for each gender
GET employee/_search
{
  "size": 0,
  "aggs": {
    "job_gender_sal_info": {
      "terms": { "field": "job" },
      "aggs": {
        "gender_info": {
          "terms": { "field": "gender" },
          "aggs": {
            "sal_info": {
              "stats": { "field": "sal" }
            }
          }
        }
      }
    }
  }
}

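Query the job type with the lowest average salary, and that average salary. Finding the lowest average requires the min_bucket pipeline aggregation; max_bucket would return the job type with the highest average instead.
GET employee/_search
{
  "size": 0,
  "aggs": {
    "jobs": {
      "terms": { "field": "job" },
      "aggs": {
        "sal_info": {
          "avg": { "field": "sal" }
        }
      }
    },
    "min_avg_sal": {
      "min_bucket": { "buckets_path": "jobs>sal_info" }
    }
  }
}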
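As a sanity check on the expected answer, the two stages can be replayed over the test data in plain Python: average sal per job first, then pick the extreme bucket from those averages. This is a sketch of the arithmetic only; the variable names are made up.

```python
# (job, sal) for the 16 test documents.
rows = [
    ("java", 8000), ("html", 18000), ("java", 12000), ("dba", 15000),
    ("dba", 16000), ("java", 20000), ("dba", 7000), ("html", 14000),
    ("dba", 19000), ("html", 22000), ("java", 23000), ("java", 2000),
    ("html", 3000), ("java", 6000), ("dba", 4500), ("java", 23000),
]

# Stage 1: bucket by job with an average of sal per bucket.
by_job = {}
for job, sal in rows:
    by_job.setdefault(job, []).append(sal)
avg_by_job = {job: sum(sals) / len(sals) for job, sals in by_job.items()}

# Stage 2: a sibling step that reads only the stage 1 averages.
lowest_job = min(avg_by_job, key=avg_by_job.get)
highest_job = max(avg_by_job, key=avg_by_job.get)

print(lowest_job, avg_by_job[lowest_job])    # dba 12300.0
print(highest_job, avg_by_job[highest_job])  # html 14250.0
```

On this data the lowest average belongs to dba (12300) and the highest to html (14250), so the two pipeline aggregations give visibly different answers.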

III. Examples with the built-in sample flight data
Query the number of flights arriving in each destination country
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest_info": {
      "terms": { "field": "DestCountry" }
    }
  }
}

Query the number of flights arriving in each destination country, together with the maximum and average ticket price.
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest_info": {
      "terms": { "field": "DestCountry" },
      "aggs": {
        "max_ticket_price": {
          "max": { "field": "AvgTicketPrice" }
        },
        "avg_ticket_price": {
          "avg": { "field": "AvgTicketPrice" }
        }
      }
    }
  }
}

Query the number of flights arriving in each destination country, together with ticket price statistics and a breakdown of the destination weather.
GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "dest_info": {
      "terms": { "field": "DestCountry" },
      "aggs": {
        "ticket_info": {
          "stats": { "field": "AvgTicketPrice" }
        },
        "weather_info": {
          "terms": { "field": "DestWeather" }
        }
      }
    }
  }
}
