Background: A recent project required Druid's quantile aggregation queries. I won't go into what quantiles are here; look them up if you need a refresher. I had never used the feature, and histogram support is still listed among Druid's experimental features (so presumably not fully polished), so I decided to try it out myself and see what the results look like.
Use case: the approximate-histograms extension, together with the quantile / quantiles post-aggregators, lets you query page load times at percentiles such as 0.95/0.98/0.99.
We only consume the Druid service as a business user; it is not operated by our team. According to the official docs, the extension has to be added first. Our company runs Druid 0.9.2, while the latest release at the time of writing is 0.12.3. So we need to:
Add extension support
How: verify that druid-histogram exists under the {DRUID}/extensions directory.
druid-histogram then needs to be added to the extension load list:
druid.extensions.loadList=["druid-histogram",.....]
The nodes must be restarted to pick up the newly added extension:
- On the query side, restart the historical and broker nodes.
- On the ingestion side, restart the overlord node.
Once the cluster supports the extension, the next step is ingesting data.
Following the official approximate histogram documentation: http://druid.io/docs/latest/development/extensions-core/approximate-histograms.html
Data ingestion
{ ...... "metricsSpec": [ ...... { "name": "pageLoad", "type": "longSum", "fieldName": "pageLoad" }, { "type" : "approxHistogramFold", "name" : "his_pageLoad", "fieldName" : "pageLoad", "resolution" : 50, "numBuckets" : 7, "lowerLimit" : 0.0, "upperLimit" : 10000000.0 } ...... ] ...... }
I only used the pageLoad field for this experiment, to compare Druid's sum and quantile calculations on the same data.
Note the parameters resolution, numBuckets, lowerLimit and upperLimit; their exact meanings are explained in the official docs, so I won't repeat them here. My settings above were picked pretty much on a whim.
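Before the queries, a word on resolution. Per the paper linked from the official docs, the extension is based on the Ben-Haim & Tom-Tov streaming histogram: the sketch keeps at most resolution weighted centroids and, once that limit is exceeded, merges the two closest ones. A toy Java illustration of that merge step (my own sketch for intuition, not Druid's actual code):

import java.util.TreeMap;

// Toy streaming histogram in the spirit of Ben-Haim & Tom-Tov:
// at most `resolution` (position -> count) centroids are kept.
public class ToyHistogram {
    private final int resolution;
    private final TreeMap<Double, Long> centroids = new TreeMap<>();

    public ToyHistogram(int resolution) {
        this.resolution = resolution;
    }

    public void offer(double value) {
        centroids.merge(value, 1L, Long::sum);
        if (centroids.size() > resolution) {
            mergeClosestPair(); // this is where approximation error creeps in
        }
    }

    // Replace the two closest centroids with their count-weighted average.
    private void mergeClosestPair() {
        Double prev = null, left = null;
        double bestGap = Double.MAX_VALUE;
        for (double p : centroids.keySet()) {
            if (prev != null && p - prev < bestGap) {
                bestGap = p - prev;
                left = prev;
            }
            prev = p;
        }
        double right = centroids.higherKey(left);
        long cl = centroids.remove(left);
        long cr = centroids.remove(right);
        centroids.put((left * cl + right * cr) / (cl + cr), cl + cr);
    }
}

With resolution = 50, a sketch only starts merging, and therefore approximating, once it has seen more than 50 distinct values. With that background, on to the queries.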
Query scripts
Script 1: the sum:
{ "queryType":"timeseries", "dataSource":{ "type":"table", "name":"bpm_page_view" }, "context":{ "priority":7, "timeout":3000, "queryId":"f7d75164-2d53-44fe-8978-10742e102c3d" }, "intervals":{ "type":"LegacySegmentSpec", "intervals":[ "2018-11-26T15:26:10.773+08:00/2018-11-26T15:56:10.773+08:00" ] }, "descending":false, "filter":{ "type":"and", "fields":[ { "type":"selector", "dimension":"appCode", "value":"ec269367bf854639a56cb1618a097c38", "extractionFn":null } ] }, "granularity":{ "type":"duration", "duration":60000, "origin":"1970-01-01T08:00:00.000+08:00" }, "aggregations":[ { "type":"filtered", "aggregator":{ "type":"longSum", "name":"pageLoad", "fieldName":"pageLoad" }, "filter":{ "type":"and", "fields":[ { "type":"selector", "dimension":"terminal", "value":"IOS", "extractionFn":null } ] } } ], "postAggregations":null }
Script 2: the quantiles (here the 90th, 95th and 99th percentiles):
{ "queryType":"timeseries", "dataSource":{ "type":"table", "name":"bpm_page_view" }, "context":{ "priority":7, "timeout":3000, "queryId":"f7d75164-2d53-44fe-8978-10742e102c3d" }, "intervals":{ "type":"LegacySegmentSpec", "intervals":[ "2018-11-26T15:26:10.773+08:00/2018-11-26T15:56:10.773+08:00" ] }, "descending":false, "filter":{ "type":"and", "fields":[ { "type":"selector", "dimension":"appCode", "value":"ec269367bf854639a56cb1618a097c38", "extractionFn":null } ] }, "granularity":{ "type":"duration", "duration":60000, "origin":"1970-01-01T08:00:00.000+08:00" }, "aggregations":[ { "type":"filtered", "aggregator":{ "type": "approxHistogramFold", "name": "his_pageLoad", "fieldName": "his_pageLoad", "resolution" : null, "numBuckets" : null }, "filter":{ "type":"and", "fields":[ { "type":"selector", "dimension":"terminal", "value":"IOS", "extractionFn":null } ] } } ], "postAggregations":[ { "type" : "quantiles", "name" : "響應時間", "fieldName" : "his_pageLoad","probabilities" : [0.9,0.95,0.99] } ] }
Analyzing the aggregation results
We send data to Kafka and let Druid ingest it. The Kafka producer code:
package com.suning.ctbpm;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

import java.util.Properties;

public class KafkaProducerSimple {
    public static void main(String[] args) {
        String topic = "xxxx";

        // Old Scala-based producer API (Kafka 0.8-era): string encoder, broker list, acks=1.
        Properties props = new Properties();
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("metadata.broker.list", "xxxxxxx");
        props.put("request.required.acks", "1");
        Producer<String, String> producer = new Producer<>(new ProducerConfig(props));

        String msg;
        for (int i = 1; i <= 200; i++) {
            int j = i;
            // Toggle per experiment, e.g. replace one value to test sorting:
            //if (i == 10) {
            //    j = 11;
            //}
            msg = "{\n" +
                    "    \"access\":\"IE_10_0\",\n" +
                    "    \"apdexSign\":100,\n" +
                    "    \"appCode\":\"ec269367bf854639a56cb1618a097c38\",\n" +
                    "    \"area\":\"某某區\",\n" +
                    "    \"blankScreen\":11,\n" +
                    "    \"browser\":\"IE\",\n" +
                    "    \"browserVersion\":\"IE_10\",\n" +
                    "    \"cache\":30,\n" +
                    "    \"city\":\"某某城市\",\n" +
                    "    \"country\":\"zh_CN\",\n" +
                    "    \"dns\":11,\n" +
                    "    \"domParser\":211,\n" +
                    "    \"domain\":\"xxx.xxx.com\",\n" +
                    "    \"firstAction\":110,\n" +
                    "    \"firstPacket\":44,\n" +
                    "    \"firstPaint\":20,\n" +
                    "    \"htmlLoad\":187,\n" +
                    "    \"ip\":\"10.200.181.61\",\n" +
                    "    \"keyPageCode\":[\n" +
                    "\n" +
                    "    ],\n" +
                    // logTime is fixed per run so all events land in one aggregation bucket
                    "    \"logTime\":1543221571000,\n" +
                    "    \"net\":116,\n" +
                    "    \"operator\":\"unknown\",\n" +
                    "    \"os\":\"iOS 10 (iPhone)\",\n" +
                    // pageLoad is the only field that varies across the loop
                    "    \"pageLoad\":" + j + ",\n" +
                    "    \"pageRef\":\"http://xxx.xxx.com/broadcast/matchBefore.html\",\n" +
                    "    \"pageRender\":769,\n" +
                    "    \"processing\":765,\n" +
                    "    \"province\":\"謀省\",\n" +
                    "    \"redirect\":10,\n" +
                    "    \"request\":44,\n" +
                    "    \"resourceLoad\":558,\n" +
                    "    \"response\":101,\n" +
                    "    \"restPacket\":101,\n" +
                    "    \"slowPageSign\":10,\n" +
                    "    \"ssl\":10,\n" +
                    "    \"stalled\":10,\n" +
                    "    \"tcp\":42,\n" +
                    "    \"terminal\":\"IOS\",\n" +
                    "    \"ua\":\"Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Mobile/14E304;PPTVSports\",\n" +
                    "    \"unload\":10,\n" +
                    "    \"version\":\"V1.0.7\",\n" +
                    "    \"visitId\":\"f7f2-7f8c760d\"\n" +
                    "}";
            KeyedMessage<String, String> record = new KeyedMessage<>(topic, msg);
            producer.send(record);
        }
        producer.close();
    }
}
The msg here is fabricated to match our own ingestion spec. Pay attention to the logTime field: to make observation easy, we give every event in a batch the same logTime so that Druid aggregates them all onto a single point, and the intervals in the query scripts must cover that logTime.
First run: set logTime=1543233754000 in msg and loop 10 times, i.e. send ten records with pageLoad from 1 to 10 to Kafka.
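As a sanity check on where this batch should land: with granularity duration 60000 (and an origin that coincides with the epoch), the bucket is simply logTime truncated to the minute. A quick verification in Java:

import java.time.Instant;
import java.time.ZoneOffset;

public class BucketCheck {
    public static void main(String[] args) {
        long logTime = 1543233754000L;            // logTime used in this run
        long bucket = logTime - logTime % 60_000; // truncate to the minute
        // Local time of the event (UTC+8):
        System.out.println(Instant.ofEpochMilli(logTime).atOffset(ZoneOffset.ofHours(8)));
        // -> 2018-11-26T20:02:34+08:00
        // The minute bucket Druid should report (UTC):
        System.out.println(Instant.ofEpochMilli(bucket));
        // -> 2018-11-26T12:02:00Z
    }
}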
執行腳本1("intervals":["2018-11-26T19:34:03.205+08:00/2018-11-26T20:04:03.205+08:00"]):
[ { "timestamp": "2018-11-26T11:34:00.000Z", "result": { "pageLoad": 0 } }, ...... { "timestamp": "2018-11-26T12:01:00.000Z", "result": { "pageLoad": 0 } }, { "timestamp": "2018-11-26T12:02:00.000Z", "result": { "pageLoad": 55 } }, { "timestamp": "2018-11-26T12:03:00.000Z", "result": { "pageLoad": 0 } }, { "timestamp": "2018-11-26T12:04:00.000Z", "result": { "pageLoad": 0 } } ]
執行腳本2("intervals":["2018-11-26T19:34:03.205+08:00/2018-11-26T20:04:03.205+08:00"]):
[ { "timestamp": "2018-11-26T11:34:00.000Z", "result": { "his_pageLoad": { "breaks": [ "Infinity", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "-Infinity" ], "counts": [ "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN" ] }, "響應時間": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ "NaN", "NaN", "NaN" ], "min": "Infinity", "max": "-Infinity" } } }, ...... { "timestamp": "2018-11-26T12:02:00.000Z", "result": { "his_pageLoad": { "breaks": [ -0.5, 1, 2.5, 4, 5.5, 7, 8.5, 10 ], "counts": [ 1, 1, 2, 1, 2, 1, 2 ] }, "響應時間": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 9, 9.5, 9.9 ], "min": 1, "max": 10 } } }, ...... { "timestamp": "2018-11-26T12:04:00.000Z", "result": { "his_pageLoad": { "breaks": [ "Infinity", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "-Infinity" ], "counts": [ "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN" ] }, "響應時間": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ "NaN", "NaN", "NaN" ], "min": "Infinity", "max": "-Infinity" } } } ]
Comparing the two: Druid aggregated my ten records onto the point 2018-11-26T12:02:00.000Z (the logTime is 2018-11-26 20:02:34 local time). The ten values sum to 1+2+...+10 = 55, and TP90 is 9, TP95 is 9.5, TP99 is 9.9.
Second run: set logTime=1543234941000 and loop 10 times again, pageLoad from 1 to 10, except that when i=9 I set j to 7, giving (1, 2, 3, 4, 5, 6, 7, 8, 7, 10). This is to verify that the values are sorted before the quantile is taken.
執行腳本1("intervals":["2018-11-26T19:53:12.89+08:00/2018-11-26T20:23:12.891+08:00"]):
[ { "timestamp": "2018-11-26T11:53:00.000Z", "result": { "pageLoad": 0 } },
......
{
"timestamp": "2018-11-26T12:02:00.000Z",
"result": {
"pageLoad": 55
}
},
...... { "timestamp": "2018-11-26T12:22:00.000Z", "result": { "pageLoad": 53 } }, { "timestamp": "2018-11-26T12:23:00.000Z", "result": { "pageLoad": 0 } } ]
執行腳本2("intervals":["2018-11-26T19:53:12.89+08:00/2018-11-26T20:23:12.891+08:00"]):
[ { "timestamp": "2018-11-26T11:53:00.000Z", "result": { "his_pageLoad": { "breaks": [ "Infinity", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "-Infinity" ], "counts": [ "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN" ] }, "響應時間": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ "NaN", "NaN", "NaN" ], "min": "Infinity", "max": "-Infinity" } } }, ...... { "timestamp": "2018-11-26T12:02:00.000Z", "result": { "his_pageLoad": { "breaks": [ -0.5, 1, 2.5, 4, 5.5, 7, 8.5, 10 ], "counts": [ 1, 1, 2, 1, 2, 1, 2 ] }, "響應時間": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 9, 9.5, 9.9 ], "min": 1, "max": 10 } } }, ...... { "timestamp": "2018-11-26T12:22:00.000Z", "result": { "his_pageLoad": { "breaks": [ -0.5, 1, 2.5, 4, 5.5, 7, 8.5, 10 ], "counts": [ 1, 1, 2, 1, 3, 1, 1 ] }, "響應時間": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 8, 9, 9.799999 ], "min": 1, "max": 10 } } }, ...... ]
Comparing again: Druid aggregated the ten records onto 2018-11-26T12:22:00.000Z (the logTime is 2018-11-26 20:22:21 local time). The ten values now sum to 53 (the 9 was replaced by a 7), and TP90 is 8, TP95 is 9, TP99 is 9.799999. TP90 is 8 because the values are sorted first: the ninth value in ascending order is 8.
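Every TP value in both runs fits a simple model: sort the n values, compute the fractional position p×n, and linearly interpolate between the neighboring ranks. To be clear, this is my reconstruction from the observed outputs, not Druid's actual implementation (which works over the sketch's centroids), but it reproduces all the numbers above:

import java.util.Arrays;

public class QuantileModel {
    // Simplified quantile: position p*n into the sorted values,
    // with linear interpolation between neighboring ranks.
    static double quantile(double[] values, double p) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double pos = p * sorted.length;          // e.g. 0.9 * 10 = 9.0
        int idx = (int) Math.floor(pos);
        if (idx < 1) return sorted[0];
        if (idx >= sorted.length) return sorted[sorted.length - 1];
        double lower = sorted[idx - 1];          // value at rank idx (1-based)
        return lower + (pos - idx) * (sorted[idx] - lower);
    }

    public static void main(String[] args) {
        double[] secondRun = {1, 2, 3, 4, 5, 6, 7, 8, 7, 10};
        System.out.println(quantile(secondRun, 0.90)); // 8.0
        System.out.println(quantile(secondRun, 0.95)); // 9.0
        System.out.println(quantile(secondRun, 0.99)); // ~9.8 (Druid printed 9.799999)
    }
}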
Let's keep going.
Third run: set logTime=1543235747000 and loop 100 times, i.e. send one hundred records with pageLoad from 1 to 100 to Kafka.
執行腳本1("intervals":["2018-11-26T20:07:05.311+08:00/2018-11-26T20:37:05.311+08:00"]):
[ ...... { "timestamp": "2018-11-26T12:22:00.000Z", "result": { "pageLoad": 53 } }, ...... { "timestamp": "2018-11-26T12:35:00.000Z", "result": { "pageLoad": 5050 } }, ...... ]
執行腳本2("intervals":["2018-11-26T20:07:05.311+08:00/2018-11-26T20:37:05.311+08:00"]):
[ ...... { "timestamp": "2018-11-26T12:22:00.000Z", "result": { "his_pageLoad": { "breaks": [ -0.5, 1, 2.5, 4, 5.5, 7, 8.5, 10 ], "counts": [ 1, 1, 2, 1, 3, 1, 1 ] }, "響應時間": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 8, 9, 9.799999 ], "min": 1, "max": 10 } } }, ...... { "timestamp": "2018-11-26T12:35:00.000Z", "result": { "his_pageLoad": { "breaks": [ -15.5, 1, 17.5, 34, 50.5, 67, 83.5, 100 ], "counts": [ 1, 16, 17, 16, 17, 16, 17 ] }, "響應時間": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 90, 95, 99 ], "min": 1, "max": 100 } } }, ...... ]
I'll leave the detailed analysis to you this time: 1+2+...+100 = 5050, and the quantiles 90 / 95 / 99 again sit exactly at positions 0.9×100, 0.95×100 and 0.99×100 of the sorted values. Next round.
Fourth run: set logTime=1543236350000 and loop 200 times, i.e. send two hundred records with pageLoad from 1 to 200 to Kafka.
執行腳本1("intervals":["2018-11-26T20:16:40.524+08:00/2018-11-26T20:46:40.524+08:00"]):
[ ...... { "timestamp": "2018-11-26T12:22:00.000Z", "result": { "pageLoad": 53 } }, ...... { "timestamp": "2018-11-26T12:35:00.000Z", "result": { "pageLoad": 5050 } }, ...... { "timestamp": "2018-11-26T12:45:00.000Z", "result": { "pageLoad": 20100 } }, { "timestamp": "2018-11-26T12:46:00.000Z", "result": { "pageLoad": 0 } } ]
執行腳本2("intervals":["2018-11-26T20:16:40.524+08:00/2018-11-26T20:46:40.524+08:00"]):
[ ...... { "timestamp": "2018-11-26T12:22:00.000Z", "result": { "his_pageLoad": { "breaks": [ -0.5, 1, 2.5, 4, 5.5, 7, 8.5, 10 ], "counts": [ 1, 1, 2, 1, 3, 1, 1 ] }, "響應時間": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 8, 9, 9.799999 ], "min": 1, "max": 10 } } }, ...... { "timestamp": "2018-11-26T12:35:00.000Z", "result": { "his_pageLoad": { "breaks": [ -15.5, 1, 17.5, 34, 50.5, 67, 83.5, 100 ], "counts": [ 1, 16, 17, 16, 17, 16, 17 ] }, "響應時間": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 90, 95, 99 ], "min": 1, "max": 100 } } }, ...... { "timestamp": "2018-11-26T12:45:00.000Z", "result": { "his_pageLoad": { "breaks": [ -32.16666793823242, 1, 34.16666793823242, 67.33333587646484, 100.5, 133.6666717529297, 166.83334350585938, 200 ], "counts": [ 1, 33, 33, 33, 33, 33, 34 ] }, "響應時間": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ 180, 190, 198 ], "min": 1, "max": 200 } } }, { "timestamp": "2018-11-26T12:46:00.000Z", "result": { "his_pageLoad": { "breaks": [ "Infinity", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "-Infinity" ], "counts": [ "NaN", "NaN", "NaN", "NaN", "NaN", "NaN", "NaN" ] }, "響應時間": { "probabilities": [ 0.9, 0.95, 0.99 ], "quantiles": [ "NaN", "NaN", "NaN" ], "min": "Infinity", "max": "-Infinity" } } } ]
The analysis is much the same: 1+2+...+200 = 20100, and the quantiles 180 / 190 / 198 once more match positions 0.9×200, 0.95×200 and 0.99×200, even though 200 distinct values exceed the resolution of 50, presumably because the evenly spaced test data merges into centroids without shifting the averages.
This concludes the hands-on look at Druid's histogram quantiles. Comparing the runs suggests that when Druid aggregates the data onto a point, the quantile behaves as if the values were sorted in ascending order and the value at the TP position (interpolating between neighbors when that position is fractional) is taken as the point's quantile. Bear in mind the feature is approximate by design: once a sketch holds many more distinct values than its resolution, the results will drift from the exact quantiles.
Time to clock out...
If you repost, please include the original link (shameless traffic grab, haha): https://www.cnblogs.com/wynjauu/articles/10022863.html
