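I. SummingMergeTree
1. Introduction to SummingMergeTree
The SummingMergeTree engine inherits from MergeTree. The difference is that when ClickHouse merges the data parts of a SummingMergeTree table, it replaces all rows that share the same primary key (strictly speaking, the same sorting key) with a single row holding the summed values of the columns with numeric data types. If the key is chosen so that a single key value corresponds to a large number of rows, this significantly reduces storage space and speeds up queries.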
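2. CREATE TABLE statement
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = SummingMergeTree([columns])
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...]
columns - a tuple with the names of the columns whose values will be summed. Optional.
The selected columns must be of a numeric type and must not be part of the primary key. If columns is omitted, ClickHouse sums all numeric columns that are not in the primary key.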
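The merge rule and the columns parameter can be illustrated with a small Python model. This is only a sketch of the semantics, not ClickHouse code: the function name `summing_merge` and its arguments are made up for illustration, and the three rows mirror the (v1, v2, name) values used in the usage example of this section.

```python
from collections import OrderedDict


def summing_merge(rows, key_cols, sum_cols):
    """Collapse rows that share the same sorting key, summing `sum_cols`.

    Columns that are neither key columns nor summed columns keep the
    value of the first row seen for that key (a simplification).
    """
    merged = OrderedDict()
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in merged:
            merged[key] = dict(row)          # first row for this key: copy it
        else:
            for c in sum_cols:
                merged[key][c] += row[c]     # later rows: add the numeric columns
    return list(merged.values())


rows = [
    {"name": "a", "v1": 1, "v2": 2},
    {"name": "a", "v1": 2, "v2": 2},
    {"name": "b", "v1": 3, "v2": 4},
]
result = summing_merge(rows, key_cols=["name"], sum_cols=["v1", "v2"])
print(result)
```

The two rows with key 'a' collapse into one row with v1 = 3 and v2 = 4, while the single 'b' row is left unchanged.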
3. Usage example
-- Local table
create table test.summing_table_test1
(
    v1 Int32,
    v2 Int32,
    name String,
    total_date DateTime
) ENGINE = SummingMergeTree((v1,v2))
order by (name)
partition by toDate(total_date)
SETTINGS index_granularity = 8192;

-- Insert test data (the two 'a' rows land in the same partition only if now() and now() - 1 hour fall on the same day):
insert into test.summing_table_test1 values (1,2,'a',now()),(2,2,'a',now()-1*60*60),(3,4,'b',now());

-- Force a merge
optimize table test.summing_table_test1 FINAL;

-- Query the data:
SELECT * FROM test.summing_table_test1

Query id: 2da82c96-2a90-496a-83fe-8a6528ba336c

┌─v1─┬─v2─┬─name─┬──────────total_date─┐
│  3 │  4 │ a    │ 2021-10-13 11:41:12 │
│  3 │  4 │ b    │ 2021-10-13 11:41:12 │
└────┴────┴──────┴─────────────────────┘
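II. AggregatingMergeTree
1. Introduction to AggregatingMergeTree
This table engine inherits from MergeTree. AggregatingMergeTree tables can be used for incremental data aggregation, and the engine is a good fit when rows should be merged according to a set of rules to reduce the row count. AggregatingMergeTree computes data with predefined aggregate functions and stores the intermediate states in the table in binary format.
It is an enhanced version of SummingMergeTree: SummingMergeTree can only apply sum to non-primary-key columns, whereas AggregatingMergeTree lets you specify a variety of aggregate functions.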
2. CREATE TABLE statement
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = AggregatingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]
3. Usage examples
1) Summing employee salaries
-- Create the table:
CREATE TABLE emp_aggregatingmergeTree
(
    emp_id UInt16 COMMENT 'employee id',
    name String COMMENT 'employee name',
    work_place String COMMENT 'work location',
    age UInt8 COMMENT 'employee age',
    depart String COMMENT 'department',
    salary AggregateFunction(sum, Decimal32(2)) COMMENT 'salary'
) ENGINE = AggregatingMergeTree()
ORDER BY (emp_id, name)  -- note: the sorting key has two columns
PRIMARY KEY emp_id       -- the primary key has one column
PARTITION BY work_place;

-- Columns of type AggregateFunction are written and read very differently from columns in other table engines: data must be written with the matching -State function and read with the matching -Merge function. For the table above, data is inserted with sumState.

-- Insert data
-- Note: the data must be inserted with an INSERT ... SELECT statement
INSERT INTO TABLE emp_aggregatingmergeTree SELECT 1,'tom','Shanghai',25,'IT Dept',sumState(toDecimal32(10000,2));
INSERT INTO TABLE emp_aggregatingmergeTree SELECT 1,'tom','Shanghai',25,'IT Dept',sumState(toDecimal32(20000,2));

-- Query the data
SELECT emp_id,name,sumMerge(salary) FROM emp_aggregatingmergeTree GROUP BY emp_id,name;

-- Result
┌─emp_id─┬─name─┬─sumMerge(salary)─┐
│      1 │ tom  │         30000.00 │
└────────┴──────┴──────────────────┘

-- AggregatingMergeTree is usually used as the table engine of a materialized view, paired with an ordinary MergeTree table. A materialized view is a query view layered on top of another table.

-- Create a detail table with the MergeTree engine
-- It stores the full detail data
-- and serves real-time queries
CREATE TABLE emp_mergetree_base
(
    emp_id UInt16 COMMENT 'employee id',
    name String COMMENT 'employee name',
    work_place String COMMENT 'work location',
    age UInt8 COMMENT 'employee age',
    depart String COMMENT 'department',
    salary Decimal32(2) COMMENT 'salary'
) ENGINE = MergeTree()
ORDER BY (emp_id, name)
PARTITION BY work_place;

-- Create a materialized view
-- that uses the AggregatingMergeTree table engine
CREATE MATERIALIZED VIEW view_emp_agg
ENGINE = AggregatingMergeTree()
PARTITION BY emp_id
ORDER BY (emp_id, name)
AS SELECT
    emp_id,
    name,
    sumState(salary) AS salary
FROM emp_mergetree_base
GROUP BY emp_id, name;

-- Insert data into the base detail table emp_mergetree_base
INSERT INTO emp_mergetree_base VALUES (1,'tom','Shanghai',25,'Tech Dept',20000),(1,'tom','Shanghai',26,'HR Dept',10000);

-- Query the materialized view
SELECT emp_id,name,sumMerge(salary) FROM view_emp_agg GROUP BY emp_id,name;

-- Result (20000 + 10000)
┌─emp_id─┬─name─┬─sumMerge(salary)─┐
│      1 │ tom  │         30000.00 │
└────────┴──────┴──────────────────┘
2) Showing the current CPU utilization of each node
Aggregate the columns with argMaxState
create materialized view cpu_last_point_idle_mv
engine = AggregatingMergeTree()
partition by tuple()
order by tags_id
populate
as select
    argMaxState(created_date,created_at) as created_date,
    maxState(created_at) as max_created_at,
    argMaxState(time,created_at) as time,
    tags_id,
    argMaxState(usage_idle,created_at) as usage_idle
from cpu
group by tags_id
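argMax(a,b) returns the value of a from the row where b is largest
State is an aggregate-function suffix: with this suffix the function does not return the final result but the intermediate state of the aggregation, and that intermediate state can be stored in an AggregatingMergeTree table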
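A rough Python model may help to see why argMax can be split into a partial state and a later merge. It is only a sketch under simplifying assumptions: the helper names `arg_max_state` and `arg_max_merge` are made up, the state is modelled as a plain (b, a) tuple rather than ClickHouse's binary format, and the readings are invented numbers.

```python
def arg_max_state(pairs):
    """Partial state for argMax(a, b): the (b, a) pair with the largest b, or None."""
    best = None
    for a, b in pairs:
        if best is None or b > best[0]:
            best = (b, a)
    return best


def arg_max_merge(states):
    """Combine partial states and return the final a (the value at the overall largest b)."""
    best = None
    for state in states:
        if state is not None and (best is None or state[0] > best[0]):
            best = state
    return None if best is None else best[1]


# Two insert batches of (usage_idle, created_at) readings for one node (made-up values).
batch1 = [(90.0, 1), (85.0, 2)]
batch2 = [(70.0, 3)]

states = [arg_max_state(batch1), arg_max_state(batch2)]
latest_idle = arg_max_merge(states)
print(latest_idle)  # 70.0
```

Each batch is reduced to a small state on its own, and the states can be combined later in any order, which is what lets the engine aggregate incrementally.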
Use the Merge suffix to obtain the final aggregate result
create view cpu_last_point_idle_v as
select
    argMaxMerge(created_date) as created_date,
    maxMerge(max_created_at) as created_at,
    argMaxMerge(time) as time,
    tags_id,
    argMaxMerge(usage_idle) as usage_idle
from cpu_last_point_idle_mv
group by tags_id
Query the result view
select tags_id, 100 - usage_idle usage
from cpu_last_point_idle_v
order by usage desc, tags_id asc
limit 10
3) Creating a materialized view that tracks the tb_test_MergeTree_basic table
create materialized view tb_test_AggregatingMergeTree_view
ENGINE = AggregatingMergeTree()
PARTITION BY (brandId,shopId)
ORDER BY (brandId,shopId)
as select brandId,shopId,sumState(saleMoney) saleMoney,sumState(saleQty) saleQty,countState(1) saleNum,uniqState(vipId) vipNum
from tb_test_MergeTree_basic
group by brandId,shopId

b64d9704419c :) create materialized view tb_test_AggregatingMergeTree_view ENGINE = AggregatingMergeTree() PARTITION BY (brandId,shopId) ORDER BY (brandId,shopId) as select brandId,shopId,sumState(saleMoney) saleMoney,sumState(saleQty) saleQty,countState(1) saleNum,uniqState(vipId) vipNum from tb_test_MergeTree_basic group by brandId,shopId

CREATE MATERIALIZED VIEW tb_test_AggregatingMergeTree_view
ENGINE = AggregatingMergeTree()
PARTITION BY (brandId, shopId)
ORDER BY (brandId, shopId) AS
SELECT
    brandId,
    shopId,
    sumState(saleMoney) AS saleMoney,
    sumState(saleQty) AS saleQty,
    countState(1) AS saleNum,
    uniqState(vipId) AS vipNum
FROM tb_test_MergeTree_basic
GROUP BY
    brandId,
    shopId

Ok.

0 rows in set. Elapsed: 0.012 sec.

b64d9704419c :)
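show tables shows that, compared with a regular table, the view's storage table carries an extra ".inner." prefix
Its directory name also contains some extra, seemingly garbled characters compared with a regular table
Data that existed before the view was created is not tracked
tb_test_MergeTree_basic already contained data when the materialized view was created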
b64d9704419c :) select * from tb_test_MergeTree_basic

SELECT *
FROM tb_test_MergeTree_basic

┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     429 │   6002 │ 2020-10-07 │     200.5 │      40 │ 10002 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     429 │   6002 │ 2020-10-05 │     200.5 │      10 │ 10001 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     429 │   6001 │ 2020-10-07 │     200.5 │      30 │ 10003 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     429 │   6002 │ 2020-10-04 │     200.5 │      40 │ 10001 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     429 │   6001 │ 2020-10-01 │     200.5 │      10 │ 10001 │
│     429 │   6001 │ 2020-10-02 │     200.5 │      20 │ 10002 │
│     429 │   6001 │ 2020-10-03 │     200.5 │      30 │ 10003 │
│     429 │   6001 │ 2020-10-04 │     200.5 │      10 │ 10001 │
│     429 │   6001 │ 2020-10-05 │     200.5 │      20 │ 10001 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     429 │   6001 │ 2020-10-06 │     200.5 │      30 │ 10003 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
Run optimize table tb_test_AggregatingMergeTree_view once
Then query the tb_test_AggregatingMergeTree_view view again
b64d9704419c :) select * from tb_test_AggregatingMergeTree_view

SELECT *
FROM tb_test_AggregatingMergeTree_view

Ok.

0 rows in set. Elapsed: 0.003 sec.

b64d9704419c :)
So the data that already existed before the view was created is not tracked
Data inserted after the view is created is tracked
1) Insert 2 rows
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-08 14:15:23',200.50,30,10003);
insert into tb_test_MergeTree_basic values (429,6002,'2020-10-08 14:15:23',200.50,40,10002);
2) Check
The AggregateFunction columns hold binary states, so a plain SELECT * prints them as unreadable characters
b64d9704419c :) select * from tb_test_AggregatingMergeTree_view

SELECT *
FROM tb_test_AggregatingMergeTree_view

┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     429 │   6001 │ i@        │         │         │ ³Gw    │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     429 │   6002 │ i@        │ (       │         │ $a6㞠  │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘

2 rows in set. Elapsed: 0.008 sec.

b64d9704419c :)
3) Aggregated result
b64d9704419c :) select brandId,shopId,sumMerge(saleMoney) saleMoney,sumMerge(saleQty) saleQty,countMerge(saleNum) saleNum,uniqMerge(vipNum) vipNum from tb_test_AggregatingMergeTree_view group by brandId,shopId

SELECT
    brandId,
    shopId,
    sumMerge(saleMoney) AS saleMoney,
    sumMerge(saleQty) AS saleQty,
    countMerge(saleNum) AS saleNum,
    uniqMerge(vipNum) AS vipNum
FROM tb_test_AggregatingMergeTree_view
GROUP BY
    brandId,
    shopId

┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     429 │   6002 │     200.5 │      40 │       1 │      1 │
│     429 │   6001 │     200.5 │      30 │       1 │      1 │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘

2 rows in set. Elapsed: 0.005 sec.

b64d9704419c :)
4) Insert more data
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,10,10001);
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,20,10002);
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,30,10003);
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,10,10001);
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,20,10001);
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,30,10003);
insert into tb_test_MergeTree_basic values (429,6002,'2020-10-09 14:15:23',200.50,40,10001);
insert into tb_test_MergeTree_basic values (429,6002,'2020-10-09 14:15:23',200.50,10,10001);
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-10 14:15:23',200.50,10,10001);
5) Check
The data parts have not all been merged yet, so the same key still appears in several rows
b64d9704419c :) select * from tb_test_AggregatingMergeTree_view

SELECT *
FROM tb_test_AggregatingMergeTree_view

┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     429 │   6001 │ i@        │         │         │ l      │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     429 │   6001 │ i@        │         │         │ ³Gw    │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     429 │   6002 │ i@        │         │         │ l      │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     429 │   6001 │ i@        │         │         │ l      │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     429 │   6002 │ i@        │ (       │         │ $a6㞠  │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum───┐
│     429 │   6001 │ T@        │ d       │         │ l ³Gw    │
└─────────┴────────┴───────────┴─────────┴─────────┴──────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     429 │   6002 │ i@        │ (       │         │ l      │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘

7 rows in set. Elapsed: 0.004 sec.

b64d9704419c :)
6) Observe the automatic tracking
b64d9704419c :) select brandId,shopId,sumMerge(saleMoney) saleMoney,sumMerge(saleQty) saleQty,countMerge(saleNum) saleNum,uniqMerge(vipNum) vipNum from tb_test_AggregatingMergeTree_view group by brandId,shopId

SELECT
    brandId,
    shopId,
    sumMerge(saleMoney) AS saleMoney,
    sumMerge(saleQty) AS saleQty,
    countMerge(saleNum) AS saleNum,
    uniqMerge(vipNum) AS vipNum
FROM tb_test_AggregatingMergeTree_view
GROUP BY
    brandId,
    shopId

┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     429 │   6002 │     601.5 │      90 │       3 │      2 │
│     429 │   6001 │      1604 │     160 │       8 │      3 │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘

2 rows in set. Elapsed: 0.010 sec.

b64d9704419c :)
7) The view has indeed tracked and aggregated the data automatically
However, only data inserted after the view was created is tracked. The following SQL verifies this
select brandId,shopId,sum(saleMoney),sum(saleQty),count(1),uniq(vipId)
from tb_test_MergeTree_basic
where saleDate>='2020-10-08'
group by brandId,shopId
Result
b64d9704419c :) select brandId,shopId,sum(saleMoney),sum(saleQty),count(1),uniq(vipId) from tb_test_MergeTree_basic where saleDate>='2020-10-08' group by brandId,shopId

SELECT
    brandId,
    shopId,
    sum(saleMoney),
    sum(saleQty),
    count(1),
    uniq(vipId)
FROM tb_test_MergeTree_basic
WHERE saleDate >= '2020-10-08'
GROUP BY
    brandId,
    shopId

┌─brandId─┬─shopId─┬─sum(saleMoney)─┬─sum(saleQty)─┬─count(1)─┬─uniq(vipId)─┐
│     429 │   6002 │          601.5 │           90 │        3 │           2 │
│     429 │   6001 │           1604 │          160 │        8 │           3 │
└─────────┴────────┴────────────────┴──────────────┴──────────┴─────────────┘

2 rows in set. Elapsed: 0.003 sec.

b64d9704419c :)
8) Data that existed before the view was created cannot be tracked
The following values are not tracked
select brandId,shopId,sum(saleMoney),sum(saleQty),count(1),uniq(vipId) from tb_test_MergeTree_basic where saleDate<'2020-10-08' group by brandId,shopId

SELECT
    brandId,
    shopId,
    sum(saleMoney),
    sum(saleQty),
    count(1),
    uniq(vipId)
FROM tb_test_MergeTree_basic
WHERE saleDate < '2020-10-08'
GROUP BY
    brandId,
    shopId

┌─brandId─┬─shopId─┬─sum(saleMoney)─┬─sum(saleQty)─┬─count(1)─┬─uniq(vipId)─┐
│     429 │   6002 │          601.5 │           90 │        3 │           2 │
│     429 │   6001 │         1403.5 │          150 │        7 │           3 │
└─────────┴────────┴────────────────┴──────────────┴──────────┴─────────────┘

2 rows in set. Elapsed: 0.003 sec.

b64d9704419c :)
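The tracking behaviour observed in this walkthrough can be modelled in a few lines of Python. This is a sketch under assumptions, not how ClickHouse is implemented: the classes `BaseTable` and `AggregatingView` are hypothetical, uniqState is modelled as an exact set (the real uniq is approximate), and only two of the pre-existing rows are reproduced. The eleven rows inserted after the view exists are the ones from steps 1) and 4).

```python
class AggregatingView:
    """Toy materialized view: turns every inserted block into partial states."""

    def __init__(self):
        self.parts = []  # one dict of partial states per inserted block

    @staticmethod
    def _empty():
        return {"money": 0.0, "qty": 0, "num": 0, "vips": set()}

    def on_insert(self, block):
        part = {}
        for brand, shop, money, qty, vip in block:
            st = part.setdefault((brand, shop), self._empty())
            st["money"] += money   # sumState(saleMoney)
            st["qty"] += qty       # sumState(saleQty)
            st["num"] += 1         # countState(1)
            st["vips"].add(vip)    # uniqState(vipId), modelled as an exact set
        self.parts.append(part)

    def select_merged(self):
        """GROUP BY key with sumMerge / countMerge / uniqMerge over all parts."""
        out = {}
        for part in self.parts:
            for key, st in part.items():
                acc = out.setdefault(key, self._empty())
                acc["money"] += st["money"]
                acc["qty"] += st["qty"]
                acc["num"] += st["num"]
                acc["vips"] |= st["vips"]
        return {k: (v["money"], v["qty"], v["num"], len(v["vips"])) for k, v in out.items()}


class BaseTable:
    def __init__(self):
        self.rows = []
        self.views = []

    def insert(self, block):
        self.rows.extend(block)
        for view in self.views:   # a view only ever sees the block being inserted
            view.on_insert(block)


base = BaseTable()
# Rows already present before the view exists (two of the pre-existing rows).
base.insert([(429, 6001, 200.5, 10, 10001), (429, 6002, 200.5, 40, 10002)])

view = AggregatingView()
base.views.append(view)           # models CREATE MATERIALIZED VIEW without POPULATE
before = view.select_merged()

# The eleven single-row inserts made after the view was created.
for row in [
    (429, 6001, 200.5, 30, 10003), (429, 6002, 200.5, 40, 10002),
    (429, 6001, 200.5, 10, 10001), (429, 6001, 200.5, 20, 10002),
    (429, 6001, 200.5, 30, 10003), (429, 6001, 200.5, 10, 10001),
    (429, 6001, 200.5, 20, 10001), (429, 6001, 200.5, 30, 10003),
    (429, 6002, 200.5, 40, 10001), (429, 6002, 200.5, 10, 10001),
    (429, 6001, 200.5, 10, 10001),
]:
    base.insert([row])

after = view.select_merged()
print(before)
print(after)
```

The view is empty right after creation even though the base table already holds rows, and afterwards it reports 1604.0 / 160 / 8 / 3 for shop 6001 and 601.5 / 90 / 3 / 2 for shop 6002, matching the sumMerge query in step 6).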
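4. Summary
1) The ORDER BY sorting key determines which rows are aggregated together
2) The AggregateFunction column type defines the aggregate function and the type of the aggregated column
3) The aggregation logic is triggered only when data parts are merged
4) Aggregation happens only within a partition; data in different partitions is never aggregated together
5) During the computation, the data within a partition is already sorted by ORDER BY, so adjacent rows with the same aggregation key can be found
6) When aggregating, rows with the same aggregation key in the same partition are merged into one row; for columns that are neither in the primary key nor of an AggregateFunction type, the value of the first row is taken
7) AggregateFunction columns are stored in binary form; writing requires the *State function and reading requires the *Merge function, where * stands for the aggregate function used in the column definition
8) AggregatingMergeTree is usually used as the engine of a materialized view, paired with an ordinary MergeTree table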
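Points 3), 4) and 6) can be sketched with a toy model of parts and partitions. Everything here is an illustrative assumption: a part is a (partition, rows) pair, a sum state is a plain integer, and `background_merge` and `select_sum_merge` are made-up helper names, not ClickHouse functions.

```python
from collections import defaultdict


def background_merge(parts):
    """Merge parts partition by partition; rows with the same key collapse into one."""
    by_partition = defaultdict(dict)
    for partition, rows in parts:
        target = by_partition[partition]
        for key, state in rows:
            target[key] = target.get(key, 0) + state   # combine the sum states
    # One merged part per partition, rows sorted by key.
    return [(p, sorted(rows.items())) for p, rows in sorted(by_partition.items())]


def select_sum_merge(parts):
    """Query-time GROUP BY key with sumMerge over every stored row."""
    totals = defaultdict(int)
    for _partition, rows in parts:
        for key, state in rows:
            totals[key] += state
    return dict(totals)


parts = [
    ("2021-10-12", [("a", 1)]),
    ("2021-10-12", [("a", 2), ("b", 5)]),
    ("2021-10-13", [("a", 4)]),
]
merged = background_merge(parts)
print(merged)
print(select_sum_merge(parts), select_sum_merge(merged))
```

After the merge, key 'a' still has one row per partition, yet the query-time result is the same before and after merging. This is why queries should always use GROUP BY with the -Merge functions instead of relying on merges having finished.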
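5. Notes
AggregatingMergeTree tables can be used for incremental aggregation, including aggregation in materialized views
1) A plain AggregatingMergeTree table does not track the basic table: rows written to the basic table after the insert select has run are not aggregated; only the data present before the insert select is aggregated
2) An AggregatingMergeTree materialized view does track the basic table, but data that existed before the view was created is not tracked (unless the view is created with POPULATE); only data inserted after the view was created is tracked and aggregated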