An introduction to ClickHouse's SummingMergeTree and AggregatingMergeTree engines


I. SummingMergeTree

1. About SummingMergeTree


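The summing engine inherits from MergeTree. The difference is that when data parts of a SummingMergeTree table are merged, ClickHouse collapses all rows with the same primary key (more precisely, the same sorting key) into a single row. That row holds the sums of the numeric columns of the merged rows. If the primary key is chosen so that one key value corresponds to a large number of rows, this can significantly reduce storage space and speed up queries.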

2. CREATE TABLE syntax

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = SummingMergeTree([columns])
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...]

 

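columns: an optional tuple with the names of the columns whose values are summed.
The columns must be numeric and must not be part of the primary key. If the parameter is omitted, ClickHouse sums every numeric column that is not in the primary key.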
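The merge rule can be modelled in a few lines of Python. This is only a sketch of the behaviour and not ClickHouse's implementation. Rows are plain dicts, and one call stands for one merge of the rows of a single partition. The name summing_merge is made up for illustration.

For columns that are neither in the key nor summed, the sketch keeps the first row's value. The ClickHouse documentation only says that an arbitrary existing value is kept. The sketch also leaves out the documented rule that a row whose summed columns are all 0 is deleted.

```python
def summing_merge(rows, key_cols, sum_cols=None):
    """Toy model of one SummingMergeTree merge of the rows of a single partition.

    rows: list of dicts, one per row; key_cols: the ORDER BY columns;
    sum_cols: the optional `columns` engine parameter (None = not given).
    """
    if not rows:
        return []
    if sum_cols is None:
        # Parameter omitted: sum every numeric column outside the key.
        sum_cols = [c for c, v in rows[0].items()
                    if c not in key_cols and isinstance(v, (int, float))]
    merged = {}
    # sorted() is stable, so rows with equal keys keep their input order.
    for row in sorted(rows, key=lambda r: tuple(r[c] for c in key_cols)):
        key = tuple(row[c] for c in key_cols)
        if key not in merged:
            merged[key] = dict(row)       # first row supplies the other columns
        else:
            for c in sum_cols:
                merged[key][c] += row[c]  # summed columns are added up
    return list(merged.values())


rows = [
    {"v1": 1, "v2": 2, "name": "a", "total_date": "2021-10-13 11:41:12"},
    {"v1": 2, "v2": 2, "name": "a", "total_date": "2021-10-13 10:41:12"},
    {"v1": 3, "v2": 4, "name": "b", "total_date": "2021-10-13 11:41:12"},
]
print(summing_merge(rows, key_cols=["name"]))  # v1 and v2 are picked automatically
```

The rows mirror the example in section 3 with fixed, hypothetical timestamps. Because total_date is a string in this sketch, only v1 and v2 are treated as numeric and summed.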

3. Example

-- Local table
create table test.summing_table_test1 
(
v1 Int32,
v2 Int32,
name String,
total_date DateTime
) ENGINE = SummingMergeTree((v1,v2))
order by (name)
partition by toDate(total_date)
SETTINGS index_granularity = 8192;

-- Insert test data:

insert into test.summing_table_test1
values (1,2,'a',now()),(2,2,'a',now()-1*60*60),(3,4,'b',now());

-- Force a merge
optimize table test.summing_table_test1  FINAL;
-- Query the data:

SELECT *
FROM test.summing_table_test1

Query id: 2da82c96-2a90-496a-83fe-8a6528ba336c

┌─v1─┬─v2─┬─name─┬──────────total_date─┐
│  3 │  4 │ a    │ 2021-10-13 11:41:12 │
│  3 │  4 │ b    │ 2021-10-13 11:41:12 │
└────┴────┴──────┴─────────────────────┘
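The two 'a' rows became one row for two reasons. They share the sorting key, and now() and now()-1*60*60 normally fall on the same day, so they also share the toDate(total_date) partition.

The hypothetical Python sketch below replays the three inserted rows with now() pinned to a fixed time. This is an assumption: the real value depends on when the INSERT runs and on the server time zone. The sketch also shows that an insert made shortly after midnight would leave the 'a' rows in two partitions, and no merge combines rows across partitions.

```python
from datetime import datetime, timedelta
from itertools import groupby


def merge_by_day(rows):
    """Hypothetical model of test.summing_table_test1: PARTITION BY
    toDate(total_date), ORDER BY name, SummingMergeTree((v1, v2)).
    rows are (v1, v2, name, total_date) tuples; returns the merged rows."""
    def part_and_key(r):
        return (r[3].date(), r[2])  # (partition, sorting key)

    merged = []
    for (_, name), grp in groupby(sorted(rows, key=part_and_key), key=part_and_key):
        grp = list(grp)
        merged.append((sum(r[0] for r in grp),  # v1 is summed
                       sum(r[1] for r in grp),  # v2 is summed
                       name,
                       grp[0][3]))              # total_date: first row's value
    return merged


def sample(now):
    # The three rows of the INSERT above, with now() pinned to `now`.
    return [(1, 2, "a", now), (2, 2, "a", now - timedelta(hours=1)), (3, 4, "b", now)]


midday = datetime(2021, 10, 13, 11, 41, 12)
print(merge_by_day(sample(midday)))               # 'a' rows share a partition: 2 rows

just_after_midnight = datetime(2021, 10, 13, 0, 30, 0)
print(merge_by_day(sample(just_after_midnight)))  # 'a' rows are split: 3 rows
```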

 

II. AggregatingMergeTree

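1. About AggregatingMergeTree
This engine also inherits from MergeTree. AggregatingMergeTree tables can be used for incremental aggregation of statistics, and the engine fits cases where rows should be merged according to a set of rules to reduce their number. AggregatingMergeTree computes data with predefined aggregate functions and stores the intermediate aggregation states in the table in binary form.

It can be seen as an enhanced SummingMergeTree. SummingMergeTree can only apply sum to the non-key columns, whereas AggregatingMergeTree lets you choose from a variety of aggregate functions.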

2. CREATE TABLE syntax

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = AggregatingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]

 

3. Examples

1) Computing employee salary totals

-- Create the table:

CREATE TABLE emp_aggregatingmergeTree
(
emp_id UInt16 COMMENT 'employee id',
name String COMMENT 'employee name',
work_place String COMMENT 'work location',
age UInt8 COMMENT 'employee age',
depart String COMMENT 'department',
salary AggregateFunction(sum, Decimal32(2)) COMMENT 'salary'
) ENGINE = AggregatingMergeTree() ORDER BY (emp_id, name) PRIMARY KEY emp_id PARTITION BY work_place;


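ORDER BY (emp_id, name) -- note: the sorting key has two columns
PRIMARY KEY emp_id -- while the primary key has only one (it must be a prefix of the sorting key)

-- AggregateFunction columns are written and queried very differently from columns in other table engines. When writing data, you call the -State form of the aggregate function; when querying, you call the matching -Merge form. For the CREATE TABLE statement above, rows are therefore inserted with sumState.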
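The toy Python model below shows why writes go through -State and reads go through -Merge. Here the sum state is just a running Decimal total. The class SumState and its methods are invented for illustration, and real states are binary and specific to each function. The flow is the same, though:
- each INSERT stores a partial state;
- merges and sumMerge combine states;
- only the final step produces a number.

```python
from decimal import Decimal


class SumState:
    """Toy stand-in for a sum() intermediate state (NOT ClickHouse's binary format)."""

    def __init__(self, total=Decimal("0")):
        self.total = total

    @classmethod
    def of(cls, values):
        # Roughly what sumState(x) produces for one group of input values.
        return cls(sum(values, Decimal("0")))

    def merge(self, other):
        # Two states combine into one; this is what a part merge does.
        return SumState(self.total + other.total)

    def finalize(self):
        # What sumMerge(col) returns once all states of a group are merged.
        return self.total


def sum_merge(rows):
    """GROUP BY (emp_id, name) and merge the stored states, like sumMerge(salary)."""
    groups = {}
    for emp_id, name, state in rows:
        key = (emp_id, name)
        groups[key] = groups[key].merge(state) if key in groups else state
    return {k: s.finalize() for k, s in groups.items()}


# The two INSERT ... SELECT statements each store one state row.
table = [
    (1, "tom", SumState.of([Decimal("10000.00")])),
    (1, "tom", SumState.of([Decimal("20000.00")])),
]
print(sum_merge(table))  # {(1, 'tom'): Decimal('30000.00')}
```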

-- Insert data
-- Note: the data must be inserted with an INSERT ... SELECT statement
INSERT INTO TABLE emp_aggregatingmergeTree SELECT 1,'tom','Shanghai',25,'IT',sumState(toDecimal32(10000,2));
INSERT INTO TABLE emp_aggregatingmergeTree SELECT 1,'tom','Shanghai',25,'IT',sumState(toDecimal32(20000,2));
-- Query the data
SELECT emp_id,name,sumMerge(salary) FROM emp_aggregatingmergeTree GROUP BY emp_id,name;
-- Output
┌─emp_id─┬─name─┬─sumMerge(salary)─┐
│ 1 │ tom │ 30000.00 │
└────────┴──────┴──────────────────┘

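-- AggregatingMergeTree is usually used as the table engine of a materialized view, together with an ordinary MergeTree table. A materialized view sits on top of another table: every block inserted into that source table is run through the view's SELECT, and the result is stored in the view's own table.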

 

-- Create a detail table with the MergeTree engine
-- that stores the full detail data
-- and serves real-time queries
CREATE TABLE emp_mergetree_base
(
emp_id UInt16 COMMENT 'employee id',
name String COMMENT 'employee name',
work_place String COMMENT 'work location',
age UInt8 COMMENT 'employee age',
depart String COMMENT 'department',
salary Decimal32(2) COMMENT 'salary'
) ENGINE = MergeTree() ORDER BY (emp_id, name) PARTITION BY work_place;

-- Create a materialized view
-- that uses the AggregatingMergeTree table engine
CREATE MATERIALIZED VIEW view_emp_agg ENGINE = AggregatingMergeTree() PARTITION BY emp_id ORDER BY (emp_id, name) AS
SELECT emp_id, name, sumState(salary) AS salary
FROM emp_mergetree_base
GROUP BY emp_id, name;

-- Insert data into the base detail table emp_mergetree_base
INSERT INTO emp_mergetree_base VALUES (1,'tom','Shanghai',25,'Engineering',20000),(1,'tom','Shanghai',26,'HR',10000);

-- Query the materialized view
SELECT emp_id,name,sumMerge(salary) FROM view_emp_agg GROUP BY emp_id,name;
-- Result
┌─emp_id─┬─name─┬─sumMerge(salary)─┐
│ 1 │ tom │ 30000.00 │
└────────┴──────┴──────────────────┘
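The Python sketch below models what happens on each INSERT into emp_mergetree_base. It assumes that a materialized view behaves like an insert trigger: the view's GROUP BY runs on the inserted block only, and each block adds its own partial rows to the view's storage until a background merge combines them. EmpAggView and its methods are hypothetical names, and the Decimal totals stand in for the binary sumState values.

```python
from decimal import Decimal


class EmpAggView:
    """Toy model of view_emp_agg: its SELECT ... GROUP BY emp_id, name runs on
    each inserted block, and every block adds its own partial-sum rows."""

    def __init__(self):
        self.state_rows = []  # stands in for the view's hidden storage table

    def on_insert(self, block):
        partial = {}
        for emp_id, name, salary in block:
            key = (emp_id, name)
            partial[key] = partial.get(key, Decimal("0")) + salary
        self.state_rows.extend(partial.items())  # one row per key per block

    def sum_merge(self):
        # SELECT emp_id, name, sumMerge(salary) FROM view_emp_agg GROUP BY emp_id, name
        totals = {}
        for key, part in self.state_rows:
            totals[key] = totals.get(key, Decimal("0")) + part
        return totals


view = EmpAggView()
view.on_insert([(1, "tom", Decimal("20000.00")), (1, "tom", Decimal("10000.00"))])
print(len(view.state_rows), view.sum_merge())  # 1 {(1, 'tom'): Decimal('30000.00')}

# A hypothetical second INSERT adds a second partial row until a merge runs,
# which is why the query always needs GROUP BY together with sumMerge.
view.on_insert([(1, "tom", Decimal("5000.00"))])
print(len(view.state_rows), view.sum_merge())  # 2 {(1, 'tom'): Decimal('35000.00')}
```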

2) Showing the current CPU utilization of each node

Aggregate the columns with argMaxState:
create materialized view cpu_last_point_idle_mv 
engine = AggregatingMergeTree()
partition by tuple()
order by tags_id
populate
as select
argMaxState(created_date,created_at) as created_date,
maxState(created_at) as max_created_at,
argMaxState(time,created_at) as time,
tags_id,
argMaxState(usage_idle,created_at) as usage_idle
from cpu 
group by tags_id

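argMax(a, b) returns the value of a from the row in which b is largest.

-State is a suffix (combinator) for aggregate functions. A function with this suffix does not return the final result but an intermediate state of the aggregation, and that state can be stored in an AggregatingMergeTree table.

Use the -Merge suffix to get the final aggregated result: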
create view cpu_last_point_idle_v as
select 
argMaxMerge(created_date) as created_date,
maxMerge(max_created_at) as created_at,
argMaxMerge(time) as time,
tags_id,
argMaxMerge(usage_idle) as usage_idle
from cpu_last_point_idle_mv
group by tags_id
Query the result view:
select 
tags_id,
100 - usage_idle usage
from cpu_last_point_idle_v
order by usage desc,tags_id asc
limit 10
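argMaxState and argMaxMerge can be pictured in the same way. In this Python sketch, a state is assumed to be the (b, a) pair with the largest b seen so far, and merging states keeps the pair with the larger b. The readings are made up, and real ClickHouse states are binary.

The sketch shows why the view can report the latest usage_idle per tags_id even though each insert produced its own partial state.

```python
def arg_max_state(pairs):
    """Partial state of argMax(a, b) for one group: the (b, a) pair with the
    largest b. pairs: iterable of (a, b). On ties this sketch keeps the first."""
    best = None
    for a, b in pairs:
        if best is None or b > best[0]:
            best = (b, a)
    return best


def arg_max_merge(states):
    """Combine partial states the way a part merge or argMaxMerge would, then
    return the final a. States of empty groups (None) are skipped."""
    best = None
    for state in states:
        if state is not None and (best is None or state[0] > best[0]):
            best = state
    return None if best is None else best[1]


# Hypothetical (usage_idle, created_at) readings for one tags_id, in two inserts.
# ISO timestamps as strings compare in chronological order.
block1 = [(91.0, "2021-10-13 10:00:00"), (85.5, "2021-10-13 10:00:10")]
block2 = [(70.2, "2021-10-13 10:00:20")]

states = [arg_max_state(block1), arg_max_state(block2)]  # one state row per insert
print(arg_max_merge(states))  # 70.2: the latest reading wins
```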

3) Creating a materialized view that tracks the table tb_test_MergeTree_basic

b64d9704419c :) create materialized view tb_test_AggregatingMergeTree_view ENGINE = AggregatingMergeTree() PARTITION BY (brandId,shopId) ORDER BY (brandId,shopId) as select brandId,shopId,sumState(saleMoney) saleMoney,sumState(saleQty) saleQty,countState(1) saleNum,uniqState(vipId)  vipNum from tb_test_MergeTree_basic group by brandId,shopId
 
CREATE MATERIALIZED VIEW tb_test_AggregatingMergeTree_view
ENGINE = AggregatingMergeTree()
PARTITION BY (brandId, shopId)
ORDER BY (brandId, shopId) AS
SELECT 
    brandId, 
    shopId, 
    sumState(saleMoney) AS saleMoney, 
    sumState(saleQty) AS saleQty, 
    countState(1) AS saleNum, 
    uniqState(vipId) AS vipNum
FROM tb_test_MergeTree_basic
GROUP BY 
    brandId, 
    shopId
 
Ok.
 
0 rows in set. Elapsed: 0.012 sec. 
 
b64d9704419c :) 

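SHOW TABLES lists an extra table whose name carries the ".inner." prefix, which an ordinary table does not have. This hidden table stores the view's data.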

 

 

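The directory name on disk also contains some garbled-looking characters that an ordinary table's directory does not have.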
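The odd characters most likely come from escaping. ClickHouse turns a table name into a directory name by keeping ASCII letters, digits and underscores and percent-encoding every other byte, so the dots of ".inner." become %2E. The Python function below re-implements that rule for illustration; escape_for_file_name is not a ClickHouse API.

This applies to databases that name table directories after their tables, as the Ordinary database engine does. Atomic databases in newer versions store data under UUID-based paths instead.

```python
import string

# Characters kept as-is; every other byte is written as %XX (uppercase hex).
SAFE = set(string.ascii_letters + string.digits + "_")


def escape_for_file_name(name):
    """Percent-encode every byte that is not an ASCII letter, digit or '_'."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in SAFE else "%{:02X}".format(byte))
    return "".join(out)


print(escape_for_file_name(".inner.tb_test_AggregatingMergeTree_view"))
# %2Einner%2Etb_test_AggregatingMergeTree_view
```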

Data that existed before the view was created is not tracked

Table tb_test_MergeTree_basic already contained data when the materialized view was created:

b64d9704419c :) select * from tb_test_MergeTree_basic 
 
SELECT *
FROM tb_test_MergeTree_basic
 
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     429 │   6002 │ 2020-10-07 │     200.5 │      40 │ 10002 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     429 │   6002 │ 2020-10-05 │     200.5 │      10 │ 10001 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     429 │   6001 │ 2020-10-07 │     200.5 │      30 │ 10003 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     429 │   6002 │ 2020-10-04 │     200.5 │      40 │ 10001 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     429 │   6001 │ 2020-10-01 │     200.5 │      10 │ 10001 │
│     429 │   6001 │ 2020-10-02 │     200.5 │      20 │ 10002 │
│     429 │   6001 │ 2020-10-03 │     200.5 │      30 │ 10003 │
│     429 │   6001 │ 2020-10-04 │     200.5 │      10 │ 10001 │
│     429 │   6001 │ 2020-10-05 │     200.5 │      20 │ 10001 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     429 │   6001 │ 2020-10-06 │     200.5 │      30 │ 10003 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘

Run OPTIMIZE TABLE tb_test_AggregatingMergeTree_view once,

then query tb_test_AggregatingMergeTree_view again:

b64d9704419c :) select * from tb_test_AggregatingMergeTree_view
 
SELECT *
FROM tb_test_AggregatingMergeTree_view
 
Ok.
 
0 rows in set. Elapsed: 0.003 sec. 
 
b64d9704419c :)

So the view did not pick up the rows that existed before it was created.
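A materialized view is fed only by INSERTs that happen after it exists; nothing rescans the rows already stored in the source table. The toy Python model below shows this. SourceTable and seen_by_view are invented names, and rows are reduced to (shopId, saleMoney).

The model also shows the usual way out, which is to replay the old rows once. In ClickHouse that replay can be done in two ways:
- with POPULATE at creation time, which the documentation advises against because rows inserted while it runs are not added to the view;
- with a one-off INSERT ... SELECT into the view's storage table, restricted to the old rows.

```python
class SourceTable:
    """Toy model: a MergeTree table plus the materialized views attached to it.
    A view is just a callback that receives every block inserted AFTER it is
    attached; rows already in the table are never replayed to it."""

    def __init__(self):
        self.rows = []
        self.views = []

    def insert(self, block):
        self.rows.extend(block)
        for view in self.views:
            view(block)


basic = SourceTable()
basic.insert([(6001, 200.5), (6002, 200.5)])   # data written before the view exists

seen_by_view = []
basic.views.append(seen_by_view.extend)         # CREATE MATERIALIZED VIEW (no POPULATE)
print(len(seen_by_view))                        # 0: existing rows are not picked up

basic.insert([(6001, 200.5)])                   # written after the view exists
print(len(seen_by_view))                        # 1

# Hypothetical one-off backfill: replay the old rows yourself, which is roughly
# what POPULATE does once at creation time.
seen_by_view.extend(basic.rows[:2])
print(len(seen_by_view))                        # 3
```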

Rows inserted after the view was created are tracked:

1) Insert two rows

insert into tb_test_MergeTree_basic values (429,6001,'2020-10-08 14:15:23',200.50,30,10003)
insert into tb_test_MergeTree_basic values (429,6002,'2020-10-08 14:15:23',200.50,40,10002)

2) Check the view

b64d9704419c :) select * from tb_test_AggregatingMergeTree_view
 
SELECT *
FROM tb_test_AggregatingMergeTree_view
 
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum──┐
│     429 │   6001 │ <state>   │ <state> │ <state> │ <state> │
└─────────┴────────┴───────────┴─────────┴─────────┴─────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum──┐
│     429 │   6002 │ <state>   │ <state> │ <state> │ <state> │
└─────────┴────────┴───────────┴─────────┴─────────┴─────────┘
 
2 rows in set. Elapsed: 0.008 sec. 
 
b64d9704419c :)

3) Aggregated result

b64d9704419c :) select brandId,shopId,sumMerge(saleMoney) saleMoney,sumMerge(saleQty) saleQty,countMerge(saleNum) saleNum,uniqMerge(vipNum)  vipNum from tb_test_AggregatingMergeTree_view group by brandId,shopId
 
SELECT 
    brandId, 
    shopId, 
    sumMerge(saleMoney) AS saleMoney, 
    sumMerge(saleQty) AS saleQty, 
    countMerge(saleNum) AS saleNum, 
    uniqMerge(vipNum) AS vipNum
FROM tb_test_AggregatingMergeTree_view
GROUP BY 
    brandId, 
    shopId
 
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     429 │   6002 │     200.5 │      40 │       1 │      1 │
│     429 │   6001 │     200.5 │      30 │       1 │      1 │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
 
2 rows in set. Elapsed: 0.005 sec. 
 
b64d9704419c :)

4) Insert more rows

insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,10,10001)
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,20,10002)
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,30,10003)
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,10,10001)
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,20,10001)
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,30,10003)
insert into tb_test_MergeTree_basic values (429,6002,'2020-10-09 14:15:23',200.50,40,10001)
insert into tb_test_MergeTree_basic values (429,6002,'2020-10-09 14:15:23',200.50,10,10001)
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-10 14:15:23',200.50,10,10001)

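5) Check the view

The view still holds several rows per shop, so the parts inside each partition have not all been merged yet: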

b64d9704419c :) select * from tb_test_AggregatingMergeTree_view
 
SELECT *
FROM tb_test_AggregatingMergeTree_view
 
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum──┐
│     429 │   6001 │ <state>   │ <state> │ <state> │ <state> │
└─────────┴────────┴───────────┴─────────┴─────────┴─────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum──┐
│     429 │   6001 │ <state>   │ <state> │ <state> │ <state> │
└─────────┴────────┴───────────┴─────────┴─────────┴─────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum──┐
│     429 │   6002 │ <state>   │ <state> │ <state> │ <state> │
└─────────┴────────┴───────────┴─────────┴─────────┴─────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum──┐
│     429 │   6001 │ <state>   │ <state> │ <state> │ <state> │
└─────────┴────────┴───────────┴─────────┴─────────┴─────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum──┐
│     429 │   6002 │ <state>   │ <state> │ <state> │ <state> │
└─────────┴────────┴───────────┴─────────┴─────────┴─────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum──┐
│     429 │   6001 │ <state>   │ <state> │ <state> │ <state> │
└─────────┴────────┴───────────┴─────────┴─────────┴─────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum──┐
│     429 │   6002 │ <state>   │ <state> │ <state> │ <state> │
└─────────┴────────┴───────────┴─────────┴─────────┴─────────┘
 
7 rows in set. Elapsed: 0.004 sec. 
 
b64d9704419c :)
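The extra rows are unmerged parts. This simplified Python model uses hypothetical names and tracks only saleQty. It assumes three things:
- every INSERT writes one new part per partition, holding one partial-state row;
- PARTITION BY (brandId, shopId) gives each shop its own partition;
- a merge only ever combines parts inside one partition.

Fed the eleven inserts from steps 1) and 4), the model shows 11 rows before any merge. ClickHouse had already merged some parts in the background, which is why it showed 7. Either way, the GROUP BY query returns the same totals.

```python
from collections import defaultdict

parts = defaultdict(list)  # partition (brandId, shopId) -> one saleQty partial per part


def insert(block):
    # One INSERT = one new part per partition it touches, each holding
    # the block's partial state (here: the partial sum of saleQty).
    partial = defaultdict(int)
    for brand, shop, qty in block:
        partial[(brand, shop)] += qty
    for partition, qty in partial.items():
        parts[partition].append(qty)


def rows_in_select_star():
    # Before merging, SELECT * returns one row per part.
    return sum(len(p) for p in parts.values())


def optimize_final():
    # Parts are merged per partition; different partitions never combine.
    for partition in parts:
        parts[partition] = [sum(parts[partition])]


def sum_merge_qty():
    # sumMerge(saleQty) ... GROUP BY brandId, shopId
    return {key: sum(p) for key, p in parts.items()}


# The eleven single-row INSERTs from steps 1) and 4): (brandId, shopId, saleQty)
for row in [(429, 6001, 30), (429, 6002, 40), (429, 6001, 10), (429, 6001, 20),
            (429, 6001, 30), (429, 6001, 10), (429, 6001, 20), (429, 6001, 30),
            (429, 6002, 40), (429, 6002, 10), (429, 6001, 10)]:
    insert([row])

print(rows_in_select_star(), sum_merge_qty())  # 11 {(429, 6001): 160, (429, 6002): 90}
optimize_final()
print(rows_in_select_star(), sum_merge_qty())  # 2 {(429, 6001): 160, (429, 6002): 90}
```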

6) Observe the automatic tracking

b64d9704419c :) select brandId,shopId,sumMerge(saleMoney) saleMoney,sumMerge(saleQty) saleQty,countMerge(saleNum) saleNum,uniqMerge(vipNum)  vipNum from tb_test_AggregatingMergeTree_view group by brandId,shopId
 
SELECT 
    brandId, 
    shopId, 
    sumMerge(saleMoney) AS saleMoney, 
    sumMerge(saleQty) AS saleQty, 
    countMerge(saleNum) AS saleNum, 
    uniqMerge(vipNum) AS vipNum
FROM tb_test_AggregatingMergeTree_view
GROUP BY 
    brandId, 
    shopId
 
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     429 │   6002 │     601.5 │      90 │       3 │      2 │
│     429 │   6001 │      1604 │     160 │       8 │      3 │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
 
2 rows in set. Elapsed: 0.010 sec. 
 
b64d9704419c :) 

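7) The new data has indeed been aggregated automatically

However, only rows inserted after the view was created are tracked. This can be verified with the following SQL: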

select brandId,shopId,sum(saleMoney),sum(saleQty),count(1),uniq(vipId) from tb_test_MergeTree_basic  where saleDate>='2020-10-08' group by brandId,shopId

Result:

b64d9704419c :) select brandId,shopId,sum(saleMoney),sum(saleQty),count(1),uniq(vipId) from tb_test_MergeTree_basic  where saleDate>='2020-10-08' group by brandId,shopId
 
SELECT 
    brandId, 
    shopId, 
    sum(saleMoney), 
    sum(saleQty), 
    count(1), 
    uniq(vipId)
FROM tb_test_MergeTree_basic
WHERE saleDate >= '2020-10-08'
GROUP BY 
    brandId, 
    shopId
 
┌─brandId─┬─shopId─┬─sum(saleMoney)─┬─sum(saleQty)─┬─count(1)─┬─uniq(vipId)─┐
│     429 │   6002 │          601.5 │           90 │        3 │           2 │
│     429 │   6001 │           1604 │          160 │        8 │           3 │
└─────────┴────────┴────────────────┴──────────────┴──────────┴─────────────┘
 
2 rows in set. Elapsed: 0.003 sec. 
 
b64d9704419c :) 

8) Rows that existed before the view was created cannot be tracked

The following values were not picked up by the view:

select brandId,shopId,sum(saleMoney),sum(saleQty),count(1),uniq(vipId) from tb_test_MergeTree_basic  where saleDate<'2020-10-08' group by brandId,shopId
 
SELECT 
    brandId, 
    shopId, 
    sum(saleMoney), 
    sum(saleQty), 
    count(1), 
    uniq(vipId)
FROM tb_test_MergeTree_basic
WHERE saleDate < '2020-10-08'
GROUP BY 
    brandId, 
    shopId
 
┌─brandId─┬─shopId─┬─sum(saleMoney)─┬─sum(saleQty)─┬─count(1)─┬─uniq(vipId)─┐
│     429 │   6002 │          601.5 │           90 │        3 │           2 │
│     429 │   6001 │         1403.5 │          150 │        7 │           3 │
└─────────┴────────┴────────────────┴──────────────┴──────────┴─────────────┘
 
2 rows in set. Elapsed: 0.003 sec. 
 
b64d9704419c :) 
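The numbers in steps 6) to 8) can be re-derived by hand. The Python sketch below copies every row shown or inserted above (brandId is 429 throughout). For each shop, on each side of the 2020-10-08 cutoff, it computes sum(saleMoney), sum(saleQty), count(1) and the set of distinct vipId values. An exact set stands in for the state of uniq, which is approximate in ClickHouse.

The sketch also shows why aggregate states, not final numbers, are what get merged. For shop 6001 each side has 3 distinct vipIds, yet the combined count is 3, not 6.

```python
# (shopId, saleDate, saleMoney, saleQty, vipId); brandId is 429 in every row
rows = [
    # rows that already existed when the view was created
    (6002, "2020-10-07", 200.5, 40, 10002), (6002, "2020-10-05", 200.5, 10, 10001),
    (6001, "2020-10-07", 200.5, 30, 10003), (6002, "2020-10-04", 200.5, 40, 10001),
    (6001, "2020-10-01", 200.5, 10, 10001), (6001, "2020-10-02", 200.5, 20, 10002),
    (6001, "2020-10-03", 200.5, 30, 10003), (6001, "2020-10-04", 200.5, 10, 10001),
    (6001, "2020-10-05", 200.5, 20, 10001), (6001, "2020-10-06", 200.5, 30, 10003),
    # rows inserted after the view was created (steps 1 and 4)
    (6001, "2020-10-08", 200.5, 30, 10003), (6002, "2020-10-08", 200.5, 40, 10002),
    (6001, "2020-10-09", 200.5, 10, 10001), (6001, "2020-10-09", 200.5, 20, 10002),
    (6001, "2020-10-09", 200.5, 30, 10003), (6001, "2020-10-09", 200.5, 10, 10001),
    (6001, "2020-10-09", 200.5, 20, 10001), (6001, "2020-10-09", 200.5, 30, 10003),
    (6002, "2020-10-09", 200.5, 40, 10001), (6002, "2020-10-09", 200.5, 10, 10001),
    (6001, "2020-10-10", 200.5, 10, 10001),
]


def totals(selected):
    """Per shop: (sum of saleMoney, sum of saleQty, row count, set of vipIds)."""
    out = {}
    for shop, _, money, qty, vip in selected:
        m, q, n, vips = out.get(shop, (0.0, 0, 0, frozenset()))
        out[shop] = (m + money, q + qty, n + 1, vips | {vip})
    return out


CUTOFF = "2020-10-08"  # ISO date strings compare chronologically
old = totals([r for r in rows if r[1] < CUTOFF])
new = totals([r for r in rows if r[1] >= CUTOFF])

for shop in (6001, 6002):
    m1, q1, n1, v1 = old[shop]
    m2, q2, n2, v2 = new[shop]
    # Sums and counts can simply be added; distinct sets must be unioned.
    print(shop, "old:", m1, q1, n1, len(v1), "new:", m2, q2, n2, len(v2),
          "all:", m1 + m2, q1 + q2, n1 + n2, len(v1 | v2))
# 6001 old: 1403.5 150 7 3 new: 1604.0 160 8 3 all: 3007.5 310 15 3
# 6002 old: 601.5 90 3 2 new: 601.5 90 3 2 all: 1203.0 180 6 2
```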

 

4. Summary

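1) The ORDER BY sorting key determines which rows are aggregated together.
2) AggregateFunction columns declare the aggregate function and the type of the values being aggregated.
3) The aggregation logic runs only when data parts are merged.
4) Aggregation only happens within a partition; data in different partitions is never aggregated together.
5) Data within a partition is already sorted by ORDER BY, so during aggregation the rows with the same aggregation key are adjacent and can be found easily.
6) During aggregation, multiple rows with the same aggregation key in the same partition are merged into one row. For columns that are neither in the key nor of AggregateFunction type, the value from the first row is typically kept.
7) AggregateFunction columns are stored in binary form. Write them with the *State function and read them with the *Merge function, where * is the aggregate function used in the column definition.
8) AggregatingMergeTree is usually used as the engine of a materialized view, paired with an ordinary MergeTree table.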
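Points 1), 4), 5) and 6) together describe a merge of sorted runs. The hypothetical Python sketch below performs one such merge inside a single partition. heapq.merge streams the already-sorted parts in key order, so rows with equal keys arrive next to each other and itertools.groupby collapses them in one pass. A set stands in for an aggregate state and is combined by union, as a uniq-like state would be. The plain column keeps the value from the earliest part.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_parts(*parts):
    """Merge the parts of ONE partition. Each part is a list of
    (sort_key, state, plain_value) rows already sorted by sort_key."""
    merged = []
    stream = heapq.merge(*parts, key=itemgetter(0))        # k-way merge in key order
    for key, group in groupby(stream, key=itemgetter(0)):  # equal keys are adjacent
        group = list(group)
        state = frozenset().union(*(row[1] for row in group))  # combine the states
        merged.append((key, state, group[0][2]))  # plain column: earliest part wins
    return merged


part1 = [(6001, {10003}, "first"), (6002, {10002}, "first")]
part2 = [(6001, {10001, 10002}, "second")]
print(merge_parts(part1, part2))
```

heapq.merge breaks ties in favour of the earlier input, which is what makes the "first row wins" choice deterministic in this sketch.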

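5. Notes

AggregatingMergeTree tables can be used for incremental statistical aggregation, including as the storage of a materialized view.

1) A plain AggregatingMergeTree table does not track the basic table. It only contains the aggregates of the data that existed when its INSERT ... SELECT ran; rows added to the basic table afterwards are not aggregated.
2) An AggregatingMergeTree materialized view does track the basic table, but data that already existed before the view was created is not tracked. Only rows inserted after the view was created are aggregated.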





 
Guangdong ICP No. 18138465   © 2018-2025 CODEPRJ.COM