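1. Table structure settings
【1】Specify the replica count when creating a table: replication_num
【2】Sort key
Duplicate (detail) model: DUPLICATE KEY(site_id, city_code)
Aggregate model: AGGREGATE KEY(site_id, city_code)
Unique (update) model: UNIQUE KEY(site_id, city_code)
BloomFilter index: PROPERTIES ( "bloom_filter_columns"="k1,k2,k3" )
【3】Bucketing configuration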
DISTRIBUTED BY HASH(site_id) BUCKETS 10
【4】Look up a Docker container's IP address
docker inspect --format='{{.NetworkSettings.IPAddress}}' doris-be1
【5】Table properties
PROPERTIES (
"replication_num" = "1", -- replica count
"colocate_with" = "group1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);
【6】OLAP table
ENGINE=OLAP
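The individual settings above can be combined into one statement. The following is a minimal sketch, not a recommended production layout: the table name `site_access_demo` is hypothetical, `replication_num = 1` assumes a single-BE test cluster, and `colocate_with` is left out because it only makes sense once a colocation group is actually planned.

```sql
-- Hypothetical demo table combining the clauses listed above.
CREATE TABLE site_access_demo (
    site_id   INT DEFAULT '10',
    city_code SMALLINT,
    user_name VARCHAR(32) DEFAULT '',
    pv        BIGINT DEFAULT '0'
)
ENGINE = OLAP
DUPLICATE KEY(site_id, city_code)          -- sort key (duplicate model)
DISTRIBUTED BY HASH(site_id) BUCKETS 10    -- bucketing
PROPERTIES (
    "replication_num" = "1",               -- replica count; assumes a one-BE test cluster
    "in_memory" = "false",
    "storage_format" = "DEFAULT"
);
```

The clause order matters: ENGINE, then the key clause, then DISTRIBUTED BY, then PROPERTIES.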
2. Basic operations
【1】Create tables
Duplicate model
CREATE TABLE site_access_duplicate (
    site_id INT DEFAULT '10',
    city_code SMALLINT,
    user_name VARCHAR(32) DEFAULT '',
    pv BIGINT DEFAULT '0'
)
DUPLICATE KEY(site_id, city_code)
DISTRIBUTED BY HASH(site_id) BUCKETS 10;
Aggregate model
CREATE TABLE site_access_aggregate (
    site_id INT DEFAULT '10',
    city_code SMALLINT,
    pv BIGINT SUM DEFAULT '0'
)
AGGREGATE KEY(site_id, city_code)
DISTRIBUTED BY HASH(site_id) BUCKETS 10;
Unique model
CREATE TABLE site_access_unique (
    site_id INT DEFAULT '10',
    city_code SMALLINT,
    user_name VARCHAR(32) DEFAULT '',
    pv BIGINT DEFAULT '0'
)
UNIQUE KEY(site_id, city_code)
DISTRIBUTED BY HASH(site_id) BUCKETS 10;
【2】Insert test data into the duplicate-model table
INSERT INTO site_access_duplicate VALUES(10010,10,"wangshida",1), (10011,10,"xiaohong",2), (10012,10,"xiaoming",15);
【3】Updating data in place is not supported; achieve the same effect by inserting into a unique-model table
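A minimal sketch of that "update by insert" pattern, assuming the `site_access_unique` table defined above and made-up row values: loading a second row with the same UNIQUE KEY replaces the earlier one.

```sql
-- First load: one row for the key (site_id = 10010, city_code = 10).
INSERT INTO site_access_unique VALUES (10010, 10, 'wangshida', 1);

-- "Update": insert the same key again with new values.
INSERT INTO site_access_unique VALUES (10010, 10, 'wangshida', 100);

-- The key should now resolve to a single row carrying the later values.
SELECT * FROM site_access_unique WHERE site_id = 10010 AND city_code = 10;
```

The whole row is replaced, so every column must be supplied again, not just the changed one.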
【4】Deleting data (supported, but relatively slow)
delete from site_access_duplicate where site_id=10022;
3. Analyzing SQL
site_access_duplicate: duplicate model
site_access_aggregate: aggregate model
site_access_unique: unique model
【1】Filter on both sort keys
explain select * from site_access_duplicate where site_id = 10010 and city_code = 10;
0:OlapScanNode
TABLE: site_access_duplicate
PREAGGREGATION: ON
PREDICATES: `site_id` = 10010, `city_code` = 10
partitions=1/1
rollup: site_access_duplicate
tabletRatio=1/10
tabletList=11012
cardinality=4
avgRowSize=144.75
numNodes=3
tuple ids: 0
【2】Filter only on the first sort key, site_id
explain select * from site_access_duplicate where site_id = 10010;
0:OlapScanNode
TABLE: site_access_duplicate
PREAGGREGATION: ON
PREDICATES: `site_id` = 10010
partitions=1/1
rollup: site_access_duplicate
tabletRatio=1/10
tabletList=11012
cardinality=4
avgRowSize=144.75
numNodes=3
tuple ids: 0
【3】Filter only on the second sort key, city_code
explain select * from site_access_duplicate where city_code = 10;
0:OlapScanNode
TABLE: site_access_duplicate
PREAGGREGATION: ON
PREDICATES: `city_code` = 10
partitions=1/1
rollup: site_access_duplicate
tabletRatio=10/10
tabletList=11004,11008,11012,11016,11020,11024,11028,11032,11036,11040
cardinality=11
avgRowSize=262.45456
numNodes=3
tuple ids: 0
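4. Materialized views: queries that cannot use the short key index can be served by a materialized view instead
Base table
CREATE TABLE site_access_duplicate (
    site_id INT DEFAULT '10',
    city_code SMALLINT,
    user_name VARCHAR(32) DEFAULT '',
    pv BIGINT DEFAULT '0'
)
DUPLICATE KEY(site_id, city_code)
DISTRIBUTED BY HASH(site_id) BUCKETS 10;
【1】Create a materialized view
【Note】If creation fails with errCode = 2, detailMessage = The materialized view is coming soon,
materialized views on the duplicate model must first be enabled by adding the following to the FE configuration file: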
enable_materialized_view=true
CREATE MATERIALIZED VIEW `site_access_duplicate_pv_view` AS SELECT city_code, SUM(pv) AS sum_pv FROM site_access_duplicate GROUP BY city_code ORDER BY city_code;
【2】List the materialized views in a database
SHOW ALTER TABLE ROLLUP FROM test;
If State is "FINISHED", the materialized view has been fully built from the base table
【3】View the structure of a table's materialized views
desc site_access_duplicate all;
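【4】Check whether a query uses the materialized view
If the EXPLAIN output shows PREAGGREGATION: ON and rollup: site_access_duplicate_pv_view, the query is served by the materialized view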
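A minimal sketch of such a check, assuming the materialized view `site_access_duplicate_pv_view` created above has reached the FINISHED state. The query mirrors the view's definition, so it is a natural candidate for being routed to it; whether it is actually chosen depends on the planner.

```sql
EXPLAIN
SELECT city_code, SUM(pv)
FROM site_access_duplicate
GROUP BY city_code;
-- In the OlapScanNode of the plan, inspect the rollup line:
--   rollup: site_access_duplicate_pv_view   -> the materialized view was chosen
--   rollup: site_access_duplicate           -> the base table was scanned
```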


【5】Drop a materialized view
DROP MATERIALIZED VIEW IF EXISTS site_access_duplicate_pv_view ON site_access_duplicate;
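【6】Smart routing rules
Keep the MV tables that contain every column used by the query
Among those, pick the MV table that best matches the filter and sort columns
Pick the MV table that best matches the join columns
Pick the MV table with the fewest rows
Pick the MV table with the fewest columns
Notes:
(1) The view must be an aggregation over a single table
(2) The following aggregate functions are supported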
COUNT
MAX
MIN
SUM
PERCENTILE_APPROX
HLL_UNION
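(3) A rollup table must use the same model as its base table (the rollup of an aggregate table uses the aggregate model; the rollup of a duplicate table uses the duplicate model)
(4) For a DELETE, if a key column used in the WHERE condition does not exist in one of the rollup tables, the DELETE is not allowed.
Example: deleting by user_name is rejected when that column does not exist in the materialized view, because the view could not be updated and its data would drift from the base table. The workaround is to drop the materialized view first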
5. Bitmap index
Reference article: https://zhuanlan.zhihu.com/p/54783053
【1】Create an index
CREATE INDEX idx_site_id ON site_access_duplicate (city_code)
USING BITMAP COMMENT 'city index';
【2】View the indexes defined on a table
SHOW INDEX FROM site_access_duplicate;
【3】Drop an index
DROP INDEX idx_site_id ON site_access_duplicate;
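Notes
(1) For the duplicate model, a Bitmap index can be built on any column; for the aggregate model, only key columns can have one
(2) A Bitmap index suits columns with enumerated, heavily repeated values, that is, low cardinality
(3) Bitmap indexes are not supported on Float, Double, or Decimal columns
6. Bloom filter index
【1】Add the index by specifying it at table creation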
PROPERTIES ( "bloom_filter_columns"="city_code,pv" )
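Because the property only takes effect as part of a CREATE TABLE statement, a complete statement may be easier to adapt. A minimal sketch, assuming a hypothetical table name `site_access_bf` with the same columns as `site_access_duplicate` and a single-BE test cluster:

```sql
-- Hypothetical table with Bloom filter indexes declared at creation time.
CREATE TABLE site_access_bf (
    site_id   INT DEFAULT '10',
    city_code SMALLINT,
    user_name VARCHAR(32) DEFAULT '',
    pv        BIGINT DEFAULT '0'
)
DUPLICATE KEY(site_id, city_code)
DISTRIBUTED BY HASH(site_id) BUCKETS 10
PROPERTIES (
    "replication_num" = "1",                    -- assumes a one-BE test cluster
    "bloom_filter_columns" = "city_code,pv"     -- comma-separated column list
);
```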
【2】View the index
SHOW CREATE TABLE site_access_duplicate;
【3】Drop the index
ALTER TABLE site_access_duplicate SET ("bloom_filter_columns" = "");
【4】Modify the index
ALTER TABLE site_access_duplicate SET ("bloom_filter_columns" = "city_code,pv");
【5】Turn on the FE report switch: set is_report_success=true;

Verification SQL
ALTER TABLE site_access_duplicate SET ("bloom_filter_columns" = "user_name");
select * from site_access_duplicate where user_name = 'xiaoming';

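Notes
(1) Bloom Filter indexes are not supported on Tinyint, Float, or Double columns
(2) A Bloom Filter index only speeds up queries that filter with IN or =
(3) To see whether a query hit the Bloom Filter index, check the query's Profile information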