Hive Query Optimization: Using Bloom Filters

This article is a repost. Posted 2021-03-01 20:55, tagged Hive.

Technical background: http://lxw1234.com/archives/2016/04/632.htm

The Hive table is stored as ORC.

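Optimization approach used here: ORC Bloom filters combined with two-level partitioning, where the second level is a dynamic partition.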

Hands-on:

1. Create the table:

CREATE TABLE test (
    mall_id bigint COMMENT 'store id',
    mall_collection_id bigint COMMENT 'merchant collection id',
    city_id bigint COMMENT 'city id',
    city_name string COMMENT 'city name',
    province_id bigint COMMENT 'province id',
    province_name string COMMENT 'province name',
    is_illegal bigint COMMENT 'whether the store is in violation',
    stat_day string COMMENT 'statistics day'
)
COMMENT 'XXXX'
PARTITIONED BY (
    pt string COMMENT 'partition date',
    mall_col_id bigint COMMENT 'id')
STORED AS ORC
TBLPROPERTIES (
    'orc.compress'='SNAPPY',
    'orc.create.index'='true',
    'orc.bloom.filter.columns'='mall_collection_id,stat_day', -- Bloom filters on these two columns because the API queries by them
    'orc.bloom.filter.fpp'='0.05',
    'orc.stripe.size'='10485760',
    'orc.row.index.stride'='10000')
;
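Once the table exists, it can be worth confirming that the ORC properties were stored as intended. The statements below are a minimal sketch, assuming they run in the same database where `test` was created:

```sql
-- Show only the Bloom filter column list that was registered on the table.
SHOW TBLPROPERTIES test('orc.bloom.filter.columns');

-- Show all table properties, including the false-positive rate,
-- stripe size, and row index stride.
SHOW TBLPROPERTIES test;

-- Full table metadata: storage format, partition columns, and properties.
DESCRIBE FORMATTED test;
```

These properties only affect ORC files written after they are set, so data loaded into a table before the Bloom filter properties were added would need to be rewritten to gain the filters.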

2. Insert the data into the result table:

INSERT OVERWRITE TABLE test PARTITION (pt = '${env.YYYYMMDD}', mall_col_id)
SELECT
    mall_id,
    mall_collection_id,
    city_id,
    city_name,
    province_id,
    province_name,
    is_illegal,
    stat_day,
    mall_collection_id % 1000 AS mall_col_id -- dynamic partition column, must come last
FROM
    A
DISTRIBUTE BY mall_collection_id SORT BY mall_collection_id, stat_day -- keep these consistent with the Bloom filter columns
;
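Because `mall_col_id` is derived as `mall_collection_id % 1000`, a single run can create up to 1000 dynamic partitions under each `pt`. The following is a sketch of session settings such an insert may need, to be run before the INSERT. The limit values and the sample date `20210301` are assumptions, not taken from the original article, and defaults vary between Hive versions and distributions.

```sql
-- Enable dynamic partition inserts (already the default in many Hive versions).
SET hive.exec.dynamic.partition=true;

-- Assumed limits: raise the caps so that up to 1000 partitions per run fit.
-- The stock per-node default is typically lower than 1000.
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=1000;

-- After the INSERT finishes, list the second-level partitions that were
-- written for one day ('20210301' is a hypothetical value of pt).
SHOW PARTITIONS test PARTITION (pt='20210301');
```

Since `pt` is given as a static value, the insert should also work with `hive.exec.dynamic.partition.mode` left at `strict`.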

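This works because a Bloom filter lets the ORC reader rule out row groups that definitely do not contain the requested value, so less data is scanned. Sorting the data by the same columns clusters equal values together, which means fewer row groups contain any given value and more of them can be skipped.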
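A lookup that benefits from this layout might look like the sketch below. The literal values (`20210301`, `123456`, `'2021-03-01'`) and the selected columns are hypothetical, since the original article does not show the API's query or the format of `stat_day`. The `hive.optimize.index.filter` setting is assumed to be required here, because it is what pushes predicates down to the ORC reader and it is not enabled by default on every installation.

```sql
-- Push filter predicates down to the ORC reader so that the row index
-- and Bloom filters can be used to skip row groups.
SET hive.optimize.index.filter=true;

SELECT
    mall_id,
    city_name,
    is_illegal
FROM test
WHERE pt = '20210301'               -- first-level partition (hypothetical date)
  AND mall_col_id = 456             -- second-level partition: 123456 % 1000
  AND mall_collection_id = 123456   -- equality predicate on a Bloom filter column
  AND stat_day = '2021-03-01';      -- equality predicate on a Bloom filter column
```

The two partition predicates prune directories first, and the equality predicates on `mall_collection_id` and `stat_day` are the ones the Bloom filters can help with. Range predicates rely on the min/max statistics in the row index instead.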
