Bucketed Tables in Hive

This article is a repost. 2019-04-22 18:39

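A bucketed table is another table type designed to optimize queries.
When you create a bucketed table, you specify the number of buckets and the column to bucket on, and Hive then distributes the data into buckets automatically.
A query can then scan a single bucket, or only some of the buckets, instead of the whole table, which improves query efficiency.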

------ Create the orders table
create table user_leads
(
leads_id string,
user_id string,
user_phone string,
user_name string,
create_time string
)
clustered by (user_id)
sorted by(leads_id)
into 10 buckets
row format delimited fields terminated by '\t'
stored as textfile;
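
To check that the bucketing definition was recorded as intended, one option is to inspect the table metadata. This is only a small sketch using the user_leads table created above; the output should include entries such as Num Buckets, Bucket Columns and Sort Columns.

```sql
-- Show the full table metadata, including the bucketing and sorting definition.
describe formatted user_leads;
```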

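clustered by means that Hive hashes the value of user_id and takes the result modulo the number of buckets.
The result determines which bucket the row goes into. Distributing rows this way
guarantees that all rows with the same user_id end up in the same bucket.
Dealer order data is mostly queried by user_id,
so in most cases a query only needs to read the data in one bucket.
sorted by specifies the column by which the rows inside each bucket are sorted. The benefit of sorting is that join operations can be much more efficient, for example when both tables are bucketed and sorted on the join key.
into 10 buckets specifies that there are 10 buckets in total.
On HDFS each bucket is stored as one file, so a query by user_id can quickly determine which bucket holds the data, and scanning only that one bucket improves query efficiency.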
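
The hash-then-modulo rule described above can be approximated with Hive's built-in hash() and pmod() functions. This is a rough sketch, not a statement of Hive's internals: the literal 'u1001' is a made-up user_id, and whether hash() matches the hash function Hive actually uses for bucketing depends on the Hive version, so treat the result as an illustration of the idea.

```sql
-- Illustrative only: 'u1001' is a hypothetical user_id value.
-- pmod() keeps the result non-negative even when hash() is negative.
select pmod(hash('u1001'), 10) as bucket_no;

-- Rough view of how rows would spread over 10 buckets (useful for spotting skew).
-- Assumption: hash() approximates the bucketing hash of the Hive version in use.
select pmod(hash(user_id), 10) as bucket_no, count(*) as row_cnt
from user_leads
group by pmod(hash(user_id), 10);
```

If one bucket_no holds far more rows than the others, user_id is probably a skewed bucketing column for this data.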

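Loading data into the bucketed table
load data only places files into the table directory and does not distribute rows into buckets, so the data is first loaded into an ordinary table and then written to the bucketed table with insert ... select.
------ First create an ordinary staging table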
create table user_leads_tmp
(
leads_id string,
user_id string,
user_phone string,
user_name string,
create_time string
)
row format delimited fields terminated by ','
stored as textfile;
------ Load the data into the staging table
load data local inpath '/home/hadoop/lead.txt' overwrite into table user_leads_tmp;
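------ Insert into the bucketed table
set hive.enforce.bucketing = true; -- true enables bucketing on insert (needed on Hive 1.x; Hive 2.0 and later always enforce bucketing)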
insert overwrite table user_leads select * from user_leads_tmp;
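
One way to verify the load is to list the table's files and read back a single bucket. This is a sketch under assumptions: the dfs command is run from the Hive CLI, and the path assumes the default warehouse location and the default database, so adjust it to your environment.

```sql
-- Assumed path: default warehouse directory and default database.
-- With 10 buckets one would expect 10 files, typically named 000000_0 to 000009_0.
dfs -ls /user/hive/warehouse/user_leads;

-- Read only the first of the 10 buckets.
select * from user_leads tablesample(bucket 1 out of 10 on user_id);
```

Sampling on the same column and bucket count as the table definition lets Hive read just the matching bucket file rather than the whole table.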

------ Example: a partitioned bucketed table
drop table sospdm.tmp_yinfei_test;
create table sospdm.tmp_yinfei_test
(
id string,
cust_num string
)
partitioned by (statis_date string)
clustered by (id) sorted by (id) into 5 buckets
row format delimited fields terminated by ',';
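------ Sample data (comma-delimited id,cust_num rows; presumably the contents of the test.txt file loaded below)
1,cust_num_1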
2,cust_num_2
3,cust_num_3
4,cust_num_4
5,cust_num_5
6,cust_num_6
7,cust_num_7
8,cust_num_8
9,cust_num_9

------ Create the staging table (partitioned, not bucketed)
drop table sospdm.tmp_yinfei_test_tmp;
create table sospdm.tmp_yinfei_test_tmp
(
id string,
cust_num string
)
partitioned by (statis_date string)
row format delimited fields terminated by ',';

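------ Load the file into a partition of the staging table
load data local inpath '/home/sospdm/yf/test.txt' overwrite into table sospdm.tmp_yinfei_test_tmp partition (statis_date='20190408');

------ Insert into the matching partition of the bucketed table
set hive.enforce.bucketing = true;
insert overwrite table sospdm.tmp_yinfei_test partition(statis_date='20190408') select id,cust_num from sospdm.tmp_yinfei_test_tmp where statis_date='20190408';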
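
To confirm that the partition was written and to read back one of its buckets, something like the following could be used. It is a minimal sketch that reuses the table, partition value and bucket count from the example above.

```sql
-- The partition statis_date=20190408 should be listed.
show partitions sospdm.tmp_yinfei_test;

-- Read bucket 1 of 5 within that partition.
select id, cust_num
from sospdm.tmp_yinfei_test tablesample(bucket 1 out of 5 on id)
where statis_date = '20190408';
```

In a partitioned bucketed table each partition directory holds its own set of bucket files, so the partition filter and the bucket sample narrow the scan independently.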
