Hive single-level partitions, two-level partitions, and dynamic partitions

This article is a repost. 2020-03-17 19:24, Hive

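Single-level partitions

1. Hive partitions split a table by the value of a column, and each partition corresponds to one directory on HDFS. For example, in the HDFS storage path of the partitioned table test.table_t there are two partitions, 202002 and 202003, and the partition column is month.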
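The mapping between partitions and directories can be checked from the Hive CLI. The sketch below assumes the default warehouse location /user/hive/warehouse and that the test database is stored under test.db; both are assumptions, so adjust the path to your cluster.

```sql
-- Switch to the database that holds the partitioned table
use test;

-- List the partitions registered in the metastore
show partitions table_t;

-- List the partition directories on HDFS
-- (path assumes the default warehouse directory)
dfs -ls /user/hive/warehouse/test.db/table_t;
```

With the two partitions described above, the first command lists month=202002 and month=202003, and the directory listing shows one subdirectory per partition with the same names.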

2. Create a partitioned table

create table table_name(
    no int, name string
)
partitioned by (month string)
row format delimited fields terminated by "\t";

3. Add partitions

alter table table_name add partition(month="202004") partition(month="202005");

4. Drop partitions

-- Unlike add, drop requires a comma between partition specs
alter table table_name drop partition(month="202006"), partition(month="202007");
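Both statements fail if the partition is already in the state you are asking for (adding one that exists, dropping one that does not). As a hedged variant, the optional `if not exists` and `if exists` clauses make the statements safe to rerun:

```sql
-- Add the partition only when it is not already registered
alter table table_name add if not exists partition(month="202004");

-- Drop the partitions only when they exist; specs are comma-separated
alter table table_name drop if exists partition(month="202006"), partition(month="202007");
```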

5. Inspect a partitioned table

-- List the table's partitions
show partitions table_name;

-- Show the structure of the partitioned table
desc formatted table_name;

6. Load data

load data local inpath "/opt/module/datas/data.txt" into table table_name partition(month="202003");
insert into table table_name partition (month="202003") values(...);
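Once data is loaded, the partition column can be used in queries like an ordinary column, and filtering on it lets Hive read only the matching directory. A minimal sketch against the table defined above:

```sql
-- Read only the month=202003 partition
select no, name from table_name where month = "202003";

-- The partition column comes back like a regular column
select no, name, month from table_name limit 10;
```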

Two-level partitions

1. Create table statement

create table table_name(
    no int, name string
)
partitioned by (month string, day string)
row format delimited fields terminated by "\t";

2. Load data

-- Load data the normal way from outside Hive (here, a local file)
load data local inpath "/opt/module/datas/data.txt" into table table_name partition(month="202003",day="02");

-- Upload to HDFS, then repair the table so the metastore picks up the new partition
hive (default) > dfs -mkdir -p /user/hive/warehouse/table_name/month=202003/day=02;
hive (default) > dfs -put /opt/module/datas/data.txt /user/hive/warehouse/table_name/month=202003/day=02;
hive (default) > msck repair table table_name;

-- Upload data to HDFS, then add the partition (the directory must match the partition spec)
hive (default) > dfs -mkdir -p /user/hive/warehouse/table_name/month=202003/day=03;
hive (default) > dfs -put /opt/module/datas/data.txt /user/hive/warehouse/table_name/month=202003/day=03;
hive (default) > alter table table_name add partition(month="202003",day="03");

-- Create the directory on HDFS, then load data into that partition
hive (default) > dfs -mkdir -p /user/hive/warehouse/table_name/month=202003/day=03;
hive (default) > load data local inpath "/opt/module/datas/data.txt" into table table_name partition(month="202003",day="03");
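Whichever approach is used, it helps to confirm that the metastore knows about the partition, because files placed directly on HDFS are invisible to queries until the partition is registered. A short check, using the same table and partition values as above:

```sql
-- Show all registered partitions, or only those under one month
show partitions table_name;
show partitions table_name partition(month="202003");

-- Query a single two-level partition
select * from table_name where month = "202003" and day = "02";
```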

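Dynamic partitions

1. Properties required for dynamic partitioning

-- Enable dynamic partitioning (required)
set hive.exec.dynamic.partition=true;
-- Allow every partition column to be dynamic (required); the default, strict, demands at least one static partition column
set hive.exec.dynamic.partition.mode=nonstrict;
-- Maximum number of dynamic partitions each mapper or reducer may create (default 100; usually worth raising)
set hive.exec.max.dynamic.partitions.pernode=100;
-- Maximum number of dynamic partitions a single statement may create in total (1000 is the default)
set hive.exec.max.dynamic.partitions=1000;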

2. Create a table with a single partition column

create table table_name(id int, name string) partitioned by (city string);

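3. Load data

-- Load data and create partitions dynamically from city
-- table_name has only two regular columns, so when three columns are selected
-- Hive uses the last one, city, as the partition value. Partition columns behave
-- like table columns and come, in order, after the regular columns. The match is
-- made by position, not by column name.
insert overwrite table table_name partition(city) select id,name,city from src_table;

-- Multiple partition columns (partly static, partly dynamic)
create table target_table(id int) partitioned by (state string,city string);
-- partition(state="china",city) makes state a static partition and city a dynamic one,
-- taking the partition value from the city column of src_table
insert overwrite table target_table partition(state="china",city) select id,city from src_table;
-- Both state and city are dynamic partitions
insert overwrite table target_table partition(state,city) select id,state,city from src_table;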
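The statements above assume that src_table already exists. As a self-contained sketch, the following creates a hypothetical src_table (its column layout and sample rows are assumptions made for illustration), runs the fully dynamic insert into target_table, and lists the resulting partitions.

```sql
-- Hypothetical source table; columns and rows are illustrative assumptions
create table if not exists src_table(id int, name string, state string, city string);
insert into table src_table values
    (1, "a", "china", "beijing"),
    (2, "b", "china", "shanghai");

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- Partition values are taken by position: the last two selected columns
-- feed state and city, in the order declared in partitioned by
insert overwrite table target_table partition(state, city)
select id, state, city from src_table;

-- Lists state=china/city=beijing and state=china/city=shanghai
show partitions target_table;
```

Swapping state and city in the select list would not raise an error; it would silently write the city names into the state level, which is why the column order matters more than the column names.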
