Hive Dynamic Partitions

This article is reposted from another source. 2019-02-18 16:46 · Big Data

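Inserting data into a partitioned Hive table can require many partitions, for example one partition per distinct value of some column. Creating each partition by hand means copying, pasting and editing a separate SQL statement per partition, which is slow and error-prone. As a batch-processing system, Hive offers dynamic partitioning instead: it infers the partition values from the positions of the columns in the query and creates the partitions automatically.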
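To make the difference concrete, the Python sketch below generates the statements you would otherwise write by hand: one static-partition INSERT per distinct value, versus a single dynamic-partition INSERT. The helper names and the list of city values are hypothetical; the table and column names follow the examples in this article.

```python
from typing import List, Sequence

def static_partition_inserts(table: str, part_col: str, data_cols: Sequence[str],
                             source: str, source_col: str,
                             values: Sequence[str]) -> List[str]:
    """Without dynamic partitions: one hand-written INSERT per partition value."""
    cols = ", ".join(data_cols)
    return [
        f"insert overwrite table {table} partition({part_col}='{v}') "
        f"select {cols} from {source} where {source_col}='{v}';"
        for v in values
    ]

def dynamic_partition_insert(table: str, part_col: str, data_cols: Sequence[str],
                             source: str, source_col: str) -> str:
    """With dynamic partitions: one INSERT; the partition value is the last SELECT column."""
    cols = ", ".join(list(data_cols) + [source_col])
    return f"insert overwrite table {table} partition({part_col}) select {cols} from {source};"

if __name__ == "__main__":
    # Hypothetical distinct city values; a real table may have hundreds of them.
    for stmt in static_partition_inserts("dpartition", "ct", ["id", "name"],
                                         "mytest_tmp2_p", "city", ["beijing", "beijing1"]):
        print(stmt)
    print(dynamic_partition_insert("dpartition", "ct", ["id", "name"],
                                   "mytest_tmp2_p", "city"))
```

The number of static statements grows with the number of distinct values, while the dynamic version stays a single statement.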

1. Create a table with a single partition column

hive>
create table dpartition(id int, name string)
partitioned by (ct string);

2. Load data into the table and create the partitions dynamically from the city column

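hive>
-- Enable dynamic partitioning (default false before Hive 0.9.0, true from 0.9.0 on).
set hive.exec.dynamic.partition=true;
-- Allow every partition column to be dynamic; in the default strict mode at least one partition column must be static.
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table dpartition
partition(ct)
select id, name, city from mytest_tmp2_p;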
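Key point: dpartition has only two regular columns, so when the query returns three columns (the extra one being city), Hive uses the last column, city, as the value of the partition column ct. The partition columns of a partitioned table are also columns of that table, and they come after all regular columns, in the order they were declared. The columns that feed the partitions must therefore be placed last in the SELECT list, in the right order. If the query returned four columns, the statement would fail, because the table has only three columns even when the partition column is counted. Remember that Hive infers the partitions from the positions of the query columns, not from their names.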
hive> -- SHOW PARTITIONS confirms that Hive created the ct partitions from the city values:
hive (fdm_sor)> show partitions dpartition;
partition
ct=beijing
ct=beijing1
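The positional rule can be modeled in a few lines of Python. This is a sketch of the behaviour described above, not Hive's implementation. It splits each SELECT row into the regular columns and the trailing partition values purely by position, rejects rows with the wrong number of columns, and collects the distinct partition specs sorted alphabetically, the way SHOW PARTITIONS lists them. The function names and sample rows are hypothetical, and details such as escaping special characters or the default partition used for NULL values are ignored.

```python
from typing import Dict, List, Sequence, Tuple

def route_row(data_cols: Sequence[str], part_cols: Sequence[str],
              row: Sequence[object]) -> Tuple[Dict[str, object], str]:
    """Split one SELECT row into (regular columns, partition spec) by position only."""
    expected = len(data_cols) + len(part_cols)
    if len(row) != expected:
        # Too many or too few SELECT columns: the statement is rejected.
        raise ValueError(f"expected {expected} columns, got {len(row)}")
    n = len(data_cols)
    record = dict(zip(data_cols, row[:n]))
    # Trailing values fill the partition columns in order; SELECT names are never consulted.
    spec = "/".join(f"{col}={val}" for col, val in zip(part_cols, row[n:]))
    return record, spec

def partitions_created(data_cols: Sequence[str], part_cols: Sequence[str],
                       rows: Sequence[Sequence[object]]) -> List[str]:
    """Distinct partition specs produced by a dynamic insert, sorted alphabetically."""
    return sorted({route_row(data_cols, part_cols, r)[1] for r in rows})

if __name__ == "__main__":
    # Hypothetical rows of: select id, name, city from mytest_tmp2_p
    rows = [(1, "a", "beijing"), (2, "b", "beijing1"), (3, "c", "beijing")]
    print(partitions_created(["id", "name"], ["ct"], rows))  # ['ct=beijing', 'ct=beijing1']
```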

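Note: when loading data with insert ... select, the query must return exactly as many columns as the target table has, no more and no fewer, otherwise Hive reports an error. A type mismatch, however, does not cause an error: values that cannot be converted to the column type are filled with NULL. Loading data with load data performs no such check, because Hive only applies the schema when the data is read: surplus fields are discarded, missing fields are filled with NULL, and fields of the wrong type are also read as NULL.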
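The two behaviours can be contrasted with a small Python model. It only illustrates the rules stated in the note: `insert_select_row` checks the column count and turns unconvertible values into NULL (`None`), while `load_data_row` mimics reading a file that was loaded with load data, where the schema is applied only at read time. The function names, the comma delimiter and the sample values are assumptions; Hive's own default field delimiter is `\001` (Ctrl-A).

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A schema is a list of (column name, conversion function) pairs.
Schema = Sequence[Tuple[str, Callable[[str], object]]]

def _cast_or_null(cast: Callable[[str], object], value: str) -> Optional[object]:
    """Convert a value to the column type, or return None (NULL) if it does not fit."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return None

def insert_select_row(schema: Schema, values: Sequence[str]) -> Dict[str, object]:
    """insert ... select: the column count must match; type mismatches become NULL."""
    if len(values) != len(schema):
        raise ValueError(f"target has {len(schema)} columns, query returns {len(values)}")
    return {name: _cast_or_null(cast, v) for (name, cast), v in zip(schema, values)}

def load_data_row(schema: Schema, line: str, sep: str = ",") -> Dict[str, object]:
    """load data, then read: extra fields are dropped, missing or mistyped fields are NULL."""
    fields = line.split(sep)
    return {name: _cast_or_null(cast, fields[i]) if i < len(fields) else None
            for i, (name, cast) in enumerate(schema)}

if __name__ == "__main__":
    schema = [("id", int), ("name", str)]            # hypothetical two-column table
    print(insert_select_row(schema, ["abc", "tom"]))  # {'id': None, 'name': 'tom'}
    print(load_data_row(schema, "1,tom,beijing"))     # {'id': 1, 'name': 'tom'}
    print(load_data_row(schema, "2"))                 # {'id': 2, 'name': None}
```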

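3. Multiple partition columns, partly dynamic (some partition columns static, the rest dynamic; static partition columns must come before dynamic ones)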

1. Create a table with one regular column and two partition columns
hive (fdm_sor)> create table ds_parttion(id int)
              > partitioned by (state string, ct string);
2. Insert into the table with a mix of static and dynamic partitions
hive>
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- state is a static partition; ct is dynamic and takes its value from the city column of the query.
insert overwrite table ds_parttion
partition(state='china', ct)
select id, city from mytest_tmp2_p;
3. Query results:
hive (fdm_sor)> select * from ds_parttion where state='china'
              > ;
ds_parttion.id  ds_parttion.state       ds_parttion.ct
4       china   beijing
3       china   beijing
2       china   beijing
1       china   beijing
4       china   beijing1
3       china   beijing1
2       china   beijing1
1       china   beijing1

hive (fdm_sor)> select * from ds_parttion where state='china' and ct='beijing';
ds_parttion.id  ds_parttion.state       ds_parttion.ct
4       china   beijing
3       china   beijing
2       china   beijing
1       china   beijing

hive (fdm_sor)> select * from ds_parttion where state='china' and ct='beijing1';
ds_parttion.id  ds_parttion.state       ds_parttion.ct
4       china   beijing1
3       china   beijing1
2       china   beijing1
1       china   beijing1
Time taken: 0.072 seconds, Fetched: 4 row(s)
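The ordering rule for mixed static and dynamic partitions can be sketched in Python. In this model (not Hive's code), the PARTITION clause is given as a mapping from each partition column to a static value, or to `None` when the column is dynamic. Walking the columns in the table's partition order, a static column may not appear after a dynamic one. The dynamic columns returned are the ones whose values must come, in that order, at the end of the SELECT list. The function name is hypothetical, and the clause is assumed to name every partition column, as in the examples above.

```python
from typing import Dict, List, Optional, Sequence

def dynamic_partition_columns(part_cols: Sequence[str],
                              clause: Dict[str, Optional[str]]) -> List[str]:
    """Return the dynamic partition columns, checking that static ones come first.

    part_cols: partition columns in table order, e.g. ["state", "ct"].
    clause:    the PARTITION(...) spec; a value of None marks a dynamic column.
    """
    dynamic: List[str] = []
    for col in part_cols:          # table order, i.e. parent directory first
        if clause[col] is None:
            dynamic.append(col)
        elif dynamic:              # a static column after a dynamic one is not allowed
            raise ValueError(
                f"static partition {col!r} cannot follow dynamic partition {dynamic[-1]!r}")
    return dynamic

if __name__ == "__main__":
    # partition(state='china', ct): one static and one dynamic column
    dyn = dynamic_partition_columns(["state", "ct"], {"state": "china", "ct": None})
    print(dyn)                     # ['ct']
    # The SELECT must return the regular column id plus one value for ct: 2 columns,
    # matching "select id, city from mytest_tmp2_p".
    print(len(["id"]) + len(dyn))  # 2
```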

4. Multiple partition columns, all dynamic

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ds_parttion
partition(state, ct)
select id, country, city from mytest_tmp2_p;

Note: the number and the order of the SELECT columns must be right: country supplies state and city supplies ct.
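Because Hive matches the columns by position, getting the order wrong does not necessarily raise an error; it can silently produce the wrong partitions. The Python model below is a sketch with hypothetical rows and a hypothetical function name, not Hive's implementation. It groups SELECT rows into partition directories such as `state=china/ct=beijing` and shows what happens when country and city are swapped.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

def group_by_partition(data_cols: Sequence[str], part_cols: Sequence[str],
                       rows: Sequence[Sequence[object]]) -> Dict[str, List[Dict[str, object]]]:
    """Map each partition path (e.g. 'state=china/ct=beijing') to the records written there."""
    n = len(data_cols)
    groups: Dict[str, List[Dict[str, object]]] = defaultdict(list)
    for row in rows:
        if len(row) != n + len(part_cols):
            raise ValueError(f"expected {n + len(part_cols)} columns, got {len(row)}")
        # Trailing values fill the partition columns strictly by position.
        path = "/".join(f"{c}={v}" for c, v in zip(part_cols, row[n:]))
        groups[path].append(dict(zip(data_cols, row[:n])))
    return dict(groups)

if __name__ == "__main__":
    # Hypothetical rows of: select id, country, city from mytest_tmp2_p
    rows = [(1, "china", "beijing"), (2, "china", "beijing1")]
    print(sorted(group_by_partition(["id"], ["state", "ct"], rows)))
    # ['state=china/ct=beijing', 'state=china/ct=beijing1']

    # Same rows with city and country swapped in the SELECT list:
    swapped = [(i, city, country) for i, country, city in rows]
    print(sorted(group_by_partition(["id"], ["state", "ct"], swapped)))
    # ['state=beijing/ct=china', 'state=beijing1/ct=china']; no error, wrong layout
```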

5. Dynamic partition properties

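Parameters to set when using dynamic partitions:

set hive.exec.dynamic.partition=true (default false before Hive 0.9.0, true from 0.9.0 on): enables dynamic partitioning.
set hive.exec.dynamic.partition.mode=nonstrict (default strict): allows every partition column to be dynamic; in strict mode at least one partition column must be static.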
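How the two switches interact can be summarised as a small decision function. This is a hedged Python model of the rules described here, with a hypothetical function name. An insert without dynamic columns is an ordinary static insert; dynamic columns require `hive.exec.dynamic.partition` to be on; strict mode additionally requires at least one static partition column.

```python
from typing import Dict, List, Sequence

def check_partition_settings(dynamic_enabled: bool, mode: str,
                             part_cols: Sequence[str],
                             static_parts: Dict[str, str]) -> List[str]:
    """Return the dynamic partition columns, or raise if the settings forbid the insert.

    dynamic_enabled: value of hive.exec.dynamic.partition
    mode:            value of hive.exec.dynamic.partition.mode ('strict' or 'nonstrict')
    """
    dynamic = [c for c in part_cols if c not in static_parts]
    if not dynamic:
        return dynamic              # fully static insert: neither setting matters
    if not dynamic_enabled:
        raise ValueError("dynamic partitioning is disabled")
    if mode == "strict" and not static_parts:
        raise ValueError("strict mode requires at least one static partition column")
    return dynamic
```

Under these rules, partition(state='china', ct) from section 3 would also pass in strict mode, while the fully dynamic partition(state, ct) from section 4 needs nonstrict mode.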

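Tuning parameters related to dynamic partitions:

set hive.exec.max.dynamic.partitions.pernode=100 (default 100; it can usually be raised, for example to 1000)

The maximum number of dynamic partitions that each mapper or reducer may create. Exceeding it causes an error.

set hive.exec.max.dynamic.partitions=1000 (default)

The maximum total number of dynamic partitions that a single dynamic-partition statement may create. Exceeding it causes an error.

set hive.exec.max.created.files=100000 (default): the maximum number of HDFS files that all mappers and reducers of a single MapReduce job may create. Exceeding it causes an error.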
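The three limits can be checked against a plan with a short Python sketch. It is a simplified model with a hypothetical function name: each task (mapper or reducer) is represented by the set of partitions it writes, and each task is assumed to write one file per partition it touches, so the file count is the sum of the per-task partition counts. The default arguments mirror the values listed above.

```python
from typing import Dict, Set, Tuple

def check_dynamic_partition_limits(task_partitions: Dict[str, Set[str]],
                                   max_per_node: int = 100,
                                   max_partitions: int = 1000,
                                   max_files: int = 100000) -> Tuple[int, int]:
    """Return (partitions, files), or raise if a dynamic-partition limit would be exceeded.

    task_partitions maps a mapper/reducer id to the partition specs it writes.
    Assumes one output file per (task, partition) pair.
    """
    for task, parts in task_partitions.items():
        if len(parts) > max_per_node:        # hive.exec.max.dynamic.partitions.pernode
            raise ValueError(f"{task} creates {len(parts)} partitions (limit {max_per_node})")
    all_parts = set().union(*task_partitions.values())
    if len(all_parts) > max_partitions:      # hive.exec.max.dynamic.partitions
        raise ValueError(f"{len(all_parts)} partitions in total (limit {max_partitions})")
    files = sum(len(parts) for parts in task_partitions.values())
    if files > max_files:                    # hive.exec.max.created.files
        raise ValueError(f"{files} files (limit {max_files})")
    return len(all_parts), files

if __name__ == "__main__":
    plan = {"m1": {"ct=beijing", "ct=beijing1"}, "m2": {"ct=beijing"}}  # hypothetical tasks
    print(check_dynamic_partition_limits(plan))  # (2, 3)
```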
