A partition in a Hive table is simply a directory in HDFS, and a partition column must not duplicate any of the table's regular columns.
Create a partitioned table:
create table tb_partition(id string, name string) PARTITIONED BY (month string) row format delimited fields terminated by '\t';
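If a partition column does repeat a regular column, Hive rejects the CREATE statement. A minimal failing sketch (the table name here is made up, and the exact error text varies by Hive version):

-- fails with a SemanticException: "id" appears both as a regular column and a partition column
create table tb_bad_partition(id string, name string) PARTITIONED BY (id string) row format delimited fields terminated by '\t';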
Loading data into a Hive partitioned table
Method 1: load via LOAD DATA
load data local inpath '/home/hadoop/files/nameinfo.txt' overwrite into table tb_partition partition(month='201709');
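Note that OVERWRITE replaces whatever is already in the target partition; dropping the keyword appends instead. A small sketch reusing the same sample file:

-- appends the file's rows to month=201709 instead of replacing them
load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_partition partition(month='201709');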
Method 2: INSERT ... SELECT
insert overwrite table tb_partition partition(month='201707') select id, name from name;
hive> insert into table tb_partition partition(month='201707') select id, name from name;
Query ID = hadoop_20170918222525_7d074ba1-bff9-44fc-a664-508275175849
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Method 3: manually upload the file into the partition directory
hdfs dfs -mkdir /user/hive/warehouse/tb_partition/month=201710
hdfs dfs -put nameinfo.txt /user/hive/warehouse/tb_partition/month=201710
Although method 3 places the file in the partition directory, querying the table will not return that data until the partition metadata is updated.
Two ways to update the metadata:
Method 1: msck repair table <table_name>
hive> msck repair table tb_partition;
OK
Partitions not in metastore: tb_partition:month=201710
Repair: Added partition to metastore tb_partition:month=201710
Time taken: 0.265 seconds, Fetched: 2 row(s)
Method 2: alter table tb_partition add partition(month='201708');
hive> alter table tb_partition add partition(month='201708');
OK
Time taken: 0.126 seconds
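ADD PARTITION can also register a directory outside the table's default warehouse path via a LOCATION clause. A sketch (the HDFS path below is hypothetical):

-- registers an existing HDFS directory as partition month=201711
alter table tb_partition add partition(month='201711') location '/data/external/nameinfo/month=201711';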
Query the table data:
hive> select * from tb_partition;
OK
1       Lily    201708
2       Andy    201708
3       Tom     201708
1       Lily    201709
2       Andy    201709
3       Tom     201709
1       Lily    201710
2       Andy    201710
3       Tom     201710
Time taken: 0.161 seconds, Fetched: 9 row(s)
Query partition information: show partitions <table_name>
hive> show partitions tb_partition;
OK
month=201708
month=201709
month=201710
Time taken: 0.154 seconds, Fetched: 3 row(s)
Inspect the file layout in HDFS:
[hadoop@node11 files]$ hdfs dfs -ls /user/hive/warehouse/tb_partition/
17/09/18 22:33:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2017-09-18 22:25 /user/hive/warehouse/tb_partition/month=201707
drwxr-xr-x   - hadoop supergroup          0 2017-09-18 22:15 /user/hive/warehouse/tb_partition/month=201708
drwxr-xr-x   - hadoop supergroup          0 2017-09-18 05:55 /user/hive/warehouse/tb_partition/month=201709
drwxr-xr-x   - hadoop supergroup          0 2017-09-18 22:03 /user/hive/warehouse/tb_partition/month=201710
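Because the partition value is encoded in the directory name rather than stored in the data files, a filter on the partition column lets Hive prune the scan to a single directory:

-- reads only /user/hive/warehouse/tb_partition/month=201709
select * from tb_partition where month='201709';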
Creating multi-level partitions
create table tb_mul_partition(id string, name string) PARTITIONED BY (month string, code string) row format delimited fields terminated by '\t';
Load the data:
load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(month='201709',code='10000');
load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(month='201710',code='10000');
Query the data:
hive> select * from tb_mul_partition where code='10000';
OK
1       Lily    201709  10000
2       Andy    201709  10000
3       Tom     201709  10000
1       Lily    201710  10000
2       Andy    201710  10000
3       Tom     201710  10000
Time taken: 0.208 seconds, Fetched: 6 row(s)
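SHOW PARTITIONS also accepts a partial partition spec, which is convenient with multi-level partitions. A sketch:

-- lists only the partitions under month=201709 (e.g. month=201709/code=10000)
show partitions tb_mul_partition partition(month='201709');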
Now test what happens when only one of the partition columns is specified:
hive> load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(month='201708');
FAILED: SemanticException [Error 10006]: Line 1:95 Partition not found ''201708''
hive> load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(code='20000');
FAILED: SemanticException [Error 10006]: Line 1:95 Partition not found ''20000''
When a table is created with multi-level partitions, specifying only one of the partition columns is not allowed.
Check the storage layout in HDFS:
[hadoop@node11 files]$ hdfs dfs -ls /user/hive/warehouse/tb_mul_partition/month=201710
drwxr-xr-x   - hadoop supergroup          0 2017-09-18 22:36 /user/hive/warehouse/tb_mul_partition/month=201710/code=10000
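A partition (its metadata and, for a managed table like this one, its data directory) can be removed in one statement:

-- drops month=201710/code=10000 and deletes its directory
alter table tb_mul_partition drop partition(month='201710', code='10000');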
Dynamic partitioning
Recall how data was inserted into a partition earlier:
insert overwrite table tb_partition partition(month='201707') select id, name from name;
That statement hard-codes the partition value '201707'. With dynamic partitioning, Hive derives the partition value from the data being inserted.
Create a new table:
hive> create table tb_copy_partition like tb_partition;
OK
Time taken: 0.118 seconds
Check the table structure:
hive> desc tb_copy_partition;
OK
id                      string
name                    string
month                   string

# Partition Information
# col_name              data_type               comment

month                   string
Time taken: 0.127 seconds, Fetched: 8 row(s)
Next, insert data into tb_copy_partition with a dynamic insert:
insert into table tb_copy_partition partition(month) select id, name, month from tb_partition;
Note that the partition column month must come last in the SELECT list.
hive> insert into table tb_copy_partition partition(month) select id, name, month from tb_partition;
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
The statement fails: in strict mode, dynamic partitioning requires at least one static partition column, and the hint says to set hive.exec.dynamic.partition.mode=nonstrict to turn this check off.
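As an aside, strict mode can also be satisfied without changing any setting by mixing static and dynamic partition columns, with the static ones first. A sketch against the multi-level table (src is a hypothetical source table with a code column):

-- month is static; code is filled in dynamically from the last SELECT column
insert into table tb_mul_partition partition(month='201711', code) select id, name, code from src;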
Here, follow the error message and change the setting:
hive> set hive.exec.dynamic.partition.mode=nonstrict;
Check the setting; it took effect:
hive> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=nonstrict
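A few related settings also govern dynamic partitioning; the values below are commonly cited defaults but vary by Hive version, so treat them as illustrative:

-- master switch for dynamic partitioning
set hive.exec.dynamic.partition=true;
-- caps on how many partitions one statement may create, in total and per node
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=100;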
Run the insert again:
hive> insert into table tb_copy_partition partition(month) select id, name, month from tb_partition;
Query ID = hadoop_20170918230808_0bf202da-279f-4df3-a153-ece0e457c905
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1505785612206_0002, Tracking URL = http://node11:8088/proxy/application_1505785612206_0002/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.10.0/bin/hadoop job -kill job_1505785612206_0002
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2017-09-18 23:08:13,698 Stage-1 map = 0%, reduce = 0%
2017-09-18 23:08:23,896 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.94 sec
2017-09-18 23:08:27,172 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.63 sec
MapReduce Total cumulative CPU time: 3 seconds 630 msec
Ended Job = job_1505785612206_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://cluster1/user/hive/warehouse/tb_copy_partition/.hive-staging_hive_2017-09-18_23-08-01_475_7542657053989652968-1/-ext-10000
Loading data to table default.tb_copy_partition partition (month=null)
Time taken for load dynamic partitions : 381
Loading partition {month=201709}
Loading partition {month=201708}
Loading partition {month=201710}
Loading partition {month=201707}
Time taken for adding to write entity : 0
Partition default.tb_copy_partition{month=201707} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
Partition default.tb_copy_partition{month=201708} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
Partition default.tb_copy_partition{month=201709} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
Partition default.tb_copy_partition{month=201710} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2   Cumulative CPU: 3.63 sec   HDFS Read: 8926 HDFS Write: 382 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 630 msec
OK
Time taken: 28.932 seconds
Query the data:
hive> select * from tb_copy_partition;
OK
1       Lily    201707
2       Andy    201707
3       Tom     201707
1       Lily    201708
2       Andy    201708
3       Tom     201708
1       Lily    201709
2       Andy    201709
3       Tom     201709
1       Lily    201710
2       Andy    201710
3       Tom     201710
Time taken: 0.121 seconds, Fetched: 12 row(s)
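As a final check, the dynamic insert should have created all four partitions:

show partitions tb_copy_partition;
-- expected: month=201707, month=201708, month=201709, month=201710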
Done.