A partition in a Hive table is simply a directory in HDFS, and a partition column must not duplicate any of the table's regular columns.
Create a partitioned table:
create table tb_partition(id string, name string) PARTITIONED BY (month string) row format delimited fields terminated by '\t';
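If a partition column does repeat a regular column, Hive rejects the CREATE statement. A minimal failing sketch (the table name here is made up, and the exact error text varies by Hive version):

-- fails with a SemanticException: "id" appears both as a regular column and a partition column
create table tb_bad_partition(id string, name string) PARTITIONED BY (id string) row format delimited fields terminated by '\t';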
Loading data into a Hive partitioned table
Method 1: load via LOAD DATA
load data local inpath '/home/hadoop/files/nameinfo.txt' overwrite into table tb_partition partition(month='201709');
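Note that OVERWRITE replaces whatever is already in the target partition; dropping the keyword appends instead. A small sketch reusing the same sample file:

-- appends the file's rows to month=201709 instead of replacing them
load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_partition partition(month='201709');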
Method 2: INSERT ... SELECT
insert overwrite table tb_partition partition(month='201707') select id, name from name;
hive> insert into table tb_partition partition(month='201707') select id, name from name;
Query ID = hadoop_20170918222525_7d074ba1-bff9-44fc-a664-508275175849
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Method 3: manually upload the file into the partition directory
hdfs dfs -mkdir /user/hive/warehouse/tb_partition/month=201710
hdfs dfs -put nameinfo.txt /user/hive/warehouse/tb_partition/month=201710
Although method 3 places the file in the partition directory, querying the table will not return that data until the partition metadata is updated.
Two ways to update the metadata:
Method 1: msck repair table <table_name>
hive> msck repair table tb_partition;
OK
Partitions not in metastore: tb_partition:month=201710
Repair: Added partition to metastore tb_partition:month=201710
Time taken: 0.265 seconds, Fetched: 2 row(s)
Method 2: alter table tb_partition add partition(month='201708');
hive> alter table tb_partition add partition(month='201708');
OK
Time taken: 0.126 seconds
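ADD PARTITION can also register a directory outside the table's default warehouse path via a LOCATION clause. A sketch (the HDFS path below is hypothetical):

-- registers an existing HDFS directory as partition month=201711
alter table tb_partition add partition(month='201711') location '/data/external/nameinfo/month=201711';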
Query the table data:
hive> select * from tb_partition;
OK
1       Lily    201708
2       Andy    201708
3       Tom     201708
1       Lily    201709
2       Andy    201709
3       Tom     201709
1       Lily    201710
2       Andy    201710
3       Tom     201710
Time taken: 0.161 seconds, Fetched: 9 row(s)
Query partition information: show partitions <table_name>
hive> show partitions tb_partition;
OK
month=201708
month=201709
month=201710
Time taken: 0.154 seconds, Fetched: 3 row(s)
Inspect the file layout in HDFS:
[hadoop@node11 files]$ hdfs dfs -ls /user/hive/warehouse/tb_partition/
17/09/18 22:33:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2017-09-18 22:25 /user/hive/warehouse/tb_partition/month=201707
drwxr-xr-x   - hadoop supergroup          0 2017-09-18 22:15 /user/hive/warehouse/tb_partition/month=201708
drwxr-xr-x   - hadoop supergroup          0 2017-09-18 05:55 /user/hive/warehouse/tb_partition/month=201709
drwxr-xr-x   - hadoop supergroup          0 2017-09-18 22:03 /user/hive/warehouse/tb_partition/month=201710
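Because the partition value is encoded in the directory name rather than stored in the data files, a filter on the partition column lets Hive prune the scan to a single directory:

-- reads only /user/hive/warehouse/tb_partition/month=201709
select * from tb_partition where month='201709';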
Creating multi-level partitions
create table tb_mul_partition(id string, name string) PARTITIONED BY (month string, code string) row format delimited fields terminated by '\t';
Load the data:
load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(month='201709',code='10000');
load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(month='201710',code='10000');
Query the data:
hive> select * from tb_mul_partition where code='10000';
OK
1       Lily    201709  10000
2       Andy    201709  10000
3       Tom     201709  10000
1       Lily    201710  10000
2       Andy    201710  10000
3       Tom     201710  10000
Time taken: 0.208 seconds, Fetched: 6 row(s)
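SHOW PARTITIONS also accepts a partial partition spec, which is convenient with multi-level partitions. A sketch:

-- lists only the partitions under month=201709 (e.g. month=201709/code=10000)
show partitions tb_mul_partition partition(month='201709');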
Now test what happens when only one of the partition columns is specified:
hive> load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(month='201708');
FAILED: SemanticException [Error 10006]: Line 1:95 Partition not found ''201708''
hive> load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(code='20000');
FAILED: SemanticException [Error 10006]: Line 1:95 Partition not found ''20000''
When a table is created with multi-level partitions, specifying only one of the partition columns is not allowed.
Check the storage layout in HDFS:
[hadoop@node11 files]$ hdfs dfs -ls /user/hive/warehouse/tb_mul_partition/month=201710
drwxr-xr-x   - hadoop supergroup          0 2017-09-18 22:36 /user/hive/warehouse/tb_mul_partition/month=201710/code=10000
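A partition (its metadata and, for a managed table like this one, its data directory) can be removed in one statement:

-- drops month=201710/code=10000 and deletes its directory
alter table tb_mul_partition drop partition(month='201710', code='10000');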
Dynamic partitioning
Recall how data was inserted into a partition earlier:
insert overwrite table tb_partition partition(month='201707') select id, name from name;
That statement hard-codes the partition value '201707'. With dynamic partitioning, Hive derives the partition value from the data being inserted.
Create a new table:
hive> create table tb_copy_partition like tb_partition;
OK
Time taken: 0.118 seconds
Check the table structure:
hive> desc tb_copy_partition;
OK
id                      string
name                    string
month                   string

# Partition Information
# col_name              data_type               comment

month                   string
Time taken: 0.127 seconds, Fetched: 8 row(s)
Next, insert data into tb_copy_partition with a dynamic insert:
insert into table tb_copy_partition partition(month) select id, name, month from tb_partition;
Note that the partition column month must come last in the SELECT list.
hive> insert into table tb_copy_partition partition(month) select id, name, month from tb_partition;
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
The statement fails: in strict mode, dynamic partitioning requires at least one static partition column, and the hint says to set hive.exec.dynamic.partition.mode=nonstrict to turn this check off.
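As an aside, strict mode can also be satisfied without changing any setting by mixing static and dynamic partition columns, with the static ones first. A sketch against the multi-level table (src is a hypothetical source table with a code column):

-- month is static; code is filled in dynamically from the last SELECT column
insert into table tb_mul_partition partition(month='201711', code) select id, name, code from src;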
Here, follow the error message and change the setting:
hive> set hive.exec.dynamic.partition.mode=nonstrict;
Check the setting; it took effect:
hive> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=nonstrict
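A few related settings also govern dynamic partitioning; the values below are commonly cited defaults but vary by Hive version, so treat them as illustrative:

-- master switch for dynamic partitioning
set hive.exec.dynamic.partition=true;
-- caps on how many partitions one statement may create, in total and per node
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=100;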
Run the insert again:
hive> insert into table tb_copy_partition partition(month) select id, name, month from tb_partition;
Query ID = hadoop_20170918230808_0bf202da-279f-4df3-a153-ece0e457c905
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1505785612206_0002, Tracking URL = http://node11:8088/proxy/application_1505785612206_0002/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.10.0/bin/hadoop job -kill job_1505785612206_0002
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2017-09-18 23:08:13,698 Stage-1 map = 0%, reduce = 0%
2017-09-18 23:08:23,896 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.94 sec
2017-09-18 23:08:27,172 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.63 sec
MapReduce Total cumulative CPU time: 3 seconds 630 msec
Ended Job = job_1505785612206_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://cluster1/user/hive/warehouse/tb_copy_partition/.hive-staging_hive_2017-09-18_23-08-01_475_7542657053989652968-1/-ext-10000
Loading data to table default.tb_copy_partition partition (month=null)
Time taken for load dynamic partitions : 381
Loading partition {month=201709}
Loading partition {month=201708}
Loading partition {month=201710}
Loading partition {month=201707}
Time taken for adding to write entity : 0
Partition default.tb_copy_partition{month=201707} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
Partition default.tb_copy_partition{month=201708} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
Partition default.tb_copy_partition{month=201709} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
Partition default.tb_copy_partition{month=201710} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2   Cumulative CPU: 3.63 sec   HDFS Read: 8926 HDFS Write: 382 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 630 msec
OK
Time taken: 28.932 seconds
Query the data:
hive> select * from tb_copy_partition;
OK
1       Lily    201707
2       Andy    201707
3       Tom     201707
1       Lily    201708
2       Andy    201708
3       Tom     201708
1       Lily    201709
2       Andy    201709
3       Tom     201709
1       Lily    201710
2       Andy    201710
3       Tom     201710
Time taken: 0.121 seconds, Fetched: 12 row(s)
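As a final check, the dynamic insert should have created all four partitions:

show partitions tb_copy_partition;
-- expected: month=201707, month=201708, month=201709, month=201710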
Done.