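1、Hive internal tables
A Hive internal (managed) table is simply a table created in the normal way, as already described in http://www.cnblogs.com/raphael5200/p/5208437.html.
2、Hive external tables
To create a Hive external table, use the EXTERNAL keyword: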
CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  [CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  [SKEWED BY (col_name, col_name, ...) ON ((col_value, col_value, ...), (col_value, col_value, ...), ...) [STORED AS DIRECTORIES]]
  [ROW FORMAT row_format] ...
Here is an example:
create External table food_ex ( id int, name string, category string, price double ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' lines terminated by '\n';
-- Load data
load data local inpath '/opt/food.txt' overwrite into table food_ex;
select * from food_ex;
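Queried side by side, the external table food_ex and the internal table food look much the same. The main difference lies in what happens on deletion:
Internal table: dropping the table or a partition deletes both the metadata and the data.
External table: dropping the table deletes only the metadata; the data is kept.
Now run the following two statements: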
drop table food; drop table food_ex;
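After these two statements run, both tables are gone, but the outcome is not the same. Browsing HDFS through the NameNode web UI on port 50070 shows the difference:
Although a drop statement was run against each table, dropping the internal table removed both its metadata and its data, whereas dropping the external table removed only the metadata (the table definition) and left the actual data files in place.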
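Before dropping a table it can help to confirm which kind it is. The sketch below inspects the table type and also shows an external table declared over an explicit directory; the table name food_ex_loc and the HDFS path /data/food are assumptions made for illustration and do not appear in the original example.

```sql
-- Inspect the table type: the "Table Type" row reads EXTERNAL_TABLE for
-- food_ex and MANAGED_TABLE for an internal table such as food.
DESCRIBE FORMATTED food_ex;

-- Hypothetical external table over an existing HDFS directory.
-- Both the table name and the path are illustrative assumptions.
CREATE EXTERNAL TABLE IF NOT EXISTS food_ex_loc (
  id INT,
  name STRING,
  category STRING,
  price DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/food';
```

With a definition like this, dropping food_ex_loc would remove only the table definition, and the files under the assumed directory would stay where they are.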
3、Hive partitions
Partition columns must be declared when the table is defined.
a、Creating a table with a single partition column:
create table day_table (id int, content string) partitioned by (dt string);
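This is a single-partition table, partitioned by day. The table structure has three columns: id, content and dt. Data is separated into one folder per dt value.
Example: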
create table log_info ( ip string ) PARTITIONED BY(times string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' lines terminated by '\n';
# Table structure of log_info; the partition column has been created
hive> desc log_info;
OK
ip                      string
times                   string

# Partition Information
# col_name              data_type               comment

times                   string
Time taken: 0.077 seconds, Fetched: 7 row(s)
b、Creating a table with two partition columns:
create table day_hour_table (id int, content string) partitioned by (dt string, hour string);
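This is a two-level partitioned table, partitioned by day and by hour; dt and hour are added to the table structure as two extra columns. Data is split first into a folder per dt value and then into a subfolder per hour value.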
create table log_info2 ( ip string ) PARTITIONED BY(days string,hours string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' lines terminated by '\n';
# Table structure of log_info2; the partition columns have been created
hive> desc log_info2;
OK
ip                      string
days                    string
hours                   string

# Partition Information
# col_name              data_type               comment

days                    string
hours                   string
Time taken: 0.08 seconds, Fetched: 9 row(s)
c、Syntax for adding partitions (to a table that already exists):
ALTER TABLE table_name ADD PARTITION partition_spec [ LOCATION 'location1' ] PARTITION partition_spec [ LOCATION 'location2' ] ...
ALTER TABLE day_hour_table ADD PARTITION (dt='2008-08-08', hour='08') location '/path/pv1.txt';
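Several partitions can be added in one statement. The following is a minimal sketch against the day_hour_table defined earlier; the two LOCATION directories are hypothetical paths chosen for illustration, and IF NOT EXISTS is included so the statement can be re-run without failing on partitions that are already registered.

```sql
-- Add two hourly partitions in a single statement.
-- The LOCATION paths are illustrative assumptions; each should be a directory.
ALTER TABLE day_hour_table ADD IF NOT EXISTS
  PARTITION (dt='2008-08-08', hour='08') LOCATION '/path/dt=2008-08-08/hour=08'
  PARTITION (dt='2008-08-08', hour='09') LOCATION '/path/dt=2008-08-08/hour=09';
```

If LOCATION is left out, Hive places each partition in its default directory under the table.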
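d、Syntax for dropping partitions:
ALTER TABLE table_name DROP PARTITION partition_spec, PARTITION partition_spec, ...
Use ALTER TABLE ... DROP PARTITION to drop a partition. For an internal table, the partition's metadata and data are deleted together; for an external table, only the metadata is removed.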
ALTER TABLE day_hour_table DROP PARTITION (dt='2008-08-08', hour='09');
alter table log_info drop partition (times='20160222');
e、Syntax for loading data into a partitioned table:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
Example:
Loading data into a single-partition table
load data local inpath '/opt/log' overwrite into table log_info partition(times='20160223');
load data local inpath '/opt/log2' overwrite into table log_info partition(times='20160222');
hive> select * from log_info;
OK
23.45.66.77     20160222
45.66.11.8      20160222
2.3.4.5         20160223
4.56.77.31      20160223
34.55.6.77      20160223
34.66.11.6      20160223
Time taken: 0.125 seconds, Fetched: 6 row(s)
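Hive creates two partition directories under the table, named after the partitions (times=20160222 and times=20160223).
Loading data into a two-level partitioned table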
load data local inpath '/opt/log3' overwrite into table log_info2 partition(days='23',hours='12');
hive> select * from log_info2;
OK
12.3.33.66      23      12
23.44.56.6      23      12
12.22.33.4      23      12
8.78.99.4       23      12
233.23.211.2    23      12
Time taken: 0.069 seconds, Fetched: 5 row(s)
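Both loads above read from the local filesystem because of the LOCAL keyword. As a rough sketch of the other case, the statement below loads a file that is already in HDFS; the path /tmp/log4 and the partition value hours='13' are assumptions made for illustration.

```sql
-- No LOCAL keyword: '/tmp/log4' is treated as an HDFS path (hypothetical),
-- and the file is moved into the partition directory rather than copied.
-- Without OVERWRITE, the file is added to whatever the partition already holds.
LOAD DATA INPATH '/tmp/log4'
INTO TABLE log_info2 PARTITION (days='23', hours='13');
```

Because the file is moved, it no longer exists at its original HDFS location after the load.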
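When data is loaded into a table, it is not transformed in any way. With LOCAL, the load operation simply copies the files to the location that backs the Hive table; without LOCAL, files already in HDFS are moved there. If the target partition does not exist yet, its directory is created under the table automatically during the load.
A query that filters on the partition column: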
SELECT day_table.* FROM day_table WHERE day_table.dt>= '2008-08-08';
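The same idea extends to a two-level partitioned table. A minimal sketch, assuming the day_hour_table defined earlier holds data for the partition shown: filtering on both partition columns lets Hive read only the matching subdirectory instead of the whole table.

```sql
-- Both predicates are on partition columns, so only the directory
-- dt=2008-08-08/hour=08 needs to be scanned.
SELECT id, content
FROM day_hour_table
WHERE dt = '2008-08-08' AND hour = '08';
```

Filtering on an ordinary column such as id would not narrow the set of directories in this way.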
f、Viewing partitions:
hive> show partitions day_hour_table;
OK
dt=2008-08-08/hour=08
dt=2008-08-08/hour=09
dt=2008-08-09/hour=09
hive> show partitions log_info;
OK
times=20160222
times=20160223
Time taken: 0.06 seconds, Fetched: 2 row(s)
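On a table with many partitions the full listing gets long. As a small sketch, SHOW PARTITIONS also accepts a partial partition specification to narrow the listing; the dt value is taken from the day_hour_table output shown earlier.

```sql
-- List only the hour partitions that belong to one day.
SHOW PARTITIONS day_hour_table PARTITION (dt='2008-08-08');
```

Given the partitions listed for day_hour_table, this should return only the two dt=2008-08-08 entries.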