1) Load into a regular table

-- Load a local text file (its field order and delimiters must match the Hive table)
load data local inpath '/home/hadoop/orders.csv' overwrite into table orders;

> If the source data is on HDFS, use: load data inpath 'hdfs://master:9000/user/orders' overwrite into table orders;
2) Load into a partitioned table

load data local inpath '/home/hadoop/test.txt' overwrite into table test partition (dt='2017-09-09');

> partition specifies that this batch of data goes into the 2017-09-09 partition;
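Loading into a partition presumes the target table was declared with a matching partition column; a minimal sketch of such a DDL (the data columns here are illustrative assumptions, not from the original post):

```sql
-- Hypothetical DDL: only the partitioned by (dt string) clause is
-- required for the load above; the data columns are assumed.
create table test (
  user_id int,
  user_name string
)
partitioned by (dt string)
row format delimited fields terminated by ','
stored as textfile;
```

Note that the partition column dt is not stored in the data file itself; its value comes from the partition (dt='2017-09-09') clause at load time.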
3) Load into a bucketed table

-- First create a regular temporary table
create table orders_tmp (
  user_id int,
  user_name string,
  create_time string
)
row format delimited fields terminated by ','
stored as textfile;

-- Load the data into the temporary table
load data local inpath '/home/hadoop/lead.txt' overwrite into table orders_tmp;

-- Insert into the bucketed table
set hive.enforce.bucketing = true;
insert overwrite table orders select * from orders_tmp;
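The insert above assumes orders was created as a bucketed table; a minimal sketch of such a DDL (the bucket count of 4 and the bucketing column user_id are assumptions, not from the original post):

```sql
-- Hypothetical DDL for the bucketed target table: bucket count and
-- clustering column are illustrative assumptions.
create table orders (
  user_id int,
  user_name string,
  create_time string
)
clustered by (user_id) into 4 buckets
row format delimited fields terminated by ','
stored as textfile;
```

With hive.enforce.bucketing = true, Hive sets the reducer count to match the bucket count so that each bucket file is populated correctly.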
4) Export data

-- Exporting writes the data in a Hive table out to a local file;
insert overwrite local directory '/home/hadoop/orders.bak2017-12-28' select * from orders;

[Drop the local keyword to export to HDFS instead.]
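By default the exported files use Hive's ^A (Ctrl-A) field separator. Newer Hive versions (0.11+) also accept a row format clause on the directory export; a hedged sketch (the path and delimiter are illustrative):

```sql
-- Write the exported files with comma-separated fields
-- instead of the default ^A separator.
insert overwrite local directory '/home/hadoop/orders.bak.csv'
row format delimited fields terminated by ','
select * from orders;
```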
5) Insert data

-- insert ... select; the part in {} is optional
insert overwrite table order_tmp {partition (dt='2017-09-09')} select * from orders;

-- One scan, multiple inserts
from orders
insert overwrite table log1 select company_id, original where company_id='10086'
insert overwrite table log2 select company_id, original where company_id='10000';

[Every Hive query scans the entire data set once. When the results need to be inserted into multiple tables, the syntax above writes to several tables in a single scan, improving efficiency.]
6) Copy a table

-- Copying creates a new table with the source table's structure and data; during the copy you can filter the rows and trim the columns
create table order
row format delimited fields terminated by '\t'
stored as textfile
as
select leader_id, order_id, '2017-09-09' as bakdate
from orders
where create_time < '2017-09-09';

[Backs up the rows of orders dated before 2017-09-09 into order, keeping leader_id and order_id and adding a bakdate column.]
7) Clone a table

-- Clones only the source table's metadata; the source table's data is not copied
create table orders like order;
8) Back up a table

-- Back up the orders_log data to HDFS as /user/hive/action_log.export; the backup covers both the table's metadata and its data
export table orders_log partition (dt='2017-09-09') to '/user/hive/action_log.export';
9) Restore a table

import table orders_log from '/user/hive/action_log.export';
