Hive data import

This article is a repost of the original. Posted 2017-11-10 22:16.

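Common ways to import data into Hive
Four are covered here:
(1) importing data from the local file system into a Hive table;
(2) importing data from HDFS into a Hive table;
(3) querying data from another table and inserting it into a Hive table;
(4) creating a table and filling it, in the same statement, with the records queried from another table.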

1. Importing data from the local file system into a Hive table

First, create the table in Hive:

hive> create table wyp
> (id int, name string,
> age int, tel string)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
OK
Time taken: 2.832 seconds

The local file system contains the file /home/wyp/wyp.txt, with the following content:

[wyp@master ~]$ cat wyp.txt
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121

The columns in wyp.txt are separated by \t. The following statement loads the file's data into the wyp table:

hive> load data local inpath 'wyp.txt' into table wyp;
Copying data from file:/home/wyp/wyp.txt
Copying file: file:/home/wyp/wyp.txt
Loading data to table default.wyp
Table default.wyp stats:
[num_partitions: 0, num_files: 1, num_rows: 0, total_size: 67]
OK
Time taken: 5.967 seconds
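For a TEXTFILE table like wyp, LOAD DATA does not parse or validate the file. It only places the file in the table's directory, which is consistent with the stats above reporting num_rows: 0. The columns are interpreted only when the table is read.

The Python sketch below is a simplified model of how Hive's default text SerDe (LazySimpleSerDe) maps one tab-delimited line onto the wyp schema. It makes several simplifying assumptions:
- Only the int and string types used by wyp are handled.
- parse_field and parse_row are hypothetical helpers, not Hive APIs.

```python
import re
from typing import List, Optional, Tuple

# Hive INT is a 32-bit signed integer; out-of-range text reads as NULL.
INT_MIN, INT_MAX = -2**31, 2**31 - 1
INT_PATTERN = re.compile(r"[+-]?[0-9]+")

# (column name, type) from: create table wyp (id int, name string, age int, tel string)
WYP_SCHEMA: List[Tuple[str, str]] = [
    ("id", "int"), ("name", "string"), ("age", "int"), ("tel", "string"),
]


def parse_field(raw: Optional[str], col_type: str):
    """Convert one raw text field; unparseable values become None (SQL NULL)."""
    if raw is None or raw == r"\N":  # missing field, or Hive's default NULL marker \N
        return None
    if col_type == "int":
        if not INT_PATTERN.fullmatch(raw):
            return None
        value = int(raw)
        return value if INT_MIN <= value <= INT_MAX else None
    return raw  # string columns are kept verbatim


def parse_row(line: str, schema=WYP_SCHEMA, delimiter: str = "\t") -> tuple:
    """Split one line on the field delimiter and map it onto the schema."""
    fields: List[Optional[str]] = line.rstrip("\n").split(delimiter)
    # Missing trailing fields read as NULL; extra fields are silently ignored.
    fields += [None] * (len(schema) - len(fields))
    return tuple(parse_field(raw, col_type) for raw, (_, col_type) in zip(fields, schema))
```

With this model, parse_row("1\twyp\t25\t13188888888888") gives (1, 'wyp', 25, '13188888888888'). A line whose columns are separated by spaces instead of tabs comes back as all NULLs, because the whole line lands in the int column id.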

You can check the wyp table's data directory with the following command:

hive> dfs -ls /user/hive/warehouse/wyp ;
Found 1 items
-rw-r--r--   3 wyp supergroup 67 2014-02-19 18:23 /hive/warehouse/wyp/wyp.txt

Note: the Hive version used in this article does not support INSERT INTO ... VALUES statements. That syntax was introduced in Hive 0.14.

2. Importing data from HDFS into a Hive table

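Loading data from the local file system into a Hive table actually happens in two steps:
- The data is first copied to a temporary directory on HDFS, typically under the uploading user's HDFS home directory (e.g. /home/wyp/).
- It is then moved from that temporary directory into the Hive table's data directory. Note that it is moved, not copied!

Given that, Hive naturally also supports moving data directly from an HDFS directory into a table's data directory. Suppose HDFS contains the file /home/wyp/add.txt; the steps are as follows: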

[wyp@master /home/q/hadoop-2.2.0]$ bin/hadoop fs -cat /home/wyp/add.txt
5 wyp1 23 131212121212
6 wyp2 24 134535353535
7 wyp3 25 132453535353
8 wyp4 26 154243434355

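This file is stored in the /home/wyp directory on HDFS, unlike the file in section 1, which was on the local file system. Its contents can be loaded into the Hive table with the following command: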

hive> load data inpath '/home/wyp/add.txt' into table wyp;
Loading data to table default.wyp
Table default.wyp stats:
[num_partitions: 0, num_files: 2, num_rows: 0, total_size: 215]
OK
Time taken: 0.47 seconds
hive> select * from wyp;
OK
5 wyp1 23 131212121212
6 wyp2 24 134535353535
7 wyp3 25 132453535353
8 wyp4 26 154243434355
1 wyp 25 13188888888888
2 test 30 13888888888888
3 zs 34 899314121
Time taken: 0.096 seconds, Fetched: 7 row(s)

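The result above shows that the data was indeed loaded into the wyp table. Note that load data inpath '/home/wyp/add.txt' into table wyp; contains no LOCAL keyword; that is the difference from section 1. A non-LOCAL load moves the file instead of copying it, so /home/wyp/add.txt no longer exists at its original HDFS location afterwards.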
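To make the copy-versus-move distinction concrete, here is a small Python model. Two ordinary local directories stand in for the source location and the table's data directory. It is not how Hive is implemented, and load_data is a hypothetical helper. Real Hive also handles name clashes, HDFS permissions and the OVERWRITE variant, none of which is modeled here.

```python
import os
import shutil
import tempfile


def load_data(src_path: str, table_dir: str, local: bool) -> str:
    """Hypothetical helper: place one file into a table directory like LOAD DATA."""
    os.makedirs(table_dir, exist_ok=True)
    dest = os.path.join(table_dir, os.path.basename(src_path))
    if local:
        shutil.copy2(src_path, dest)  # LOAD DATA LOCAL INPATH: the source file is kept
    else:
        shutil.move(src_path, dest)   # LOAD DATA INPATH: the source file is moved away
    return dest


# Demo in a throwaway directory; "local" and "hdfs" are just two subfolders here.
with tempfile.TemporaryDirectory() as tmp:
    local_src = os.path.join(tmp, "local", "wyp.txt")
    hdfs_src = os.path.join(tmp, "hdfs", "add.txt")
    table = os.path.join(tmp, "warehouse", "wyp")
    for path in (local_src, hdfs_src):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("1\twyp\t25\t13188888888888\n")

    load_data(local_src, table, local=True)
    load_data(hdfs_src, table, local=False)

    local_src_kept = os.path.exists(local_src)
    hdfs_src_kept = os.path.exists(hdfs_src)
    table_files = sorted(os.listdir(table))
```

After the demo, both files sit in the table directory. Only the "LOCAL" source is still present at its original path.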

3. Querying data from another table and inserting it into a Hive table

Suppose Hive has a table named test, created as follows:

hive> create table test(
> id int, name string
> ,tel string)
> partitioned by
> (age int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> STORED AS TEXTFILE;
OK
Time taken: 0.261 seconds

The DDL is broadly similar to that of wyp, except that test uses age as its partition column. A quick word on partitions:

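Partitions: in Hive, each partition of a table corresponds to a subdirectory of the table's directory, and all data belonging to that partition is stored there. For example, suppose the wyp table had two partition columns, dt and city. The partition dt=20131218, city=BJ would then correspond to the directory /user/hive/warehouse/wyp/dt=20131218/city=BJ, and all data in that partition would be stored in that directory.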
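The directory layout can be written down as a small Python function. This sketch rests on a few assumptions:
- The warehouse root is the default value of hive.metastore.warehouse.dir. The listing earlier in this article shows a cluster that uses /hive/warehouse instead.
- table_dir and partition_dir are hypothetical helpers, not Hive APIs.

```python
from typing import Sequence, Tuple

# Default value of hive.metastore.warehouse.dir; clusters may configure another path.
WAREHOUSE_DIR = "/user/hive/warehouse"


def table_dir(table: str, db: str = "default", warehouse: str = WAREHOUSE_DIR) -> str:
    """Directory of a managed table: tables in other databases sit under <db>.db/."""
    if db == "default":
        return f"{warehouse}/{table}"
    return f"{warehouse}/{db}.db/{table}"


def partition_dir(table: str, spec: Sequence[Tuple[str, str]], db: str = "default") -> str:
    """One col=value level per partition column, in PARTITIONED BY order.

    Simplification: values are not escaped (real Hive percent-encodes
    characters such as '/' in partition values).
    """
    levels = "/".join(f"{col}={value}" for col, value in spec)
    return f"{table_dir(table, db)}/{levels}"
```

For example, partition_dir("test", [("age", "25")]) yields /user/hive/warehouse/test/age=25, the directory that the first insert below writes to.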

The following statement inserts the results of a query on wyp into test:

hive> insert into table test
> partition (age='25')
> select id, name, tel
> from wyp;

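Once the statement completes, the rows selected from wyp have been inserted into test. If the target table (test) has no partition column, simply drop the partition (age='25') clause.

We can also let the values returned by the SELECT determine the partition dynamically. In that case the partition column must be the last column in the SELECT list. Because no static partition value is given, the dynamic partition mode also has to be set to nonstrict: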

hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert into table test
> partition (age)
> select id, name,
> tel, age
> from wyp;
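The difference between the static and dynamic forms can be modeled in a few lines of Python. route_rows is a hypothetical helper, and the rows are the seven wyp records loaded earlier. The sketch captures only the routing rule:
- With a static partition, every selected row goes to the named partition.
- With a dynamic partition, the last SELECT column decides which partition each row goes to.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple


def route_rows(rows: List[Row], static_age: Optional[int] = None) -> Dict[str, List[Row]]:
    """Group selected rows into partitions named like Hive's age=<value> directories."""
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        if static_age is not None:
            # select id, name, tel  ->  partition (age='25')
            partitions[f"age={static_age}"].append(row)
        else:
            # select id, name, tel, age  ->  partition (age): last column picks the partition
            *data, age = row
            partitions[f"age={age}"].append(tuple(data))
    return dict(partitions)


# wyp rows as (id, name, age, tel), as loaded earlier in this article
WYP_ROWS = [
    (5, "wyp1", 23, "131212121212"), (6, "wyp2", 24, "134535353535"),
    (7, "wyp3", 25, "132453535353"), (8, "wyp4", 26, "154243434355"),
    (1, "wyp", 25, "13188888888888"), (2, "test", 30, "13888888888888"),
    (3, "zs", 34, "899314121"),
]

static = route_rows([(i, n, t) for i, n, a, t in WYP_ROWS], static_age=25)
dynamic = route_rows([(i, n, t, a) for i, n, a, t in WYP_ROWS])
```

In the static case, all seven rows land in age=25, including rows whose own age is 30 or 34. The partition value comes from the PARTITION clause and is not checked against the data.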

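Hive also supports INSERT OVERWRITE. When it runs, the existing data in the corresponding data directory is overwritten, whereas INSERT INTO appends to it; keep the difference in mind. For example: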

hive> insert overwrite table test
> PARTITION (age)
> select id, name, tel, age
> from wyp;
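For the dynamic-partition statement above, the practical difference between INTO and OVERWRITE can be sketched in Python. The table is modeled as a dict from partition name to rows, and insert is a hypothetical helper. The model reflects how Hive behaves with dynamic partitioning: INSERT OVERWRITE replaces only the partitions that actually receive rows and leaves the other partitions alone.

```python
from typing import Dict, List

Table = Dict[str, List[tuple]]  # partition name (e.g. "age=25") -> rows


def insert(table: Table, new_rows: Table, overwrite: bool) -> Table:
    """Return the table contents after INSERT INTO (append) or INSERT OVERWRITE."""
    result = {part: list(rows) for part, rows in table.items()}
    for part, rows in new_rows.items():
        if overwrite:
            result[part] = list(rows)  # old rows of this partition are discarded
        else:
            result.setdefault(part, []).extend(rows)  # appended after existing rows
    # Partitions that receive no new rows are left as they were in both modes.
    return result


existing = {"age=25": [(1, "wyp", "13188888888888")], "age=30": [(2, "test", "13888888888888")]}
incoming = {"age=25": [(7, "wyp3", "132453535353")]}

after_into = insert(existing, incoming, overwrite=False)
after_overwrite = insert(existing, incoming, overwrite=True)
```

In this example, after_into keeps both rows in age=25, while after_overwrite keeps only the new one. In both results, age=30 is unchanged.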

Hive also supports multi-table inserts:

hive> show create table test3;
OK
CREATE TABLE test3(
id int,
name string)
Time taken: 0.277 seconds, Fetched: 18 row(s)
hive> from wyp
> insert into table test
> partition(age)
> select id, name, tel, age
> insert into table test3
> select id, name
> where age>25;

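A single query can contain multiple INSERT clauses. The advantage is that the source table is scanned only once to produce several outputs. Here test receives every row of wyp, while test3 receives only the rows with age > 25.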
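The single-scan idea can be sketched in Python. Each INSERT branch is modeled as a (WHERE predicate, SELECT projection) pair, and each source row is handed to every branch as it is read. multi_insert is a hypothetical helper; this illustrates the principle, not Hive's execution engine.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# One INSERT branch = (WHERE predicate, SELECT projection).
Branch = Tuple[Callable[[tuple], bool], Callable[[tuple], tuple]]


def multi_insert(source: Iterable[tuple], branches: Dict[str, Branch]) -> Tuple[Dict[str, List[tuple]], int]:
    """Feed every source row to every branch during a single pass over the source."""
    outputs: Dict[str, List[tuple]] = {name: [] for name in branches}
    rows_read = 0
    for row in source:  # the source is read exactly once
        rows_read += 1
        for name, (keep, project) in branches.items():
            if keep(row):
                outputs[name].append(project(row))
    return outputs, rows_read


# wyp rows as (id, name, age, tel)
WYP_ROWS = [
    (5, "wyp1", 23, "131212121212"), (6, "wyp2", 24, "134535353535"),
    (7, "wyp3", 25, "132453535353"), (8, "wyp4", 26, "154243434355"),
    (1, "wyp", 25, "13188888888888"), (2, "test", 30, "13888888888888"),
    (3, "zs", 34, "899314121"),
]

outputs, rows_read = multi_insert(WYP_ROWS, {
    # insert into table test partition(age) select id, name, tel, age
    "test": (lambda r: True, lambda r: (r[0], r[1], r[3], r[2])),
    # insert into table test3 select id, name where age > 25
    "test3": (lambda r: r[2] > 25, lambda r: (r[0], r[1])),
})
```

Seven rows are read once. test gets all seven, and test3 gets the three rows with age 26, 30 and 34.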

4. Creating a table and inserting records queried from another table into it

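In practice, a query may return too many rows to display comfortably on the console. In that case, it is very convenient to store the query results directly in a new table. This is known as CTAS (CREATE TABLE ... AS SELECT):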

hive> create table test4
> as
> select id, name, tel
> from wyp;

The data is now in test4. CTAS is atomic: if the SELECT query fails for any reason, the new table is not created.
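The all-or-nothing behavior can be illustrated with a toy catalog in Python. ctas and failing_query are hypothetical names. The sketch only shows the principle (build the result first, register the table last) and does not reproduce how Hive actually stages CTAS output.

```python
from typing import Callable, Dict, Iterable, List

Catalog = Dict[str, List[tuple]]  # table name -> rows


def ctas(catalog: Catalog, name: str, select: Callable[[], Iterable[tuple]]) -> None:
    """CREATE TABLE name AS SELECT ...: all-or-nothing from the catalog's point of view."""
    if name in catalog:
        raise ValueError(f"table {name} already exists")
    staged = list(select())  # run the whole query first; this may raise part-way
    catalog[name] = staged   # register the table only after the query succeeded


def failing_query():
    yield (1, "wyp", "13188888888888")
    raise RuntimeError("task failed")  # simulate a query that dies mid-way


catalog: Catalog = {}
ctas(catalog, "test4", lambda: [(1, "wyp", "13188888888888")])
try:
    ctas(catalog, "test5", failing_query)
except RuntimeError:
    pass
```

After this runs, the catalog contains test4 but not test5, even though the failing query had already produced one row.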
