Importing data from HDFS into Hive

Reposted article. 2018-11-16 17:00. Tag: hive

1. Download the test data first (you can also create your own data)

http://files.grouplens.org/datasets/movielens/ml-latest-small.zip

2. Data files and field names

movies.csv (movie metadata)
movieId,title,genres

ratings.csv (user rating data)
userId,movieId,rating,timestamp

3. Put the data on HDFS first

hdfs dfs -mkdir /hive_operate
hdfs dfs -mkdir /hive_operate/movie_table
hdfs dfs -mkdir /hive_operate/rating_table

hdfs dfs -put movies.csv /hive_operate/movie_table
hdfs dfs -put ratings.csv /hive_operate/rating_table
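
If the CSV files begin with a header row (the field names listed in step 2), that row would be read as an ordinary data row by a table whose columns are all STRING. A minimal sketch of one way to deal with it is to strip the first line before uploading. The helper name strip_header and the _noheader file names are illustrative assumptions, not part of the original steps.

```shell
# strip_header is a hypothetical helper: it prints every line of the
# file given as "$1" except the first one (the assumed CSV header).
strip_header() {
  tail -n +2 "$1"
}

# Produce header-free copies only for the files that are actually present.
for name in movies ratings; do
  if [ -f "$name.csv" ]; then
    strip_header "$name.csv" > "${name}_noheader.csv"
  fi
done
```

The header-free copies could then be uploaded with hdfs dfs -put in place of the original files.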

4. Create movie_table and rating_table

$ cat create_movie_table.sql
create external table movie_table
(
movieId STRING,
title STRING,
genres STRING
)
row format delimited fields terminated by ','
stored as textfile
location '/hive_operate/movie_table';

$ cat create_rating_table.sql
create external table rating_table
(
userId STRING,
movieId STRING,
rating STRING,
ts STRING
)
row format delimited fields terminated by ','
stored as textfile
location '/hive_operate/rating_table';
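
Note: the original field name timestamp is a reserved keyword in Hive, so using it unquoted raises an error at execution time. Either quote it with backticks or rename the field; here it is renamed to ts.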
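
One caveat with fields terminated by ',': if a title in movies.csv is quoted and itself contains a comma, it would be split across columns. A possible alternative, sketched here on the assumption that your Hive version ships OpenCSVSerde and honours the skip.header.line.count table property, is to declare a second external table over the same location. The table name movie_table_csv and the file name create_movie_table_csv.sql are hypothetical.

```shell
# Write a hypothetical alternative DDL to a file. The quoted 'EOF'
# keeps the shell from touching the SQL text inside the heredoc.
cat > create_movie_table_csv.sql <<'EOF'
create external table movie_table_csv
(
movieId STRING,
title STRING,
genres STRING
)
row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
with serdeproperties (
  "separatorChar" = ",",
  "quoteChar" = "\""
)
stored as textfile
location '/hive_operate/movie_table'
tblproperties ("skip.header.line.count" = "1");
EOF

# Run the DDL only when the hive CLI is installed on this machine.
if command -v hive >/dev/null 2>&1; then
  hive -f create_movie_table_csv.sql
fi
```

OpenCSVSerde exposes every column as a string, which matches the all-STRING schema used above.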

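5. Execute

You can either copy the statements into the hive terminal and run them there, or create the tables by passing the SQL file to hive -f (for example, hive -f create_movie_table.sql).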
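
As a rough sketch of the second option, both DDL files from step 4 could be run in one go. The function name run_sql_files and the HIVE_CMD override variable are assumptions made for illustration; only hive -f itself comes from the text above.

```shell
# HIVE_CMD is a hypothetical override hook; by default the real hive CLI is used.
HIVE_CMD="${HIVE_CMD:-hive}"

# Run each given SQL file with "<hive> -f <file>", skipping missing files
# and stopping at the first file that fails.
run_sql_files() {
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "skip: $f not found" >&2
      continue
    fi
    echo "running: $f"
    "$HIVE_CMD" -f "$f" || return 1
  done
}

# Only attempt the run when the command is actually available.
if command -v "$HIVE_CMD" >/dev/null 2>&1; then
  run_sql_files create_movie_table.sql create_rating_table.sql
fi
```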

6. Check the result

hive> show tables;
OK
movie_table
rating_table

hive> select * from rating_table limit 10;
OK
1    31    2.5    1260759144
1    1029    3.0    1260759179
1    1061    3.0    1260759182
1    1129    2.0    1260759185
1    1172    4.0    1260759205
1    1263    2.0    1260759151
1    1287    2.0    1260759187
1    1293    2.0    1260759148
1    1339    3.5    1260759125
1    1343    2.0    1260759131

7. Create a new table (behavior table)

create table behavior_table as
select B.userid, A.movieid, B.rating, A.title
from movie_table A
join rating_table B
on A.movieid = B.movieid;

8. Export Hive table data to the local file system

table->local file
insert overwrite local directory '/root/hive_test/1.txt' select * from behavior_table;
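
Note that the target path is created as a directory (here one named 1.txt) holding one or more result files, and that without a row format clause the fields are separated by the \001 (Ctrl-A) character. A minimal sketch for merging those files into a single tab-separated file follows; the function name merge_hive_export and the output name behavior.tsv are hypothetical.

```shell
# merge_hive_export is a hypothetical helper: it concatenates every
# (non-hidden) file in the export directory "$1" and rewrites Hive's
# default \001 field delimiter as a tab, writing the result to "$2".
merge_hive_export() {
  export_dir="$1"
  out_file="$2"
  cat "$export_dir"/* | tr '\001' '\t' > "$out_file"
}

# Merge the export from this step only if the directory exists.
if [ -d /root/hive_test/1.txt ]; then
  merge_hive_export /root/hive_test/1.txt behavior.tsv
fi
```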

9. Export Hive table data to HDFS

table->hdfs file
insert overwrite directory '/root/hive_test/1.txt' select * from behavior_table;

10. Load local data into a Hive table

local file -> table
LOAD DATA LOCAL INPATH '/root/hive_test/a.txt' OVERWRITE INTO TABLE behavior_table;

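11. Load data from HDFS into a Hive table

hdfs file -> table (LOAD DATA INPATH moves the source file into the table's directory rather than copying it)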
LOAD DATA INPATH '/a.txt' OVERWRITE INTO TABLE behavior_table;
