Getting Started with Hive, Part 2: Partitioned Tables, External Partitioned Tables, and Join Queries

Reposted article, 2017-07-05 16:16, category: Hive

1. Inspecting how the metastore is stored in MySQL

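The metastore stores only a table's descriptive information (name, columns, types, and the corresponding directory), not the table data itself.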

Connect to the MySQL database on itcast05 with SQLyog and inspect the table structure of the hive database.
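As a rough sketch of the same inspection done with plain SQL instead of a GUI client, the statements below list the metastore tables and the Hive tables registered in it. They assume the metastore lives in a MySQL database named hive, as in this article, and that the schema uses the usual TBLS table; column names can differ between metastore schema versions.

```sql
-- Run in the MySQL client, not in the Hive shell.
use hive;

-- List the metastore's own bookkeeping tables (TBLS, SDS, COLUMNS_V2, ...).
show tables;

-- One row per Hive table: its name, its type (managed or external),
-- and the id of its storage descriptor.
select TBL_ID, TBL_NAME, TBL_TYPE, SD_ID
from TBLS;
```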

2. Creating a table (managed by default: the table is created first, the data comes afterwards)

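(Specify the column delimiter when creating the table. If it is omitted, Hive falls back to its default delimiter, Ctrl-A (\001), which will not match tab-separated files.)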

create table trade_detail(
id bigint, 
account string, 
income double, 
expenses double, 
time string) 
row format delimited fields terminated by '\t';
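To confirm what was created, the table can be inspected from the Hive shell. This is a minimal sketch; the exact layout of the output depends on the Hive version.

```sql
-- Shows the columns plus detailed metadata, including the table type
-- (MANAGED_TABLE for a table created without the external keyword)
-- and the HDFS location that backs the table.
describe formatted trade_detail;

-- Prints the full DDL Hive recorded, including the field delimiter.
show create table trade_detail;
```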

3. Running Hadoop HDFS commands from inside Hive

While working in the Hive shell we sometimes need to operate on HDFS. Hive provides an HDFS shell inside its own command line, for example:

dfs  -ls  /; 
dfs -mkdir  /data;
dfs  -put  /root/student.txt;

Usage is the same as the regular hdfs commands, with one small difference: the command starts with dfs and ends with a semicolon (;).

4. Creating an external table (the data exists first, the table is created afterwards)

First upload the data files a.txt and b.txt, which have identical contents, to the /data directory on HDFS.
Then run the create-table command:

create external table ext_student (
id int,
name string) 
row format delimited fields terminated by '\t' 
location '/data';

Once the table is created, run select * from ext_student; to view its contents.

Next, upload another data file, pep.avi, to the /data directory on HDFS, then run a full table scan again: select * from ext_student;

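Note: as long as a data file is placed in the directory the table points to (here /data on HDFS), Hive can read it as part of the table. This works for both managed and external tables, although there are special cases where the data cannot be read (partitioned tables, covered in section 5, are one of them).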

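Why can the data be read simply by dropping a file into the corresponding directory?

Answer: because the metastore records the mapping between the table and the location of its data.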

This mapping is kept in the metastore's SDS table, whose LOCATION column holds the directory of each table.
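The mapping can also be read directly from MySQL. The query below is a sketch that assumes the common metastore layout, in which TBLS.SD_ID references SDS.SD_ID; table and column names may differ in other schema versions.

```sql
-- Run in the MySQL client against the metastore database.
-- Joins each Hive table to its storage descriptor to show where its data lives.
select t.TBL_NAME, t.TBL_TYPE, s.LOCATION
from TBLS t
join SDS s on t.SD_ID = s.SD_ID
where t.TBL_NAME = 'ext_student';
```

For ext_student the LOCATION value should point at the /data directory given in the create statement, which is why any file placed there becomes visible to queries.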

5. Creating partitioned tables

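Partitioned tables are created to improve query efficiency. Data is typically partitioned by fields such as province, year, or month.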

Create an external partitioned table (External Table):
(table name: beauties; data directory: /beauty)

create external table beauties (
id bigint, 
name string, 
size double) 
partitioned by (nation string) 
row format delimited fields terminated by '\t' 
location '/beauty';

show create table beauties;

After the statement runs, a beauty directory appears under the HDFS root.

Prepare three data files: b.c, b.j, and b.a.

Load a data file and specify its partition at the same time:

load data local inpath '/root/b.c' into table beauties partition (nation='China');

Query the table to check that the data was loaded successfully.
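A minimal sketch of that check from the Hive shell follows. The directory name in the last comment follows Hive's key=value convention for partition directories.

```sql
-- Rows from b.c should come back with nation = 'China' as an extra column.
select * from beauties;

-- Lists the partitions the metastore knows about for this table.
show partitions beauties;

-- Hive CLI dfs command: the load should have created /beauty/nation=China.
dfs -ls /beauty;
```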
A thought: as with an ordinary external table, could we simply create a directory nation=Japan under /beauty on HDFS, upload b.j into it, and then have the data show up in queries?

Answer: no. When a file is uploaded this way, the metastore has no record of a partition at /beauty/nation=Japan/, so Hive does not read that directory.

The fix: tell Hive to add a partition record for the beauties table to the metastore:

alter table beauties add partition (nation='Japan') location '/beauty/nation=Japan/';

After the partition is added, the SDS table in the metastore contains one more record.

Querying the beauties table again now returns the data from b.j as well.
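When many partition directories have been created by hand, registering them one by one becomes tedious. As an alternative sketch, Hive's msck repair table command scans the table's directory and registers partition directories that the metastore does not yet know about. It relies on the directories following the key=value naming used here, and its behavior can vary between Hive versions.

```sql
-- Register every partition directory found under /beauty
-- that is missing from the metastore (for example nation=Japan).
msck repair table beauties;

-- Verify which partitions are registered now.
show partitions beauties;
```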

The advantage of partitioned tables:

select * from beauties where nation='China';

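When the data volume is large, a partitioned table improves query efficiency. Hive does not have to scan and filter the whole table before returning results, because each partition is stored as its own directory on HDFS. A filter on a partition column such as nation lets Hive read only the matching directory.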
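One way to see this pruning is to ask Hive which inputs a query depends on. The sketch below uses explain dependency; the shape of its output differs between Hive versions, so no sample output is shown.

```sql
-- With the partition filter, only the nation=China partition
-- should appear among the query's input partitions.
explain dependency
select * from beauties where nation='China';

-- Without the filter, every registered partition is an input.
explain dependency
select * from beauties;
```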

Dropping a partition:

alter table beauties drop if exists partition (nation ='Japan') ;

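Note: if exists is a clause that checks whether the partition exists. If it does, the partition is dropped; if it does not, the statement finishes without raising a "partition does not exist" error.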

Creating a managed partitioned table (Managed Table)

create table td_part(
id bigint, 
account string, 
income double, 
expenses double, 
time string) 
partitioned by (logdate string) 
row format delimited fields terminated by '\t';
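For a managed partitioned table, Hive chooses the partition directories itself. The sketch below adds an empty partition and then inspects it; the logdate value is only an illustrative assumption.

```sql
-- No location clause: Hive creates the partition directory
-- under the table's own directory in the warehouse.
alter table td_part add if not exists partition (logdate='2017-07-05');

show partitions td_part;

-- Shows the metadata of this one partition, including its location.
describe formatted td_part partition (logdate='2017-07-05');
```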

Ordinary tables versus partitioned tables: use a partitioned table when large amounts of data keep being added.

create table book (
id bigint, 
name string) 
partitioned by (pubdate string) 
row format delimited fields terminated by '\t';

Loading data into a partitioned table
(Hive's own syntax)

load data local inpath './book.txt' 
overwrite into table book 
partition (pubdate='2010-08-22');

local inpath -> load from the local disk, not from HDFS

overwrite -> write the data into the book table by replacing whatever the target partition already contains

The following load statement omits overwrite, so the data is appended to the Hive table:

load data local inpath '/root/data.am' 
into table beauties 
partition (nation="USA");
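For comparison, here is a sketch of the same load without the local keyword. In that form the path refers to HDFS, and Hive moves the file into the partition's directory rather than copying it. The HDFS path and the pubdate value below are assumptions for illustration only.

```sql
-- No "local": /data/book.txt is an HDFS path (hypothetical here).
-- The file is moved, so it no longer exists at /data/book.txt afterwards.
load data inpath '/data/book.txt'
into table book
partition (pubdate='2010-08-23');
```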

Querying the table with the partition column

select nation, avg(size) as avg_size from beauties group by nation order by avg_size;

6. Join queries

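Example query:
Requirement:
　　Group trade_detail by account, compute each account's total income, total expenses, and total surplus, then join with user_info to fetch the user's name.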

In MySQL, a single statement is enough to run the join:

select t.account,u.name,t.income, t.expenses, t.surplus 
from user_info u join (
    select account,sum(income) as income,sum(expenses) as expenses,sum(income-expenses) as surplus 
    from trade_detail group by account 
) t 
on u.account = t.account;

But once the data volume grows large, this query becomes extremely slow.

So we use Hive to do the job:

a) First, the data of both tables has to be imported into HDFS. Alternatively, the MySQL data can be imported directly into Hive tables.

The source tables in MySQL are trade_detail and user_info.

b) Create the tables in Hive

trade_detail table:

create table trade_detail (
id bigint,
account string,
income string,
expenses string ,
times string) 
row format delimited fields terminated by '\t';

user_info table:

create table user_info (
id int,
account string,
name string,
age int) 
row format delimited fields terminated by '\t';

c) Use Sqoop to import the trade_detail data from MySQL into Hive

./sqoop import \
--connect jdbc:mysql://192.168.1.102:3306/itcast \
--username root \
--password 123 \
--table trade_detail \
--hive-import \
--hive-overwrite \
--hive-table trade_detail \
--fields-terminated-by '\t'

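The import may fail with an error.

The cause is that Hive has not been added to the environment variables.
Solution:
1) Edit the /etc/profile file and add HIVE_HOME.

2) Run source /etc/profile to reload the configuration.

3) Use the which command to check that it was added successfully.

4) Run the sqoop command again. The import now starts, and the MapReduce job can be seen running. Once it has finished, the result files can be viewed in the web browser.

The Sqoop import succeeded.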

d) Use Sqoop to import the user_info data from MySQL into the Hive user_info table

./sqoop import \
--connect jdbc:mysql://192.168.1.102:3306/itcast \
--username root \
--password 123 \
--table user_info \
--hive-import \
--hive-overwrite \
--hive-table user_info \
--fields-terminated-by '\t'

e) Run the join query in Hive:

select t.account,u.name,t.income, t.expenses, t.surplus 
from user_info u join (
    select account,sum(income) as income,sum(expenses) as expenses,sum(income-expenses) as surplus 
    from trade_detail group by account 
) t 
on u.account = t.account;

On verification, the result of this query matches the result of running it in MySQL.
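The Hive trade_detail table in step b) declares income and expenses as string, so the sums above rely on Hive converting the values to numbers implicitly. The variant below is a sketch that makes the conversion explicit; it assumes every value in those columns is a valid number (values that are not would become NULL and be skipped by sum).

```sql
select t.account, u.name, t.income, t.expenses, t.surplus
from user_info u join (
    select account,
           -- Cast the string columns to double before aggregating.
           sum(cast(income as double)) as income,
           sum(cast(expenses as double)) as expenses,
           sum(cast(income as double) - cast(expenses as double)) as surplus
    from trade_detail
    group by account
) t
on u.account = t.account;
```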

Source: http://lib.csdn.net/article/hive/48462
