Hive Partitions and Bucketed Tables

Reposted article. Originally posted 2016-06-11 15:23. Category: hadoop

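1. Partitions

In Hive, a select query normally scans the entire contents of a table, which lowers query efficiency. Partitioning addresses this: when a query filters on the partition column, Hive scans only the part of the table that the query cares about.

A table can have one or more partitions. Each partition exists as a separate folder under the table's directory.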

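1.1 Creating a partitioned table, with either a single partition column or two partition columns:

Single-partition create statement: create table sample_table (id int, value string) partitioned by (age int) row format delimited fields terminated by ',' stored as textfile; The table has three columns (id, value, age) and is partitioned by age.

Double-partition create statement: create table sample_table (id int, value string) partitioned by (age int, sex string) row format delimited fields terminated by ',' stored as textfile; The table has four columns (id, value, age, sex) and is partitioned by age and sex.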
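As a rough illustration of how two-level partitioning maps onto folders, the sketch below creates the double-partition table under a hypothetical name (sample_table_2p, chosen only so it does not clash with the single-partition sample_table), registers one partition, and queries it. The warehouse path in the comments assumes the default hive.metastore.warehouse.dir and the default database; the actual layout on a given cluster may differ.

```sql
-- Hypothetical table name; columns and partition keys follow the
-- double-partition statement above.
create table if not exists sample_table_2p (id int, value string)
partitioned by (age int, sex string)
row format delimited fields terminated by ','
stored as textfile;

-- Register an (initially empty) partition explicitly.
alter table sample_table_2p add if not exists partition (age = 25, sex = 'M');

-- Assuming the default warehouse location, the partition is a nested folder:
--   /user/hive/warehouse/sample_table_2p/age=25/sex=M/
show partitions sample_table_2p;

-- Filtering on the partition columns lets Hive read only the matching folder.
select id, value from sample_table_2p where age = 25 and sex = 'M';
```

The order of the columns in partitioned by determines the nesting order of the folders, so age is the outer level here and sex the inner one.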

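[Note: set hive.cli.print.current.db=true makes the CLI prompt show which database is currently in use.

row format delimited declares that rows are stored as delimited text; by default each record ends at a newline.

fields terminated by ',' means the columns within a record are separated by commas.

stored as textfile means the data is stored as plain text files.]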
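To make the storage clauses in the note more concrete, here is a minimal sketch that spells out the record separator explicitly. The table name sample_table_explicit and the sample rows in the comments are made up for illustration.

```sql
-- Hypothetical table; same layout as the single-partition example,
-- with the record separator written out instead of left at its default.
create table if not exists sample_table_explicit (id int, value string)
partitioned by (age int)
row format delimited
  fields terminated by ','
  lines terminated by '\n'
stored as textfile;

-- A matching data file holds one record per line, with only the
-- non-partition columns (id, value) present in the file, e.g.:
--   1,foo
--   2,bar
```

The partition column age does not appear in the data file; its value comes from the partition the file is loaded into.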

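1.2 Loading data:

load data local inpath '<path>' overwrite into table <table_name> partition (<partition_column>='<value>');

[Note: overwrite means the data already present in the target (here, the specified partition) is deleted before the new data is loaded.]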
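A concrete version of the load statement might look like the following sketch. The local file path /tmp/sample_age25.csv and the partition value 25 are assumptions made for the example; the table is the single-partition sample_table from section 1.1.

```sql
-- Single-partition table from section 1.1 (created here only if missing).
create table if not exists sample_table (id int, value string)
partitioned by (age int)
row format delimited fields terminated by ','
stored as textfile;

-- Hypothetical local file containing lines such as "1,foo".
-- overwrite replaces whatever the age=25 partition already holds.
load data local inpath '/tmp/sample_age25.csv'
overwrite into table sample_table
partition (age = 25);

-- The new partition should now be listed.
show partitions sample_table;

-- Because the filter is on the partition column, only the age=25
-- folder needs to be read.
select id, value, age from sample_table where age = 25;
```

Dropping the overwrite keyword would append the file to the partition instead of replacing its contents.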

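2. Buckets

Bucketing effectively breaks a large table into "small tables" by distributing rows across a fixed number of buckets according to the hash of a chosen column. Joins on that column can then be handled with a map-side join, which addresses the problem of joining a large table with a small one. Sorting the data inside each bucket by a column can further improve query efficiency.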
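The idea of assigning rows to buckets by hash can be sketched with Hive's built-in hash and pmod functions. This is a conceptual approximation only: the hash function Hive uses internally for bucketing depends on the Hive version, so the numbers produced here are not guaranteed to match the real bucket files. The sample ids are made up, and the query assumes a Hive version that allows select without a from clause.

```sql
-- Conceptual only: approximate which of 4 buckets each id would fall into
-- if a table were clustered by (id) into 4 buckets.
select t.id, pmod(hash(t.id), 4) as approx_bucket
from (
  select 1 as id
  union all select 5 as id
  union all select 6 as id
  union all select 8 as id
) t;
```

Rows whose ids map to the same value land in the same bucket, which is what allows a join on id to compare matching buckets rather than whole tables.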

2.1 Creating a bucketed table:

create table <table_name> (id int, name string) clustered by (id) sorted by (name) into 4 buckets row format delimited fields terminated by '\t' stored as textfile;

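2.2 Setting the configuration property:

set hive.enforce.bucketing = true; This tells Hive to create the number of buckets declared in the table definition when data is inserted.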

2.3 Inserting data:

insert into table <bucketed_table_name> select * from <source_table_name>;
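Putting sections 2.1 to 2.3 together, an end-to-end run could look roughly like this. The table names users_src and users_bucketed are hypothetical, and the hive.enforce.bucketing setting applies to older Hive releases; newer releases may no longer need or accept it.

```sql
-- Hypothetical unbucketed source table.
create table if not exists users_src (id int, name string)
row format delimited fields terminated by '\t'
stored as textfile;

-- Bucketed target, following the statement in section 2.1.
create table if not exists users_bucketed (id int, name string)
clustered by (id) sorted by (name) into 4 buckets
row format delimited fields terminated by '\t'
stored as textfile;

-- Section 2.2: ask Hive to honor the declared bucket count on insert
-- (older releases; may be unnecessary on newer ones).
set hive.enforce.bucketing = true;

-- Section 2.3: populate the bucketed table through a query, so that
-- Hive distributes the rows across the 4 buckets.
insert overwrite table users_bucketed
select id, name from users_src;

-- Read roughly one quarter of the data by sampling a single bucket.
select * from users_bucketed tablesample (bucket 1 out of 4 on id);
```

The sketch uses insert overwrite so that re-running it replaces the previous contents; insert into would append instead. Loading files directly with load data does not redistribute rows into buckets, which is why the bucketed table is filled through a select.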
