Data Partitioning in Hive

Reposted article, 2012-08-10 12:02

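First, what is a partition?

A partition in Hive is simply a subdirectory: a large dataset is split into smaller datasets according to business needs.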

1. How to define and create partitions

hive> create table test(name string,sex int) partitioned by (birth string, age string);
Time taken: 0.044 seconds

hive> alter table test add partition (birth='1980', age ='30');

Time taken: 0.079 seconds

hive> alter table test add partition (birth='1981', age ='29');

Time taken: 0.052 seconds

hive> alter table test add partition (birth='1982', age ='28');

Time taken: 0.056 seconds

hive> show partitions test;
birth=1980/age=30

birth=1981/age=29

birth=1982/age=28
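
Because each partition is a directory, a query that filters on the partition columns only needs to read the matching directories. The following is a minimal sketch that reuses the test table from above. The partition values '1983' and '27' are made up for illustration, and IF NOT EXISTS is assumed to be supported by the Hive version in use.

```sql
-- Add a partition only if it is not already registered
-- (birth='1983', age='27' are illustrative values, not from the original).
alter table test add if not exists partition (birth='1983', age='27');

-- List only the partitions under birth='1981' (partial partition spec).
show partitions test partition (birth='1981');

-- Filtering on both partition columns lets Hive read just the
-- birth=1981/age=29 directory instead of scanning the whole table.
select name, sex
from test
where birth = '1981' and age = '29';
```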

2. How to drop a partition

hive> alter table test drop partition (birth='1980',age='30');
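
In scripts it is often convenient to make the drop safe to re-run. A small sketch, assuming a Hive version that accepts IF EXISTS in this position. Note that for a managed (non-external) table, dropping a partition also deletes its data files.

```sql
-- Drop the partition if it is present; safe to re-run.
alter table test drop if exists partition (birth='1980', age='30');

-- Confirm that birth=1980/age=30 is no longer listed.
show partitions test;
```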

3. Load data into a specific partition

hive> load data local inpath '/home/hadoop/data.log' overwrite into table test partition(birth='1980-01-01',age='30');
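
The partition values are not read from the file: every row in data.log is placed under the partition named in the statement, and the file itself should contain only the non-partition columns (name, sex) in the table's storage format. Since test was created without a ROW FORMAT clause, Hive expects its default field delimiter (Ctrl-A, \001). The sketch below shows a tab-delimited variant; the table name test_tab and the path /home/hadoop/data.tsv are hypothetical.

```sql
-- Hypothetical tab-delimited variant of the test table.
create table if not exists test_tab (name string, sex int)
partitioned by (birth string, age string)
row format delimited fields terminated by '\t';

-- Each line of the file holds only name and sex; birth and age
-- come from the partition spec below.
load data local inpath '/home/hadoop/data.tsv'
overwrite into table test_tab partition (birth='1980-01-01', age='30');

-- Spot-check the loaded rows; the partition columns appear as extra columns.
select * from test_tab
where birth = '1980-01-01' and age = '30'
limit 10;
```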

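Rule of thumb for creating partitions: keep the granularity to the minimum needed, that is, partition no more finely than your queries actually require.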

4. Insert data into a partition of partition_test:

hive> insert overwrite table partition_test partition(stat_date='20110728',province='henan') select member_id,name from partition_test_input where stat_date='20110728' and province='henan';
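
The original does not show how partition_test and partition_test_input are defined. Below is a minimal sketch of definitions under which the statement above would run. The column types, and the choice to keep partition_test_input unpartitioned, are assumptions.

```sql
-- Target table: stat_date and province are partition columns,
-- so they are not listed among the regular columns.
create table if not exists partition_test (member_id string, name string)
partitioned by (stat_date string, province string);

-- Source table: here stat_date and province are ordinary columns
-- (an assumption), so they can be filtered in the WHERE clause.
create table if not exists partition_test_input (
  stat_date string,
  province  string,
  member_id string,
  name      string
);
```

With a static partition spec like the one above, the select list supplies only the non-partition columns (member_id, name), in the order they are declared in the target table.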

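5. You can also insert into several partitions in one statement. From version 0.7 on, partitions that do not yet exist are created automatically; for 0.6 and earlier, the official documentation says the partitions must be created in advance:
hive> from partition_test_input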
> insert overwrite table partition_test partition (stat_date='20110526',province='liaoning')
> select member_id,name where stat_date='20110526' and province='liaoning'
> insert overwrite table partition_test partition (stat_date='20110728',province='sichuan')
> select member_id,name where stat_date='20110728' and province='sichuan'
> insert overwrite table partition_test partition (stat_date='20110728',province='heilongjiang')
> select member_id,name where stat_date='20110728' and province='heilongjiang';
Total MapReduce jobs = 4
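
To check the result of the multi-insert, one option is to list the partitions and count the rows in each. A sketch, assuming the three inserts above completed:

```sql
-- The target partitions are expected to appear in this listing.
show partitions partition_test;

-- Row count per partition.
select stat_date, province, count(*) as row_count
from partition_test
group by stat_date, province;
```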
