What is a partitioned table in Hive? Let's walk through the operations first and then draw the conclusions.

Create a partitioned table whose partition columns are dt (the date) and country:
hive> create table logs(ts bigint,line string)
    > partitioned by (dt String,country string);

Next, we create a partition by loading a file into it:

hive> load data local inpath '/root/hive/partitions/file1' into table logs
    > partition (dt='2001-01-01',country='GB');

The statement above creates a nested directory hierarchy on HDFS, with the country directory inside the dt directory:

logs
└── dt=2001-01-01/
    └── country=GB/

Now let's run the following statements and see what they do:
hive>  load data local inpath '/root/hive/partitions/file2' into table logs
    > partition (dt='2001-01-01',country='GB');
Loading data to table default.logs partition (dt=2001-01-01, country=GB)
OK
Time taken: 1.379 seconds
hive>  load data local inpath '/root/hive/partitions/file3' into table logs
    > partition (dt='2001-01-01',country='US');
Loading data to table default.logs partition (dt=2001-01-01, country=US)
OK
Time taken: 1.307 seconds
hive>  load data local inpath '/root/hive/partitions/file4' into table logs
    > partition (dt='2001-01-02',country='GB');
Loading data to table default.logs partition (dt=2001-01-02, country=GB)
OK
Time taken: 1.253 seconds
hive>  load data local inpath '/root/hive/partitions/file5' into table logs
    > partition (dt='2001-01-02',country='US');
Loading data to table default.logs partition (dt=2001-01-02, country=US)
OK
Time taken: 1.07 seconds
hive>  load data local inpath '/root/hive/partitions/file6' into table logs
    > partition (dt='2001-01-02',country='US');
Loading data to table default.logs partition (dt=2001-01-02, country=US)
OK
Time taken: 1.227 seconds

Looking at HDFS, we find that the following directory hierarchy has been created:

/user/hive/warehouse/logs
├── dt=2001-01-01/
│   ├── country=GB/
│   │   ├── file1
│   │   └── file2
│   └── country=US/
│       └── file3
└── dt=2001-01-02/
    ├── country=GB/
    │   └── file4
    └── country=US/
        ├── file5
        └── file6

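In fact, the contents of all the files are essentially the same. The ^A that appears between fields is the system default field delimiter; in octal it is '\001'. Another article of mine explains delimiters in more detail.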
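To make the delimiter concrete, here is a minimal Python sketch of splitting one raw line into the two data columns of the logs table. The sample line is an assumption (the raw bytes of the files are not reproduced in this article), and `parse_log_line` is a hypothetical helper for illustration, not part of Hive.

```python
from typing import Tuple

# Hive's default field delimiter is Ctrl-A, written '\001' in octal.
FIELD_DELIMITER = "\001"


def parse_log_line(raw_line: str) -> Tuple[int, str]:
    """Split one raw text line into the (ts, line) columns of the logs table."""
    ts_text, line_text = raw_line.rstrip("\n").split(FIELD_DELIMITER, 1)
    return int(ts_text), line_text


# Assumed file content: a bigint, the ^A delimiter, then the log text.
sample = "1\001Log line 1\n"
print(parse_log_line(sample))  # (1, 'Log line 1')
```

Note that neither dt nor country appears in the line itself; only ts and line are stored in the file.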

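Summary: once you think it through, a partitioned table is a simple idea. Hive creates directories on the file system and stores each category of data in its own directory. A query that filters on the partition columns only has to read the matching directories, which makes it faster.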

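Key point 1: partitioned by (dt String,country string); This clause, given when the table is created, declares the table as partitioned. It sets up a two-level directory layout and defines how the first-level and second-level directories are named.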

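The columns defined in the PARTITIONED BY clause are full-fledged columns of the table and are called partition columns. Their values are not stored in the data files, however; they exist only in the directory names.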
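Since the partition values live only in the directory names, they can be recovered from a file's path. The following Python sketch illustrates the idea on one of the paths from the listing above; `partition_values` is a hypothetical helper, and this is not how Hive itself is implemented.

```python
def partition_values(path: str) -> dict:
    """Collect the key=value directory components found in a file path."""
    values = {}
    for component in path.strip("/").split("/"):
        if "=" in component:
            key, value = component.split("=", 1)
            values[key] = value
    return values


path = "/user/hive/warehouse/logs/dt=2001-01-01/country=GB/file1"
print(partition_values(path))  # {'dt': '2001-01-01', 'country': 'GB'}
```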

Key point 2: partition (dt='2001-01-01',country='GB'); When data is loaded, each file is loaded into a specific partition, that is, placed under the corresponding subdirectory.
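The mapping from a partition spec to a subdirectory can be sketched in a few lines of Python. This is a simplified illustration: `partition_dir` is a hypothetical helper, the warehouse root is taken from the listing above, and the sketch ignores the escaping that would be needed for special characters in partition values.

```python
import posixpath

# Warehouse root as seen in the HDFS listing above (an assumption for other setups).
WAREHOUSE_DIR = "/user/hive/warehouse"


def partition_dir(table, spec):
    """Build the directory for a partition.

    spec is a list of (column, value) pairs in PARTITIONED BY order.
    """
    components = [f"{column}={value}" for column, value in spec]
    return posixpath.join(WAREHOUSE_DIR, table, *components)


print(partition_dir("logs", [("dt", "2001-01-01"), ("country", "GB")]))
# /user/hive/warehouse/logs/dt=2001-01-01/country=GB
```

The order of the pairs matters: dt comes first in the PARTITIONED BY clause, so it forms the outer directory level.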

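Think of partitioning as divide and conquer by directory. At query time, the partition columns can be used like ordinary column names to restrict the scope of the query.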

View the partition structure:
hive> show partitions logs;
OK
dt=2001-01-01/country=GB
dt=2001-01-01/country=US
dt=2001-01-02/country=GB
dt=2001-01-02/country=US

The condition below restricts the query to the country=GB directories, so only the contents of file1, file2, and file4 are returned.

hive> select ts,dt,line 
    > from logs
    > where country='GB';
OK
1    2001-01-01    Log line 1
2    2001-01-01    Log line 2
4    2001-01-02    Log line 4

Now query only the data in the country=US directory under dt=2001-01-02.

hive> select ts,dt,line
    > from logs
    > where dt='2001-01-02'
    > and country='US';
OK
5 2001-01-02 Log line 5
6 2001-01-02 Log line 6
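The two queries above only need to touch the directories whose names match the WHERE clause. A rough Python sketch of that selection step follows, using the partition list printed by show partitions; `prune` is a hypothetical helper that only handles equality filters, and it is not meant to describe Hive's actual query planner.

```python
# Partition list as printed by "show partitions logs".
PARTITIONS = [
    "dt=2001-01-01/country=GB",
    "dt=2001-01-01/country=US",
    "dt=2001-01-02/country=GB",
    "dt=2001-01-02/country=US",
]


def prune(partitions, **filters):
    """Keep only the partitions whose column values equal every given filter."""
    selected = []
    for partition in partitions:
        values = dict(part.split("=", 1) for part in partition.split("/"))
        if all(values.get(col) == val for col, val in filters.items()):
            selected.append(partition)
    return selected


print(prune(PARTITIONS, country="GB"))
# ['dt=2001-01-01/country=GB', 'dt=2001-01-02/country=GB']
print(prune(PARTITIONS, dt="2001-01-02", country="US"))
# ['dt=2001-01-02/country=US']
```

The first call selects the directories holding file1, file2, and file4; the second selects the one holding file5 and file6, matching the query results shown above.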
