Creating tables:
Syntax
The first form of CREATE TABLE:
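Notes:
TEMPORARY: a temporary table is visible only within the current session. When the session ends (roughly, when the client program exits), the table is dropped.
EXTERNAL: an external table, typically used when the table's files on HDFS are not stored under the default warehouse path (see LOCATION).
Dropping an EXTERNAL table differs from dropping a regular (managed) table: for an EXTERNAL table only the metadata in the metastore is deleted, and the data files stay in place.
This makes external tables convenient for sharing data with other databases and programs, such as Impala.
Without IF NOT EXISTS, creating a table that already exists raises an error; adding IF NOT EXISTS avoids it.
Note that table names are case-insensitive.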
Examples:
create temporary table my.table1 (id int);
create external table my.table2 (id int);
create table if not exists my.table3 (id int);
-- (Note: TEMPORARY available in Hive 0.14.0 and later)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
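-- Define columns, e.g. id int comment 'index', name string comment 'name'
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment] -- the table's comment
-- Partitioning: the definition in parentheses looks like a column definition. Partitioning on fields such as date or city speeds up queries that filter on them.
-- Partition columns split the data coarsely; each partition corresponds to one directory.
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
-- Bucketing: on top of partitioning, rows can be further split into buckets. A hash value is computed from the
-- bucketing columns and taken modulo the number of buckets to decide which bucket a row goes into.
-- On large data sets, bucketing can further speed up some queries, such as joins on the bucketing columns,
-- and it also makes sampling more efficient.
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] -- bucketing
-- SKEWED BY: roughly, it helps with data skew; I have not used it.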
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
] -- file storage format; STORED BY names a custom storage handler (e.g. for HBase-backed tables). It is rarely used, and I have not used it.
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- extra table properties and descriptions.
-- (Note: Available in Hive 0.6.0 and later)
[AS select_statement];
-- Creates the table and fills it with the result of a SELECT over other tables (CREATE TABLE AS SELECT).
-- (Note: as select_statement Available in Hive 0.5.0 and later; not supported for external tables)
-- Second form: copy the definition (not the data) of an existing table or view.
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
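The bucketing rule described in the syntax comments (hash the bucketing columns, then take the remainder modulo the bucket count) can be sketched in Python. This is an illustration under stated assumptions, not Hive's code: it models the legacy rule for a single INT or ASCII STRING column only. Multi-column keys, other types, and the Murmur3-based hashing of newer Hive versions (bucketing_version=2) are not modeled, so real bucket numbers may differ. `legacy_hash` and `bucket_for` are hypothetical names.

```python
# Hypothetical sketch of hash-based bucket assignment (CLUSTERED BY col INTO n BUCKETS).
# Assumption: models the legacy Hive rule for a single INT or ASCII STRING column,
#   bucket = (hash & Integer.MAX_VALUE) % num_buckets
# Newer Hive versions (bucketing_version=2) hash with Murmur3, so real bucket ids can differ.


def to_int32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value, like Java int overflow."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def legacy_hash(value) -> int:
    """An INT hashes to itself; a STRING uses h = 31 * h + byte over its ASCII bytes."""
    if isinstance(value, int):
        return to_int32(value)
    h = 0
    for b in value.encode("ascii"):
        h = to_int32(31 * h + b)
    return h


def bucket_for(value, num_buckets: int) -> int:
    """Clear the sign bit, then take the remainder modulo the bucket count."""
    return (legacy_hash(value) & 0x7FFFFFFF) % num_buckets


# Distribute ids 0..9 into 4 buckets.
buckets = {}
for i in range(10):
    buckets.setdefault(bucket_for(i, 4), []).append(i)
```

Rows with equal bucketing values always land in the same bucket, which is what lets a query read a single bucket for sampling or join two tables bucket by bucket.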
Notes:
Data types
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)
Primitive types
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)
Complex types
array_type
: ARRAY < data_type >
map_type
: MAP < primitive_type, data_type >
struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
: UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
## File storage formats on HDFS
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
constraint_specification:
: [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]
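For reference, when a TEXTFILE table uses ROW FORMAT DELIMITED without overriding the delimiters (or has no ROW FORMAT clause), Hive's default text SerDe separates fields with \001 (Ctrl-A), array elements and map entries with \002, map keys from values with \003, and rows with a newline, and it writes NULL as \N. The Python sketch below serializes one row in that layout. `serialize_row`, `field`, and `scalar` are hypothetical helpers; escaping, nested collections, and non-default settings such as NULL DEFINED AS are not handled.

```python
# Hypothetical sketch of one row in Hive's default delimited text layout.
# Assumed defaults: fields \x01, collection items / map entries \x02, map key vs. value \x03,
# rows end with \n, NULL is written as \N. No escaping, no nested collections.
FIELD_DELIM = "\x01"
ITEM_DELIM = "\x02"
MAP_KEY_DELIM = "\x03"
NULL_TOKEN = "\\N"  # backslash followed by N


def scalar(value) -> str:
    """Render a primitive value (or NULL) as text."""
    if value is None:
        return NULL_TOKEN
    if isinstance(value, bool):
        return "true" if value else "false"  # lowercase, unlike Python's str(True)
    return str(value)


def field(value) -> str:
    """Render one column: a primitive, an ARRAY (list) or a MAP (dict) of primitives."""
    if isinstance(value, list):
        return ITEM_DELIM.join(scalar(v) for v in value)
    if isinstance(value, dict):
        return ITEM_DELIM.join(scalar(k) + MAP_KEY_DELIM + scalar(v) for k, v in value.items())
    return scalar(value)


def serialize_row(row: list) -> str:
    """Join the columns of one row with the field delimiter and end it with a newline."""
    return FIELD_DELIM.join(field(v) for v in row) + "\n"


line = serialize_row([1, "tom", ["a", "b"], {"k": "v"}, None])
print(repr(line))
```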
Note
I don't fully understand some parts of the syntax above; corrections are welcome.
Common examples:
Example 1
create table my.tabelDemo(
id int,
name string,
hobby array<string>,
add map<String,string>
)
row format delimited
fields terminated by ','
collection items terminated by '-'
map keys terminated by ':'
stored as textfile;
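Columns are separated by commas.
The string elements of the array are separated by '-'.
In the map, each key is separated from its value by a colon (:), and the map's entries are separated by the collection items delimiter '-'.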
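To make the three delimiters concrete, here is a Python sketch of how one line of this table's text file splits into columns. The sample line and the helper `parse_demo_line` are hypothetical, and the sketch assumes well-formed input: no escaping, no NULLs, no empty or nested collections.

```python
# Hypothetical sketch: split one line of my.tabelDemo's text file into its four columns,
# using the delimiters declared in Example 1. Assumes well-formed input: no escaping,
# no NULLs, no empty collections, no nested collections.
FIELDS, ITEMS, MAP_KEYS = ",", "-", ":"


def parse_demo_line(line: str) -> dict:
    id_text, name, hobby_text, add_text = line.rstrip("\n").split(FIELDS)
    return {
        "id": int(id_text),
        "name": name,
        # ARRAY<STRING>: elements split by the collection-items delimiter
        "hobby": hobby_text.split(ITEMS),
        # MAP<STRING,STRING>: entries split by the collection-items delimiter,
        # then each entry split once on the map-keys delimiter
        "add": dict(entry.split(MAP_KEYS, 1) for entry in add_text.split(ITEMS)),
    }


row = parse_demo_line("1,zhangsan,book-tv-code,beijing:chaoyang-shanghai:pudong\n")
```

A value that itself contains ',', '-' or ':' would be split incorrectly unless escaping is configured (ESCAPED BY), so it pays to choose delimiters that cannot occur in the data.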
Example 2
-- stored as Parquet
CREATE EXTERNAL TABLE IF NOT EXISTS default.person_table(
ftpurl string,
ipcid string,
feature array<float>,
eyeglasses int,
gender int,
haircolor int,
hairstyle int,
hat int,
huzi int,
tie int,
timeslot int,
exacttime Timestamp,
searchtype string,
sharpness int
)
partitioned by (`date` string)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/person_table';
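Since each partition corresponds to one directory, the partition of this table for a given date lives in a subdirectory named `date=<value>` under the table's LOCATION. The Python sketch below builds that path; `partition_dir` and the sample value `2018-01-01` are hypothetical, and it assumes the values need no escaping (Hive escapes certain special characters, such as '/' and '=', in partition directory names).

```python
# Hypothetical sketch: HDFS directory for one partition of default.person_table.
# Each partition is stored under <table location>/<col>=<value>[/<col2>=<value2>...].
# Assumption: values need no escaping (Hive escapes special characters such as '/' and '=').
from posixpath import join

TABLE_LOCATION = "/user/hive/warehouse/person_table"


def partition_dir(location: str, spec: dict) -> str:
    """spec maps partition columns to values, in PARTITIONED BY order."""
    return join(location, *(f"{col}={val}" for col, val in spec.items()))


path = partition_dir(TABLE_LOCATION, {"date": "2018-01-01"})
```

For an EXTERNAL table like this one, files copied into such a directory are typically not visible to queries until the partition is registered in the metastore, for example with ALTER TABLE ... ADD PARTITION.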
Using struct
create table student_test(id INT, info struct<name:STRING, age:INT>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ':';
The data file in HDFS looks roughly like this (the fields inside the struct are separated by the delimiter given in COLLECTION ITEMS TERMINATED BY):
1,zhou:30
2,yan:30
3,chen:20
4,li:80
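Struct fields are positional: in each line, the first ':'-separated value is `name` and the second is `age`, in declaration order. The Python sketch below reads the four sample lines that way and then filters them, roughly as `SELECT info.name FROM student_test WHERE info.age >= 30` would. `Info` and `parse_student` are hypothetical names, and the input is assumed to be well formed (no escaping, no NULLs).

```python
# Hypothetical sketch: read student_test's text data with ',' between columns
# and ':' (the COLLECTION ITEMS delimiter) between the struct's fields.
from typing import NamedTuple, Tuple


class Info(NamedTuple):
    """Mirrors STRUCT<name:STRING, age:INT>; fields are positional, in declaration order."""
    name: str
    age: int


def parse_student(line: str) -> Tuple[int, Info]:
    id_text, info_text = line.rstrip("\n").split(",")  # FIELDS TERMINATED BY ','
    name, age = info_text.split(":")                    # COLLECTION ITEMS TERMINATED BY ':'
    return int(id_text), Info(name, int(age))


DATA = ["1,zhou:30", "2,yan:30", "3,chen:20", "4,li:80"]
rows = [parse_student(line) for line in DATA]

# Roughly what SELECT info.name FROM student_test WHERE info.age >= 30 would return
names = [info.name for _, info in rows if info.age >= 30]
```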
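TRUNCATE empties a table (or only the listed partitions); it is a handy way to clear data.
The target must be a managed table: truncating an EXTERNAL table raises an error.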
TRUNCATE TABLE table_name [PARTITION partition_spec];
partition_spec:
: (partition_column = partition_col_value, partition_column = partition_col_value, ...)
Dropping a table
DROP TABLE [IF EXISTS] table_name [PURGE];
-- PURGE: if the Trash is configured, dropped data normally goes to the Trash; with PURGE it is deleted permanently and cannot be recovered from the Trash.
Altering tables
Rename a table
ALTER TABLE table_name RENAME TO new_table_name;
Change table properties
ALTER TABLE table_name SET TBLPROPERTIES table_properties;
table_properties:
: (property_name = property_value, property_name = property_value, ... )
Change the table comment
ALTER TABLE table_name SET TBLPROPERTIES ('comment' = new_comment);
Bucket a table (this changes only the metadata; existing data files are not reorganized)
ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)]
INTO num_buckets BUCKETS;
Add partitions
ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location']
[, PARTITION partition_spec [LOCATION 'location'], ...];
partition_spec:
: (partition_column = partition_col_value, partition_column = partition_col_value, ...)
Rename a partition
ALTER TABLE table_name PARTITION partition_spec RENAME TO PARTITION partition_spec;
Drop partitions
ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION partition_spec, ...]
[IGNORE PROTECTION] [PURGE];
-- (Note: PURGE available in Hive 1.2.0 and later, IGNORE PROTECTION not available 2.0.0 and later)
Creating views
CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name [COMMENT column_comment], ...) ]
[COMMENT view_comment]
[TBLPROPERTIES (property_name = property_value, ...)]
AS SELECT ...;
Reference:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL