DDL (Data Definition Language)
I. Creating a Database
CREATE DATABASE [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path];
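As a rough sketch of how the optional clauses in this template combine, the following Python helper assembles a CREATE DATABASE statement. The function name build_create_database and the sample comment and path are made up for illustration; it only builds a string and does not talk to Impala.

```python
def build_create_database(name, if_not_exists=True, comment=None, location=None):
    """Assemble a CREATE DATABASE statement from the optional clauses of the template.

    Hypothetical helper: it only formats text and never contacts a server.
    """
    for value in (comment, location):
        # Keep the sketch simple: refuse values that would need quote escaping.
        if value is not None and "'" in value:
            raise ValueError("single quotes are not handled in this sketch")

    parts = ["CREATE DATABASE"]
    if if_not_exists:
        parts.append("IF NOT EXISTS")
    parts.append(name)
    if comment is not None:
        parts.append("COMMENT '{}'".format(comment))
    if location is not None:
        parts.append("LOCATION '{}'".format(location))
    return " ".join(parts) + ";"


# The comment text and the HDFS path below are illustrative assumptions.
print(build_create_database("db_hive", comment="demo database",
                            location="/user/hive/warehouse/db_hive.db"))
```

The clause order (COMMENT before LOCATION) follows the template above, which is also what the parser error message further down lists as the expected keywords.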
Note: Impala does not support the WITH DBPROPERTIE… syntax, but Hive does.
[bigdata12:21000] > create database db_hive WITH DBPROPERTIES('name' = 'Plus');
Query: create database db_hive
WITH DBPROPERTIES('name' = 'ttt')
ERROR: AnalysisException: Syntax error in line 2:
WITH DBPROPERTIES('name' = 'ttt')
^
Encountered: WITH
Expected: COMMENT, LOCATION
hive> create database db_hive WITH DBPROPERTIES('name' = 'plus');
Alternatively, create the database directly in Impala, leaving out the properties clause:
[cdh2:21000] > create database db_hive;
II. Querying Databases
1. Show databases
[cdh:21000] > show databases;
[cdh:21000] > show tables;
Query: show tables
+----------+
| name |
+----------+
| student2 |
+----------+
Fetched 1 row(s) in 0.07s
[bigdata12:21000] > show databases like 'hive*';
The LIKE keyword here is optional.
Query: show databases like 'hive*'
+---------+---------+
| name | comment |
+---------+---------+
| hive_db | |
+---------+---------+
[bigdata12:21000] > desc database hive_db;
Query: describe database hive_db
+---------+----------+---------+
| name | location | comment |
+---------+----------+---------+
| hive_db | | |
+---------+----------+---------+
2. Drop a database
[bigdata12:21000] > drop database hive_db;
[bigdata12:21000] > drop database hive_db cascade;
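Note: the Impala version used here does not support the ALTER DATABASE syntax, and a database cannot be dropped while it is the current database selected by a USE statement; switch to another database first.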
III. Creating Tables
1. Managed (internal) tables
[bigdata12:21000] > create table if not exists student2(
id int, name string
)
row format delimited fields terminated by '\t'
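stored as textfile -- file storage format; textfile is plain text and easy to inspect, unlike binary formats such as Parquet
location '/user/hive/warehouse/student2'; -- storage path
[bigdata12:21000] > desc formatted student2; -- show detailed information about the table structure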
2. External tables
[bigdata12:21000] > create external table stu_external(
id int, name string)
row format delimited fields terminated by '\t'
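location '/stu_external'; -- an external table can point at a custom storage path
The table's data is then stored under this custom path rather than under the database's default warehouse directory.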
IV. Partitioned Tables
1. Create a partitioned table
[bigdata12:21000] > create table stu_par(id int, name string)
partitioned by (month string)
row format delimited
fields terminated by '\t';
The partition column is always displayed last, as month is here.
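One way to see why the partition column is shown last: in the usual Hive/Impala layout its value is not stored in the data files but encoded in a directory name under the table location, so it is appended after the columns read from the file. A small Python sketch of that path convention follows; partition_dir is a hypothetical helper and the table location is an assumed default warehouse path.

```python
import posixpath


def partition_dir(table_location, **partition):
    """Return the directory that would hold one partition's data files.

    Each partition key=value pair becomes one directory level,
    in the order the keys are given.
    """
    levels = ["{}={}".format(key, value) for key, value in partition.items()]
    return posixpath.join(table_location, *levels)


# Assumed table location; the real one is shown by "desc formatted stu_par".
print(partition_dir("/user/hive/warehouse/stu_par", month="201910"))
# /user/hive/warehouse/stu_par/month=201910
```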
2. Load data into the table
[bigdata12:21000] > alter table stu_par add partition (month='201910');
[bigdata12:21000] > load data inpath '/student.txt' into table stu_par partition(month='201910');
[bigdata12:21000] > insert into table stu_par partition (month = '201910') select * from student;
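Note:
Impala: if the partition does not exist, load data does not create it automatically.
Hive: if the partition does not exist, load data creates it automatically.
Loading from an HDFS path moves the file (a cut, not a copy). In Hive, loading from the local filesystem with load data local copies the file instead.
The insert ... select statement above uses Impala to insert the rows of the student table into the specified partition of stu_par.
In Hive the same insert has to run as a job with a map phase.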
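Because Impala does not create a missing partition during load data, the two statements always travel together. The sketch below generates that pair for a given partition; load_into_partition is a hypothetical helper that only builds SQL text, and the use of IF NOT EXISTS is an addition (not in the commands above) so the pair can be rerun safely.

```python
def load_into_partition(table, hdfs_path, partition_col, partition_val):
    """Build the ADD PARTITION + LOAD DATA pair that Impala needs.

    Returns the statements as strings in the order they should be run.
    """
    spec = "{}='{}'".format(partition_col, partition_val)
    return [
        # Step 1: make sure the partition exists (Impala will not create it on load).
        "ALTER TABLE {} ADD IF NOT EXISTS PARTITION ({});".format(table, spec),
        # Step 2: move the HDFS file into that partition.
        "LOAD DATA INPATH '{}' INTO TABLE {} PARTITION ({});".format(hdfs_path, table, spec),
    ]


for statement in load_into_partition("stu_par", "/student.txt", "month", "201910"):
    print(statement)
```

In Hive the first statement would be unnecessary, since load data creates the partition itself.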
3. Query data in a partitioned table
[bigdata12:21000] > select * from stu_par where month = '201911';
4. Add multiple partitions
[bigdata12:21000] > alter table stu_par add partition (month='201812') partition (month='201813');
When adding multiple partitions, separate the partition clauses with spaces.
5. Drop a partition
[bigdata12:21000] > alter table stu_par drop partition (month='201812');
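To drop multiple partitions in one statement, separate the partition clauses with commas (this is Hive's syntax; confirm that your Impala version accepts it).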
6. Show partitions
[bigdata12:21000] > show partitions stu_par;
V. Creating Views
#Create a view
create view if not exists stu_view
as select name from student;
#List the view (views appear alongside tables)
show tables;
#Query the view
select * from stu_view;
#Alter the view
alter view stu_view as select id from student;
#Drop the view
drop view stu_view;
VI. Common Impala SQL
1. INSERT statement
#Create a table
create table person(id int, name string, age int);
#Insert data
insert into person values(1,'A',18);
insert into person values(1,'A_1',20);
insert into person values(2,'B',29);
insert into person values(3,'C',16);
insert into person values(4,'D',40);
Impala usually takes only a few tenths of a second to execute each insert statement.
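2. ORDER BY statement
select * from person order by age desc; -- descending order
Note: Impala does not support sorting within each reducer's share of the data (as Hive's SORT BY does), only global sorting, because Impala does not run on MapReduce.
3. GROUP BY statement
insert into person values(1,'A',21);
select name,sum(age) from person group by name;
Groups the rows by name and sums the ages in each group.
4. HAVING statement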
select name,sum(age) from person
group by name having sum(age) >30;
Groups the rows by name, sums the ages in each group, and keeps only the groups whose age sum is greater than 30.
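To check what these two queries should return for the six rows inserted so far, the same data can be run through an in-memory SQLite database. SQLite is only a stand-in here (an assumption for the sake of a runnable example, not Impala), but GROUP BY and HAVING behave the same way for this case. The ORDER BY name is added only to make the printed order stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT, age INTEGER)")
# The five rows from the INSERT section plus the extra row added before GROUP BY.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "A", 18), (1, "A_1", 20), (2, "B", 29), (3, "C", 16), (4, "D", 40), (1, "A", 21)],
)

# Sum of ages per name.
grouped = conn.execute(
    "SELECT name, SUM(age) FROM person GROUP BY name ORDER BY name"
).fetchall()

# Same grouping, keeping only groups whose sum exceeds 30.
filtered = conn.execute(
    "SELECT name, SUM(age) FROM person GROUP BY name HAVING SUM(age) > 30 ORDER BY name"
).fetchall()

print(grouped)   # [('A', 39), ('A_1', 20), ('B', 29), ('C', 16), ('D', 40)]
print(filtered)  # [('A', 39), ('D', 40)]
conn.close()
```

HAVING filters after aggregation, which is why group A (18 + 21 = 39) survives even though neither of its rows exceeds 30 on its own.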
5. LIMIT statement
select * from person order by id limit 3;
Sorts by id and returns only the first three rows.
6. OFFSET statement
select * from person order by id limit 3 offset 1;
Hive (at least in older versions) does not have an OFFSET clause.
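The effect of LIMIT combined with OFFSET can be checked on the same sample rows with an in-memory SQLite database, used here purely as a stand-in for Impala. One deliberate change: three rows share id 1, so ordering by id alone leaves their relative order undefined; the sketch orders by id and then age to get a repeatable result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "A", 18), (1, "A_1", 20), (2, "B", 29), (3, "C", 16), (4, "D", 40), (1, "A", 21)],
)

# Skip the first sorted row (OFFSET 1), then return the next three (LIMIT 3).
page = conn.execute(
    "SELECT id, name, age FROM person ORDER BY id, age LIMIT 3 OFFSET 1"
).fetchall()

print(page)  # [(1, 'A_1', 20), (1, 'A', 21), (2, 'B', 29)]
conn.close()
```

This is the usual pagination pattern: OFFSET selects where the page starts and LIMIT sets the page size.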
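7. UNION statement
select * from stu_view union select name from person;
The UNION operator merges the results of two or more SELECT statements and removes duplicate rows. This query assumes stu_view still selects the name column, so recreate the view first if it was altered and dropped as shown earlier.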
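The duplicate-removing behaviour of UNION, compared with UNION ALL, can be seen in a small in-memory SQLite run. SQLite stands in for Impala here, and the rows in the student table are invented for the example, since the original does not show its contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT, age INTEGER)")
# Hypothetical sample rows; the name 'A' appears in both tables on purpose.
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "A"), (2, "E")])
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [(1, "A", 18), (2, "B", 29)])
conn.execute("CREATE VIEW stu_view AS SELECT name FROM student")

# UNION keeps one copy of each distinct row.
union_rows = sorted(
    row[0] for row in conn.execute("SELECT * FROM stu_view UNION SELECT name FROM person")
)
# UNION ALL keeps every row, duplicates included.
union_all_rows = sorted(
    row[0] for row in conn.execute("SELECT * FROM stu_view UNION ALL SELECT name FROM person")
)

print(union_rows)      # ['A', 'B', 'E']
print(union_all_rows)  # ['A', 'A', 'B', 'E']
conn.close()
```

Both sides of a UNION must return the same number of columns with compatible types, which is why the view has to expose name rather than id for this query.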