I. DDL Operations (Data Definition Language)
For details see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
These are the statements used when defining tables: CREATE, ALTER, DROP, and so on. DDL is mainly used for setup work such as defining or changing table structure, data types, and the links and constraints between tables.
1. Creating / dropping / altering / using databases
1.1 Creating a database
First, start everything up:
Start the cluster:
service iptables stop
zkServer.sh start
start-all.sh
Start Hive:
node02 (server side): hive --service metastore
node03 (client side): hive
① Create it:
hive> create database lisi;
OK
Time taken: 5.271 seconds
hive> show databases;
OK
ceshi
default
lisi
mgh
shanghai
Time taken: 0.059 seconds, Fetched: 5 row(s)
② Check in HDFS: hdfs:///user/hive/warehouse
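For reference, CREATE DATABASE also accepts IF NOT EXISTS, a COMMENT, an explicit LOCATION, and DBPROPERTIES. A hedged sketch (the comment text, path, and property below are invented for illustration):

```sql
-- Sketch only: comment, location, and property values are made up.
CREATE DATABASE IF NOT EXISTS lisi
COMMENT 'demo database for DDL practice'
LOCATION '/user/hive/warehouse/lisi.db'
WITH DBPROPERTIES ('creator' = 'mgh');
```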
1.2 Dropping a database
① Command: drop database lisi;
hive> drop database lisi;
OK
Time taken: 0.979 seconds
hive> show databases;
OK
ceshi
default
mgh
shanghai
Time taken: 0.082 seconds, Fetched: 4 row(s)
② Refresh HDFS and check the result: lisi.db is now gone!
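By default DROP DATABASE fails if the database still contains tables. A hedged sketch of the fuller form:

```sql
-- IF EXISTS suppresses the error when the database is missing;
-- CASCADE also drops the contained tables (RESTRICT, the default, refuses).
DROP DATABASE IF EXISTS lisi CASCADE;
```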
1.3 Altering a database
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);   -- (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;   -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path;   -- (Note: Hive 2.2.1, 2.4.0 and later)
1.4 Using a database: use <database name> (e.g. use lisi;)
2. Creating / dropping tables
2.1 Creating a table
① Common column types:
data_type
  : primitive_type   -- primitive types
  | array_type       -- arrays
  | map_type         -- maps
  | struct_type      -- structs
  | union_type       -- (Note: Available in Hive 0.7.0 and later)

primitive_type
  : TINYINT          -- 1-byte signed integer, -128 to 127
  | SMALLINT         -- 2-byte signed integer
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION
  | STRING           -- covers almost anything
  | BINARY
  | TIMESTAMP
  | DECIMAL
  | DECIMAL(precision, scale)
  | DATE
  | VARCHAR          -- variable-length character type
  | CHAR

array_type : ARRAY < data_type >
map_type : MAP < primitive_type, data_type >
struct_type : STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type : UNIONTYPE < data_type, data_type, ... >
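To make the complex types concrete, a hedged sketch of a table declaration that exercises each of them (the table and column names are invented):

```sql
-- Invented example table using each complex type.
CREATE TABLE type_demo (
  tags  ARRAY<STRING>,                    -- e.g. ["a","b"]
  props MAP<STRING, INT>,                 -- e.g. {"x":1}
  addr  STRUCT<city:STRING, zip:STRING>,  -- accessed as addr.city, addr.zip
  mixed UNIONTYPE<INT, STRING>            -- holds a value of one of the two types
);
```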
② Create table test01 in database ceshi.db. Tip: before creating a table, prepare the data it will hold, then design the column types from the fields that data actually contains (e.g. id: int); the row_format clauses can be copied straight from the official manual.
hive> create table test01(
    >   id int,
    >   name string,
    >   age int,
    >   likes array<string>,
    >   address map<string,string>
    > )
    > row format delimited fields terminated by ','
    > collection items terminated by '-'
    > map keys terminated by ':'
    > lines terminated by '\n';
OK
Time taken: 1.091 seconds
hive> show tables;
OK
abc
test01
③ Check in HDFS:
④ Load data from a local directory:
a: Create a hivedata directory under /root: mkdir hivedata
b: In the hivedata directory, create a data file, also named hivedata:
[root@node03 hivedata]# vim hivedata
1,zshang,18,game-girl-book,stu_addr:beijing-work_addr:shanghai
2,lishi,16,shop-boy-book,stu_addr:hunan-work_addr:shanghai
3,wang2mazi,20,fangniu-eat,stu_addr:shanghai-work_addr:tianjing
4,zhangsna,23,girl-boy-game,stu_addr:songjiang-work_addr:beijing
5,lisi,65,sleep-girl,stu_addr:nanjing-work_addr:anhui
6.wanggu,45,sleep-girl,stu_addr:nanzhou-work_addr:hubei
ok!
c: Load it: load data local inpath '/root/hivedata/hivedata' into table test01; (to load from HDFS instead, drop the local keyword: load data inpath 'hdfs://user/ceshi.db/hivedata' into table test01; -- use the actual HDFS path. The method is otherwise the same!)
hive> load data local inpath '/root/hivedata/hivedata' into table test01;
Loading data to table ceshi.test01
Table ceshi.test01 stats: [numFiles=1, totalSize=363]
OK
Time taken: 2.711 seconds
d: Check the result:
hive> select * from test01;
OK
1  zshang  18  ["game","girl","book"]  {"stu_addr":"beijing","work_addr":"shanghai"}
2  lishi  16  ["shop","boy","book"]  {"stu_addr":"hunan","work_addr":"shanghai"}
3  wang2mazi  20  ["fangniu","eat"]  {"stu_addr":"shanghai","work_addr":"tianjing"}
4  zhangsna  23  ["girl","boy","game"]  {"stu_addr":"songjiang","work_addr":"beijing"}
5  lisi  65  ["sleep","girl"]  {"stu_addr":"nanjing","work_addr":"anhui"}
NULL  45  NULL  ["stu_addr:nanzhou","work_addr:hubei"]  NULL
NULL  NULL  NULL  NULL  NULL
NULL  NULL  NULL  NULL  NULL
Time taken: 0.99 seconds, Fetched: 8 row(s)
(The garbled sixth row comes from the sixth data line, which used '.' instead of ',' after the id, so every field shifted by one and failed type conversion.)
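Once loaded, the array and map columns can be indexed directly in queries; a hedged sketch (output not shown):

```sql
-- likes[0] is the first array element; address['stu_addr'] looks up a map key.
SELECT name, likes[0], address['stu_addr']
FROM test01
WHERE age > 17;
```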
2.2 Dropping a table: drop table <table name> (use with caution at work!)
2.3 Altering a table:
Rename abc to aaa:
hive> alter table abc rename to aaa;
OK
Time taken: 0.833 seconds
hive> show tables;
OK
aaa
test01
Time taken: 0.101 seconds, Fetched: 2 row(s)
2.4 Updating / deleting data: the locally installed Hive 1.2.1 does not support row-level update/insert/delete out of the box; enabling it requires configuring hive-site.xml.
① UPDATE tablename SET column = value [, column =value ...] [WHERE expression]
② DELETE FROM tablename [WHERE expression]
Reference blog: https://blog.csdn.net/wzy0623/article/details/51483674
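For reference, a hedged sketch of what row-level UPDATE/DELETE requires in Hive 0.14 and later: ACID must be switched on and the table must be bucketed, stored as ORC, and marked transactional. The table name below is invented; check your version's documentation before relying on these settings:

```sql
SET hive.support.concurrency = true;
SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-- An ACID table must be bucketed, stored as ORC, and marked transactional.
CREATE TABLE acid_demo (id INT, name STRING)
CLUSTERED BY (id) INTO 2 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');
UPDATE acid_demo SET name = 'zhangsan' WHERE id = 1;
```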
II. DML Operations (Data Manipulation Language)
Hive does not handle inserting rows one at a time well, and does not support update. Data is loaded into previously created tables with load; once imported, it is not modified.
1. Inserting / importing data:
Method 1: insert overwrite table ps1 select id,name,age,likes,address from test01;
First create the new table ps1, then insert into it:
① Create an empty table ps1 with the same structure as test01:
create table ps1 like test01;
② Insert:
insert overwrite table ps1 select id,name,age,likes,address from test01;
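The create-then-insert pair can also be collapsed into a single CTAS statement; a hedged sketch (ps2 is an invented name, and note that Hive's CTAS cannot create partitioned or external tables):

```sql
-- CREATE TABLE AS SELECT derives the table's schema and rows from the query.
CREATE TABLE ps2 AS
SELECT id, name, age, likes, address FROM test01;
```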
Method 2: LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
For example: ① load into a partitioned table:
load data local inpath '/root/hivedata/pt01' into table eee partition(part='2018-08-09 09:04');
② Create table test01 in database mgh.db and load data into it:
hive> create table test01(
    >   name string,
    >   age int,
    >   school string,
    >   project string)
    > row format delimited fields terminated by ',';
hive> desc test01;
OK
name     string
age      int
school   string
project  string
hive> load data local inpath '/root/hivedata/pt01' into table test01;
hive> select * from test01;
OK
zhangsan  25  sxt  java
lisi      23  bj  python
wangwu    31  dalei  php
zhousi    27  laonanhai  web
Method 3 (much like the import in the DDL section):
FROM person t1 INSERT OVERWRITE TABLE person1 [PARTITION(dt='2008-06-08', country)] SELECT t1.id, t1.name, t1.age ;
For example, create table test02 in the mgh.db database, populated from test01:
hive> from test01
    > insert overwrite table test02
    > select name,age,school,project;
A single from clause can also feed inserts into several tables at once!
Or, equivalently:
insert overwrite table test02 select name,age,school,project from test01;
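The multi-table form mentioned above scans test01 once and routes rows to several inserts; a hedged sketch (test02a and test02b are invented table names assumed to share test01's schema):

```sql
-- One scan of test01 feeds two destination tables with different filters.
FROM test01
INSERT OVERWRITE TABLE test02a SELECT name, age, school, project WHERE age < 25
INSERT OVERWRITE TABLE test02b SELECT name, age, school, project WHERE age >= 25;
```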
2. Querying and saving results
2.1 Save locally: export table test01 in mgh.db to /root/hivedata:
insert overwrite local directory '/root/hivedata'
row format delimited fields terminated by ','
select * from test01;
Compare this with loading a local file into a table:
load data local inpath '/root/hivedata/test01' into table test02;
2.2 Save to HDFS:
insert overwrite directory '/user/hive/warehouse/hive_exp_emp' select * from test01;
(hive_exp_emp is a directory created specifically to hold the exported files)
Note: this, like the insert operations above, runs MapReduce jobs. Because of this machine's limited performance and disk/memory, the jobs initially would not finish; they succeeded only after restarting the cluster and rerunning. The best fix would be to give the VMs more memory, but the host machine's RAM does not allow that for now.
2.3 Backing up and restoring data
1. Backup: back up table test01 from mgh.db into the hehe.db warehouse directory:
hive> export table test01 to '/user/hive/warehouse/hehe.db' ;
2. Drop and restore: drop test01 from mgh.db, then recover it:
hive> drop table test01;
OK
Time taken: 2.271 seconds
hive> show tables;
OK
abc
test02
test03
hive> import from '/user/hive/warehouse/hehe.db';
hive> show tables;
OK
abc
test01
test02
test03
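IMPORT can also restore the exported data under a different table name instead of the original; a hedged sketch (test01_restored is an invented name):

```sql
-- Restore the export as a new table rather than reusing the original name.
IMPORT TABLE test01_restored FROM '/user/hive/warehouse/hehe.db';
```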
III. Partition Operations
1. Creating partitions
1.1 Single partition: create table test03 with a single partition column in mgh.db
① Create the empty table:
hive> create table test03(
    >   name string,
    >   age int,
    >   likes string)
    > partitioned by(part string)
    > row format delimited fields terminated by ',';
OK
② Create the data file: vim part01
zhang,12,sing
lisi,23,drink
wanger,34,swim
zhousi,23,eat
③ Load the data and look at the partition:
hive> load data local inpath '/root/hivedata/part01' into table test03 partition(part='2018-08-09 16:02' );
hive> select * from test03;
OK
zhang   12  sing   2018-08-09 16:02
lisi    23  drink  2018-08-09 16:02
wanger  34  swim   2018-08-09 16:02
zhousi  23  eat    2018-08-09 16:02
1.2 Creating a two-level partitioned table: test04
The steps mirror test03:
hive> create table test04(
    >   name string,
    >   age int,
    >   likes string)
    > partitioned by(year string,month string)
    > row format delimited fields terminated by ',';
OK
hive> show tables;
OK
abc
test01
test02
test03
test04
hive> load data local inpath '/root/hivedata/part01' into table test04 partition(year='2018',month='08-08');
hive> select * from test04;
OK
zhang   12  sing   2018  08-08
lisi    23  drink  2018  08-08
wanger  34  swim   2018  08-08
zhousi  23  eat    2018  08-08
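Filtering on the partition columns lets Hive read only the matching partition directories (partition pruning) instead of scanning the whole table; a sketch against test04:

```sql
-- Only the year=2018/month=08-08 directory is scanned.
SELECT name, age FROM test04
WHERE year = '2018' AND month = '08-08';
```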
1.3 Adding partitions
Adding a partition here does not mean adding a new partition column; it means adding a new value under the existing partition column!
For example, in the ceshi.db database, take the previously created partitioned table fenqu, whose partition column is dt (currently holding '20180808').
Now add a new value, '2018-08-09', under the existing dt column:
hive> alter table fenqu add partition(dt='2018-08-09');
Check the partitions:
hive> show partitions fenqu;
OK
dt=2018-08-09
dt=20180808
Then check again on the HDFS cluster.
To add several partitions at once:
hive> alter table fenqu add partition(dt='123456') partition(dt='8888');
OK
Time taken: 0.51 seconds
hive> show partitions fenqu;
OK
dt=112233
dt=123456
dt=2018-08-09
dt=20180808
dt=234567
dt=8888
The general form for adding multiple partitions:
alter table fenqu add partition(dt='xxxx') partition(dt='xxxx')....;
1.4 Dropping partitions
Now drop all the throwaway partitions created above:
hive> alter table fenqu drop partition(dt='112233'), partition(dt='123456'), partition(dt='234567'), partition(dt='8888');
Dropped the partition dt=112233
Dropped the partition dt=123456
Dropped the partition dt=234567
Dropped the partition dt=8888
OK
Time taken: 4.033 seconds
hive> show partitions fenqu;
OK
dt=2018-08-09
dt=20180808
Time taken: 0.396 seconds, Fetched: 2 row(s)
The general form for dropping partitions (note that drop separates the partition specs with commas, while add uses spaces):
alter table fenqu drop partition(dt='xxx'), partition(dt='xxx'), partition(dt='xxx'), ...;
1.5 Loading data into a partition:
For example, load the file /root/hivedata/test01 into the fenqu table in the ceshi.db database:
hive> load data local inpath '/root/hivedata/test01' into table fenqu partition(dt='2018-08-09', dt='20180808');
Loading data to table ceshi.fenqu partition (dt=2018-08-09)
Partition ceshi.fenqu{dt=2018-08-09} stats: [numFiles=1, totalSize=83]
OK
Time taken: 8.112 seconds
hive> select * from fenqu;
OK
NULL  25  2018-08-09
NULL  23  2018-08-09
NULL  31  2018-08-09
NULL  27  2018-08-09
1  lier  20180808
2  wanger  20180808
3  zhanger  20180808
4  zhouer  20180808
5  qier  20180808
Time taken: 0.394 seconds, Fetched: 9 row(s)
(fenqu has a single partition column dt, so only one dt value belongs in the partition spec; as the log shows, Hive used the first one, dt='2018-08-09'.)
1.6 Renaming a partition:
hive> alter table fenqu partition(dt='2018-08-10',dt='20180808') rename to partition(dt='2018-08-10 00:05',dt='20180810 12:05');
OK
Time taken: 1.158 seconds
hive> show partitions fenqu;
OK
dt=2018-08-10
dt=2018-08-10 00%3A05
Time taken: 0.297 seconds, Fetched: 2 row(s)
(Note the %3A: the ':' in the partition value is URL-encoded in the partition name and its HDFS path.)
1.7 Dynamic partitions
Workflow: ① Create a data file aaa.txt under /root/hivedata/ (to load into the tables below).
② Set the relevant parameters:
hive.exec.dynamic.partition -- dynamic partitioning is off (false) by default; set it to true
hive.exec.dynamic.partition.mode -- defaults to strict; set it to nonstrict
hive.exec.max.dynamic.partitions.pernode -- maximum dynamic partitions per node, default 100; tune to actual needs
hive.exec.max.dynamic.partitions -- maximum dynamic partitions across all nodes, default 1000; tune to actual needs or the job will error out!
hive.exec.max.created.files -- maximum HDFS files a whole MR job may create, default 100,000; adjustable but usually enough
hive.error.on.empty.partition -- whether to raise an error when an empty partition is produced; defaults to false
③ In database ceshi.db create a non-partitioned table, test05, and an external table, test06, then load test05's contents into test06.
In detail:
vim aaa.txt
aa,US,CA
aaa,US,CB
bbb,CA,BB
bbb,CA,BC
------------------------ parameter settings --------------------------
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=100;
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.created.files=100000;
set hive.error.on.empty.partition=false;
------------------------------- create the non-partitioned table test05 ----------------------------------
create table test05(name string,cty string,st string) row format delimited fields terminated by ',';
--------------------------------- create the external table test06 --------------------------------
create external table test06(name string)partitioned by(country string,state string);
---------------------------- load data from test05 into test06 -------------------------------
insert into table test06 partition(country,state) select name,cty,st from test05;
----------------------------------- check the partitions -------------
hive> show partitions test06;
OK
country=CA/state=BB
country=CA/state=BC
country=US/state=CA
country=US/state=CB
Note: you can also verify this in the cluster (HDFS)!
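Static and dynamic partition columns can also be mixed in one insert: every static column must come before the dynamic ones in the partition spec. A hedged sketch against test05/test06:

```sql
-- country is fixed statically; state is still derived from each row's st value.
INSERT OVERWRITE TABLE test06 PARTITION (country = 'US', state)
SELECT name, st FROM test05 WHERE cty = 'US';
```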