[Hive Study Notes, Part 2] Hive SQL


Environment
  VM: VMware 10
  Linux: CentOS-6.5-x86_64
  SSH client: Xshell 4
  FTP client: Xftp 4
  jdk8
  hadoop-3.1.1
  apache-hive-3.1.1

 

Reference: the official Hive language manual.

I. DDL

1. Data types

data_type
  : primitive_type
  | array_type
  | map_type
  | struct_type
  | union_type  -- (Note: Available in Hive 0.7.0 and later)
 
primitive_type
  : TINYINT
  | SMALLINT
  | INT
  | BIGINT
  | BOOLEAN
  | FLOAT
  | DOUBLE
  | DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
  | STRING
  | BINARY      -- (Note: Available in Hive 0.8.0 and later)
  | TIMESTAMP   -- (Note: Available in Hive 0.8.0 and later)
  | DECIMAL     -- (Note: Available in Hive 0.11.0 and later)
  | DECIMAL(precision, scale)  -- (Note: Available in Hive 0.13.0 and later)
  | DATE        -- (Note: Available in Hive 0.12.0 and later)
  | VARCHAR     -- (Note: Available in Hive 0.12.0 and later)
  | CHAR        -- (Note: Available in Hive 0.13.0 and later)
 
array_type
  : ARRAY < data_type >
 
map_type
  : MAP < primitive_type, data_type >
 
struct_type
  : STRUCT < col_name : data_type [COMMENT col_comment], ...>
 
union_type
   : UNIONTYPE < data_type, data_type, ... >  -- (Note: Available in Hive 0.7.0 and later)

 

2. Creating, dropping, and altering databases
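Database-level DDL is not demonstrated in the transcripts below, so here is a minimal sketch; the database name demo_db, its properties, and the location path are made up for illustration:

```sql
-- Create a database if it does not already exist
CREATE DATABASE IF NOT EXISTS demo_db
COMMENT 'scratch database for examples'
LOCATION '/user/hive/demo_db.db';

-- Attach a property to the database
ALTER DATABASE demo_db SET DBPROPERTIES ('owner' = 'root');

-- Drop it; CASCADE also removes any tables it contains
DROP DATABASE IF EXISTS demo_db CASCADE;
```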

3. Creating, dropping, and altering tables

Example: create a table

hive> CREATE TABLE person(
id INT,
name STRING,
age INT,
likes ARRAY<STRING>,
address MAP<STRING,STRING>
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':' 
LINES TERMINATED BY '\n';

View the table structure:

hive> desc person;
OK
id                      int                                         
name                    string                                      
age                     int                                         
likes                   array<string>                               
address                 map<string,string>                          
Time taken: 0.095 seconds, Fetched: 5 row(s)
hive> desc formatted person;
OK
# col_name                data_type               comment             
id                      int                                         
name                    string                                      
age                     int                                         
likes                   array<string>                               
address                 map<string,string>                          
          
# Detailed Table Information          
Database:               default                  
OwnerType:              USER                     
Owner:                  root                     
CreateTime:             Tue Jan 29 11:41:12 CST 2019     
LastAccessTime:         UNKNOWN                  
Retention:              0                        
Location:               hdfs://PCS102:9820/root/hive_remote/warehouse/person     
Table Type:             MANAGED_TABLE            
Table Parameters:          
    COLUMN_STATS_ACCURATE    {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"address\":\"true\",\"age\":\"true\",\"id\":\"true\",\"likes\":\"true\",\"name\":\"true\"}}
    bucketing_version       2                   
    numFiles                0                   
    numRows                 0                   
    rawDataSize             0                   
    totalSize               0                   
    transient_lastDdlTime    1548733272          
          
# Storage Information          
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe     
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat     
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:          
    collection.delim        -                   
    field.delim             ,                   
    line.delim              \n                  
    mapkey.delim            :                   
    serialization.format    ,                   
Time taken: 0.157 seconds, Fetched: 39 row(s)

Load data into the table:

Data file /root/data:
1,小明1,18,lol-book-movie,beijing:shangxuetang-shanghai:pudong
2,小明2,20,lol-book-movie,beijing:shangxuetang-shanghai:pudong
3,小明3,21,lol-book-movie,beijing:shangxuetang-shanghai:pudong
4,小明4,21,lol-book-movie,beijing:shangxuetang-shanghai:pudong
5,小明5,21,lol-book-movie,beijing:shangxuetang-shanghai:pudong
6,小明6,21,lol-book-movie,beijing:shangxuetang-shanghai:pudong
hive> LOAD DATA LOCAL INPATH '/root/data' INTO TABLE person;
Loading data to table default.person
OK
Time taken: 0.185 seconds
hive> select * from person;
OK
1    小明1    18    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}
2    小明2    20    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}
3    小明3    21    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}
4    小明4    21    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}
5    小明5    21    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}
6    小明6    21    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}
Time taken: 0.126 seconds, Fetched: 6 row(s)
hive> 

Note: data loaded into a table should follow the structure declared in the table definition. A file that does not match this format will still be uploaded to HDFS, but a Hive SELECT cannot parse it, so the rows cannot be formatted in the output.
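Point 3 above also covers dropping and altering tables, which the transcript does not show; a sketch against the person table just created (the added column email is hypothetical):

```sql
-- Rename the table
ALTER TABLE person RENAME TO person_renamed;

-- Append a new column after the existing ones
ALTER TABLE person_renamed ADD COLUMNS (email STRING);

-- Drop the table; for a managed table this also deletes its HDFS data
DROP TABLE IF EXISTS person_renamed;
```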

The struct type:

Data in /root/data:

1,xiaoming:12
2,xiaohong:11

Create the table and load data from the local Linux filesystem:

hive> create table student(
    > id int,
    > info STRUCT <name:string,age:int>
    > )
    > ROW FORMAT DELIMITED 
    > FIELDS TERMINATED BY ',' 
    > COLLECTION ITEMS TERMINATED BY ':'
    > ;
OK
Time taken: 0.712 seconds
hive> show tables;
OK
logtbl
person
person3
psn2
psn3
psn4
student
test01
Time taken: 0.1 seconds, Fetched: 8 row(s)
hive> load data local inpath '/root/data' into table student;
Loading data to table default.student
OK
Time taken: 0.365 seconds
hive> select * from student;
OK
1    {"name":"xiaoming","age":12}
2    {"name":"xiaohong","age":11}
Time taken: 1.601 seconds, Fetched: 2 row(s)
hive> 

Compare with loading data from HDFS:

First upload the file to the HDFS root directory:

[root@PCS102 ~]# hdfs dfs -put data /
[root@PCS102 ~]# 

 

Drop the LOCAL keyword:

hive> load data inpath '/data' into table student;
Loading data to table default.student
OK
Time taken: 0.161 seconds
hive> select * from student;
OK
1    {"name":"xiaoming","age":12}
2    {"name":"xiaohong","age":11}
1    {"name":"xiaoming","age":12}
2    {"name":"xiaohong","age":11}
Time taken: 0.118 seconds, Fetched: 4 row(s)
hive> 

After the load, the data file under the HDFS root directory is moved (note: not copied) into the student table's directory.

 

Hive managed (internal) tables: CREATE TABLE [IF NOT EXISTS] table_name. Dropping the table deletes both the metadata and the data.
Hive external tables: CREATE EXTERNAL TABLE [IF NOT EXISTS] table_name LOCATION hdfs_path. Dropping an external table removes only the metastore metadata; the table data in HDFS is left untouched.

Example:

CREATE EXTERNAL TABLE person3(
id INT,
name STRING,
age INT,
likes ARRAY<STRING>,
address MAP<STRING,STRING>
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':' 
LINES TERMINATED BY '\n'
LOCATION '/usr/';
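The managed/external difference only becomes visible at drop time; a sketch of what DROP does in each case (behavior as described above, not re-run here):

```sql
-- Dropping the external table removes only its metastore entry;
-- the files under /usr/ stay in HDFS
DROP TABLE person3;

-- Dropping a managed table such as person also deletes its
-- warehouse directory (hdfs://.../warehouse/person)
DROP TABLE person;
```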

Creating tables from existing tables
Create Table Like:
CREATE TABLE empty_key_value_store LIKE key_value_store;

Create Table As Select (CTAS):
CREATE TABLE new_key_value_store
AS
SELECT columnA, columnB FROM key_value_store;

4. Partitions: they improve query efficiency; choose partition keys based on your query patterns

(1) Creating a partitioned table (the partition column must not also be one of the table's regular columns)
Example:

CREATE TABLE psn2(
id INT,
name STRING,
likes ARRAY<STRING>,
address MAP<STRING,STRING>
)
PARTITIONED BY (age int)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':' 
LINES TERMINATED BY '\n';

Otherwise, if the partition column repeats a regular column, the create fails with:
FAILED: SemanticException [Error 10035]: Column repeated in partitioning columns
hive> CREATE TABLE psn2(
    > id INT,
    > name STRING,
    > likes ARRAY<STRING>,
    > address MAP<STRING,STRING>
    > )
    > PARTITIONED BY (age int)
    > ROW FORMAT DELIMITED 
    > FIELDS TERMINATED BY ',' 
    > COLLECTION ITEMS TERMINATED BY '-'
    > MAP KEYS TERMINATED BY ':' 
    > LINES TERMINATED BY '\n';
OK
Time taken: 0.167 seconds
hive> desc psn2;
OK
id                      int                                         
name                    string                                      
likes                   array<string>                               
address                 map<string,string>                          
age                     int                                         
          
# Partition Information          
# col_name                data_type               comment             
age                     int                                         
Time taken: 0.221 seconds, Fetched: 9 row(s)
hive> 
Load data:
hive> LOAD DATA LOCAL INPATH '/root/data1' INTO TABLE psn2 partition (age=10);
Loading data to table default.psn2 partition (age=10)
OK
Time taken: 0.678 seconds
hive> select * from psn2;
OK
1    小明1    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
2    小明2    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
3    小明3    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
4    小明4    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
5    小明5    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
6    小明6    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    10
Time taken: 1.663 seconds, Fetched: 6 row(s)
hive> 

hive> LOAD DATA LOCAL INPATH '/root/data1' INTO TABLE psn2 partition (age=20);
Loading data to table default.psn2 partition (age=20)
OK
Time taken: 0.36 seconds
hive> 
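Each LOAD into a new partition value creates a matching directory under the table's warehouse path. The partitions can be listed with SHOW PARTITIONS (not shown in the original transcript):

```sql
-- List all partitions of psn2; each result line corresponds to a
-- directory such as .../warehouse/psn2/age=10
SHOW PARTITIONS psn2;
```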

 

(2) Modifying partitions

Create the partitioned table:

hive> CREATE TABLE psn3(
    > id INT,
    > name STRING,
    > likes ARRAY<STRING>,
    > address MAP<STRING,STRING>
    > )
    > PARTITIONED BY (age int,sex string)
    > ROW FORMAT DELIMITED 
    > FIELDS TERMINATED BY ',' 
    > COLLECTION ITEMS TERMINATED BY '-'
    > MAP KEYS TERMINATED BY ':' 
    > LINES TERMINATED BY '\n';
OK
Time taken: 0.061 seconds

Load data:

hive> LOAD DATA LOCAL INPATH '/root/data1' INTO TABLE psn3 partition (age=10,sex='boy');
Loading data to table default.psn3 partition (age=10, sex=boy)
OK
Time taken: 0.351 seconds
hive> LOAD DATA LOCAL INPATH '/root/data1' INTO TABLE psn3 partition (age=20,sex='boy');
Loading data to table default.psn3 partition (age=20, sex=boy)
OK
Time taken: 0.339 seconds

Add partitions:

hive> alter table psn3 add partition (age=10,sex='man');
OK
Time taken: 0.1 seconds
hive> alter table psn3 add partition (age=20,sex='man');
OK
Time taken: 0.067 seconds

Drop partitions:

hive> alter table psn3 drop partition (sex='boy');
Dropped the partition age=10/sex=boy
Dropped the partition age=20/sex=boy
OK
Time taken: 0.472 seconds
hive> 

II. DML

Loading data

1. LOAD is essentially an hdfs dfs -put that uploads the file into the table's HDFS directory.
2. INSERT writes data from a query. Typical uses: (1) copying a table; (2) building intermediate tables; (3) inserting different projections into different tables from one scan.

CREATE TABLE psn4(
id INT,
name STRING,
likes ARRAY<STRING>
)
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ',' 
COLLECTION ITEMS TERMINATED BY '-'
LINES TERMINATED BY '\n';

from psn3
insert overwrite table psn4
select id,name,likes;

Or:

from psn3
insert overwrite table psn4
select id,name,likes
insert overwrite table psn5
select id,name;
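INSERT OVERWRITE replaces the target table's existing contents. To append rows instead, Hive also supports INSERT INTO; a sketch using the same tables:

```sql
-- Append rows to psn4 instead of replacing its contents
FROM psn3
INSERT INTO TABLE psn4
SELECT id, name, likes;
```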

 

III. Hive SerDe (Serializer and Deserializer)
A SerDe performs serialization and deserialization.
It sits between the data storage layer and the execution engine, decoupling the two.
Hive reads and writes row content through either ROW FORMAT DELIMITED or a SERDE clause.
row_format
: DELIMITED
[FIELDS TERMINATED BY char [ESCAPED BY char]]
[COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char]
[LINES TERMINATED BY char]
: SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]

Create the table:

hive> CREATE TABLE logtbl (
    >     host STRING,
    >     identity STRING,
    >     t_user STRING,
    >     a_time STRING,
    >     request STRING,
    >     referer STRING,
    >     agent STRING)
    >   ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    >   WITH SERDEPROPERTIES (
    >     "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[(.*)\\] \"(.*)\" (-|[0-9]*) (-|[0-9]*)"
    >   )
    >   STORED AS TEXTFILE;
OK
Time taken: 0.059 seconds

Data:

192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:35 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
192.168.57.4 - - [29/Feb/2016:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -

Load the data:

hive> load data local inpath '/root/log' into table logtbl;
Loading data to table default.logtbl
OK
Time taken: 0.137 seconds

Query the data:

hive> select * from logtbl;
OK
192.168.57.4    -    -    29/Feb/2016:18:14:35 +0800    GET /bg-upper.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:35 +0800    GET /bg-nav.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:35 +0800    GET /asf-logo.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:35 +0800    GET /bg-button.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:35 +0800    GET /bg-middle.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET / HTTP/1.1    200    11217
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET / HTTP/1.1    200    11217
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /tomcat.css HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /tomcat.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /asf-logo.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /bg-middle.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /bg-button.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /bg-nav.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /bg-upper.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET / HTTP/1.1    200    11217
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /tomcat.css HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /tomcat.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET / HTTP/1.1    200    11217
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /tomcat.css HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /tomcat.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /bg-button.png HTTP/1.1    304    -
192.168.57.4    -    -    29/Feb/2016:18:14:36 +0800    GET /bg-upper.png HTTP/1.1    304    -
Time taken: 0.102 seconds, Fetched: 22 row(s)
hive>

IV. Beeline: an alternative client with the same role as the hive CLI; its main advantage is tabular output in the style of the MySQL console.
/usr/local/apache-hive-3.1.1-bin/bin/beeline must be used together with /usr/local/apache-hive-3.1.1-bin/bin/hiveserver2.

First, start hiveserver2 on the server.
Then connect from a client with beeline, in either of two ways:
1. beeline -u jdbc:hive2://localhost:10000/default -n root
2. beeline
beeline> !connect jdbc:hive2://<host>:<port>/<db>;auth=noSasl root 123
By default the username and password are not verified. In the beeline shell, client commands are prefixed with !.
Exit with !quit.

V. Hive JDBC

How Hive JDBC runs:
After hiveserver2 is started on the server, Java code can connect through Hive's JDBC driver on the default port 10000 and run queries.

package test.hive;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcClient {

    private static final String DRIVER_NAME = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        // Explicit driver registration (optional with JDBC 4+ drivers)
        try {
            Class.forName(DRIVER_NAME);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            return;
        }

        // try-with-resources closes the connection, statement, and
        // result set even if the query throws
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://134.32.123.102:10000/default", "root", "");
             Statement stmt = conn.createStatement();
             ResultSet res = stmt.executeQuery("select * from psn2 limit 5")) {
            while (res.next()) {
                System.out.println(res.getString(1) + "-" + res.getString("name"));
            }
        }
    }
}

 

