Hive Data Types + Hive SQL



Basic Types

  • Integer
    • int, tinyint (byte), smallint (short), bigint (long)
  • Floating point
    • float, double
  • Boolean
    • boolean
  • Character
    • string, char (fixed length), varchar (variable length)
  • Date/time
    • timestamp, date

Reference/Complex Types

  • Somewhat like containers, they make structured data easier to work with
  • Complex types can be nested inside one another
  • Array
    • Stores elements of the same type
    • Elements are looked up by index; indexing starts at 0
    • user[0]
  • Map
    • A set of key-value pairs; a value is accessed through its key
    • Keys must be unique; a duplicate key overwrites the previous value
    • map['first']
  • Struct (the struct from C; Go has them too)
    • Defines an object's attributes; a struct's fields are fixed
    • Values are accessed by field name
    • user.uname
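The three access patterns above can be combined in a single query. A minimal sketch, assuming a hypothetical table demo_user whose column names are illustrative:

```sql
-- Hypothetical table: favs is an array<string>, scores a map<string,int>,
-- and addr a struct with named fields.
create table demo_user (
  uname  string,
  favs   array<string>,
  scores map<string,int>,
  addr   struct<city:string, zip:string>
);

-- Index into the array, look up a map key, and read a struct field.
select favs[0], scores['first'], addr.city
from demo_user;
```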

Type Conversion

  • Implicit
    • Any integer type can be implicitly converted to a wider type
    • All integer types, FLOAT, and STRING can be implicitly converted to DOUBLE
    • TINYINT, SMALLINT, and INT can all be converted to FLOAT
    • BOOLEAN cannot be converted to any other type
  • Explicit
    • CAST('1' AS INT)
  • When designing a table, choose the most appropriate type for each column
  • This prevents unnecessary trouble in later operations
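A few conversion examples (a table-free SELECT is assumed to be supported, which holds on recent Hive versions):

```sql
-- Implicit widening: the int 1 is promoted to double before the addition,
-- so the result has type double.
select 1 + 2.0;

-- Explicit conversion with CAST; a string that is not a valid number
-- yields NULL rather than raising an error.
select cast('1' as int);       -- 1
select cast('abc' as int);     -- NULL
```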

DDL Operations -- Databases

Follow naming conventions when naming databases, tables, columns, and so on.

DDL covers the definition (create, alter, drop) of database objects.

Every HiveQL statement must end with a semicolon (;).

Databases

  • Create a database

    • Each database created gets its own directory in the HDFS file system

      • create database ronnie;
      • create database if not exists ronnie;
    • Create a database at a specified location

      • create database ronnie location '/ronnie/ronnie_test';

  • Drop a database

    • drop database db_name;
    • drop database if exists db_name;
    • If the database is not empty, drop it with cascade
      • drop database if exists db_name cascade;
  • Modify database information

    • Apart from its properties, a database's metadata is immutable, including
      • the database name
      • the directory where the database is stored
    • alter database ronnie set dbproperties('createtime'='20170830'); [set database properties]
  • Show databases

    • show databases;

      hive> show databases;
      OK
      default
      ronnie
      Time taken: 0.228 seconds, Fetched: 2 row(s)
      hive> 
      
      
    • show databases like 'r*'; [pattern matching]

    hive> show databases like'r*';
    OK
    ronnie
    Time taken: 0.01 seconds, Fetched: 1 row(s)
    hive> 
    
    
  • Inspect a database

    • desc database ronnie;
  • Use a database

    • use ronnie;

DDL Operations -- Tables

  • How a table is created: a table is a mapping over data, so it should be designed around the data

Creating a Table

  • Never type the Tab key when writing a CREATE TABLE statement; it can produce garbled characters

  • Create a data file and upload it to Linux

  • Create the ronnieInfo table; a folder named after the table is created inside the database's folder

  • Load the data into the table

    ronnieInfo.txt
1,luna,00000
2,slark,11111
3,sven,22222
4,anit_mage,33333
create table ronnieInfo(
id int,
uname string,
password string
)
row format delimited fields terminated by ',' lines terminated by '\n';

load data local inpath '/root/ronnieInfo.txt' overwrite into table ronnieInfo;

select * from ronnieInfo;
select id from ronnieInfo where id = 2;

Console output:

hive> select * from ronnieInfo;
OK
1	luna	00000
2	slark	11111
3	sven	22222
4	anit_mage	33333
Time taken: 0.322 seconds, Fetched: 4 row(s)
hive> select id from ronnieInfo where id = 2;
OK
2
Time taken: 0.151 seconds, Fetched: 1 row(s)
Key syntax:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name 
(col_name data_type [COMMENT col_comment], ...)
[COMMENT table_comment] 
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] 
[CLUSTERED BY (col_name, col_name, ...) 
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] 
[ROW FORMAT row_format] 
[STORED AS file_format] 
[LOCATION hdfs_path]
  • CREATE
    • Keyword: create a table
  • [EXTERNAL]
    • Table type: internal (managed) or external
  • TABLE
    • The kind of object being created
  • [IF NOT EXISTS]
    • Only create the table if it does not already exist
  • table_name
    • Table name; follow the naming conventions
  • (col_name data_type [COMMENT col_comment], ...)
    • Column definitions (col_name_1 data_type_1, col_name_2 data_type_2)
    • Columns are separated by commas; the last column takes no trailing comma
  • [COMMENT table_comment]
    • Comment describing the table
  • [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
    • Create a partitioned table
  • [CLUSTERED BY (col_name, col_name, ...)
    • Bucketing: which columns to bucket by
  • [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
    • Bucketing: sort order within buckets and the number of buckets
  • [ROW FORMAT row_format]
    • How each row of data is split into fields
  • [STORED AS file_format]
    • File format the data is stored in
  • [LOCATION hdfs_path]
    • HDFS path of the data files

Altering a Table

Renaming a table also renames its table folder:

ALTER TABLE ronnieInfo RENAME TO ronnie_info;

Changing a column:

ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name];
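For instance, renaming a column of the ronnie_info table from the earlier example (the new name user_name is illustrative):

```sql
-- Rename uname to user_name, keeping the string type.
ALTER TABLE ronnie_info CHANGE COLUMN uname user_name string;
```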

Adding or replacing columns:

ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...);
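A sketch of both forms against the ronnie_info table (the email column is hypothetical); note that REPLACE COLUMNS replaces the entire column list, not a single column:

```sql
-- Append a new column after the existing ones.
ALTER TABLE ronnie_info ADD COLUMNS (email string COMMENT 'contact email');

-- Replace the whole schema with a new column list.
ALTER TABLE ronnie_info REPLACE COLUMNS (id int, user_name string);
```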

Viewing a table's structure:

desc table_name;

Dropping a table:

DROP TABLE [IF EXISTS] table_name;

Example:

1,alex,18,game-exercise-book,stu_addr:auckland-work_addr:wellington
2,john,26,shop-lib-learn,stu_addr:queensland-work_addr:sydney
3,paul,20,cook-eat,stu_addr:brisbane-work_addr:gold_coast


create table personInfo(
id int,
name string,
age int,
fav array<string>,
addr struct<stu_addr:string,work_addr:string>
)
row format delimited fields terminated by ',' 
collection items terminated by '-' 
map keys terminated by ':' 
lines terminated by '\n';

load data local inpath '/root/personInfo.txt' overwrite into table personInfo;
select * from personInfo;

Query output:

hive> select * from personInfo;
OK
1	alex	18	["game","exercise","book"]	{"stu_addr":"stu_addr:auckland","work_addr":"work_addr:wellington"}
2	john	26	["shop","lib","learn"]	{"stu_addr":"stu_addr:queensland","work_addr":"work_addr:sydney"}
3	paul	20	["cook","eat"]	{"stu_addr":"stu_addr:brisbane","work_addr":"work_addr:gold_coast"}
Time taken: 0.058 seconds, Fetched: 3 row(s)

Loading Data -- load

  • Once imported, data cannot be modified

    • The data is stored on HDFS, and HDFS does not support in-place modification
  • Syntax

    • load data [local] inpath '/opt/module/datas/student.txt' overwrite | into table student [partition (partcol1=val1,…)];
      
      load data: fixed keywords
      [local]: with local, load from the local file system; without it, load from HDFS
      inpath '/opt/module/datas/student.txt': path of the data to import
      overwrite: newly imported data replaces the existing data
      into table student: which table to import into
      
    • Linux

      • load data local inpath '/root/personInfo.txt' into table personInfo;
      • load data local inpath '/root/ronnieInfo.txt' overwrite into table ronnie_info;
  • HDFS

    • load data inpath '/ronnie/hive/personInfo.txt' into table personInfo;
      • load data inpath '/ronnie/hive/ronnieInfo.txt' overwrite into table ronnie_info;
    • Summary:
      • Wherever the data file lives, for an internal table the file ends up in the table's folder (local files are copied; files already on HDFS are moved)
      • When loading in append mode, queries read all of the data files
      • When the data file is deleted

Loading Data -- insert

  • Query table t1 and insert the results into table t2

    • 1,admin
      2,zs
      3,ls
      4,ww
      
      create table t1(
      id string,
      name string
      )
      row format delimited fields terminated by ','  
      lines terminated by '\n';
      
      load data local inpath '/root/t1.txt' into table t1;
      
      create table t2(
      name string
      );
      -- this statement launches a MapReduce job
      insert overwrite table t2 select name from t1;
      
      
      

    MapReduce execution output:

    hive> insert overwrite table t2 select name from t1;
    Query ID = root_20190924045312_e3340ec4-55ad-4250-80c0-bf5f958eb4ab
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1569214475993_0001, Tracking URL = http://node03:8088/proxy/application_1569214475993_0001/
    Kill Command = /opt/ronnie/hadoop-2.6.5/bin/hadoop job  -kill job_1569214475993_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2019-09-24 04:53:20,136 Stage-1 map = 0%,  reduce = 0%
    2019-09-24 04:53:27,335 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.96 sec
    MapReduce Total cumulative CPU time: 960 msec
    Ended Job = job_1569214475993_0001
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to: hdfs://ronnie/ronnie_hive/ronnie_test/t2/.hive-staging_hive_2019-09-24_04-53-12_193_1698682512625223581-1/-ext-10000
    Loading data to table ronnie.t2
    Table ronnie.t2 stats: [numFiles=1, numRows=4, totalSize=15, rawDataSize=11]
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1   Cumulative CPU: 0.96 sec   HDFS Read: 3008 HDFS Write: 80 SUCCESS
    Total MapReduce CPU Time Spent: 960 msec
    OK
    Time taken: 16.388 seconds
    
    
  • Insert the results of a single query into multiple tables

    • -- building on the data above
      create table t3(
      id string
      );
      
      -- this statement launches a MapReduce job
      from t1
      INSERT OVERWRITE TABLE t2  SELECT name 
      INSERT OVERWRITE TABLE t3  SELECT id ;
      

    MapReduce execution output:

    Query ID = root_20190924045620_5582ef76-bbdc-4b60-b9e1-ba9e63b65865
    Total jobs = 5
    Launching Job 1 out of 5
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1569214475993_0002, Tracking URL = http://node03:8088/proxy/application_1569214475993_0002/
    Kill Command = /opt/ronnie/hadoop-2.6.5/bin/hadoop job  -kill job_1569214475993_0002
    Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
    2019-09-24 04:56:27,406 Stage-2 map = 0%,  reduce = 0%
    2019-09-24 04:56:33,559 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 1.07 sec
    MapReduce Total cumulative CPU time: 1 seconds 70 msec
    Ended Job = job_1569214475993_0002
    Stage-5 is selected by condition resolver.
    Stage-4 is filtered out by condition resolver.
    Stage-6 is filtered out by condition resolver.
    Stage-11 is selected by condition resolver.
    Stage-10 is filtered out by condition resolver.
    Stage-12 is filtered out by condition resolver.
    Moving data to: hdfs://ronnie/ronnie_hive/ronnie_test/t2/.hive-staging_hive_2019-09-24_04-56-20_574_2344930125947110148-1/-ext-10000
    Moving data to: hdfs://ronnie/ronnie_hive/ronnie_test/t3/.hive-staging_hive_2019-09-24_04-56-20_574_2344930125947110148-1/-ext-10002
    Loading data to table ronnie.t2
    Loading data to table ronnie.t3
    Table ronnie.t2 stats: [numFiles=1, numRows=0, totalSize=15, rawDataSize=0]
    Table ronnie.t3 stats: [numFiles=1, numRows=0, totalSize=8, rawDataSize=0]
    MapReduce Jobs Launched: 
    Stage-Stage-2: Map: 1   Cumulative CPU: 1.07 sec   HDFS Read: 3981 HDFS Write: 153 SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 70 msec
    OK
    Time taken: 14.425 seconds
    
    
  • Insert literal rows, as in standard SQL

    • insert into t1 values ('5','yyz');
      
  • Internal vs. External Tables

    • Internal (managed) tables

      • Typically hold data the table owns exclusively, guarding against accidental deletion by others
      • Dropping the table also deletes its data files
      • Internal tables are not well suited to sharing data with other tools
    • External tables

      • Can share data with other tables
      • Dropping the table does not delete the data files
      create EXTERNAL table ronnie_ex(
      id int,
      name string,
      age int,
      fav array<string>,
      addr struct<stu_addr:string,work_addr:string>
      )
      row format delimited fields terminated by ',' 
      collection items terminated by '-' 
      map keys terminated by ':' 
      lines terminated by '\n';
      
      -- load a local file into the external table; the file is copied into the table folder
      load data local inpath '/root/ronnie_ex.txt' into table ronnie_ex;
      -- load from HDFS into the external table; the file is still moved into the table folder
      load data inpath '/ex/ronnie_ex.txt' into table ronnie_ex;
      
      • To share data, an external table's location can be pointed directly at the data's path

        • create EXTERNAL table ronnie_ex_location(
          id int,
          name string,
          age int,
          fav array<string>,
          addr struct<stu_addr:string,work_addr:string>
          )
          row format delimited fields terminated by ',' 
          collection items terminated by '-' 
          map keys terminated by ':' 
          lines terminated by '\n'
          location '/ronnie/ex';
          
        • Switching between internal and external tables

          • alter table personInfo set tblproperties('EXTERNAL'='TRUE'); [internal --> external]
          • alter table personInfo set tblproperties('EXTERNAL'='FALSE'); [external --> internal]
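Whether the switch took effect can be checked in the table's metadata:

```sql
-- The "Table Type" field of the output shows MANAGED_TABLE or EXTERNAL_TABLE.
desc formatted personInfo;
```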

      Table location

      • Changes where the table's data is stored
      • When the table is created, any data already in that folder is cleared first
      create table ronnieUserPath111(
      id int,
      name string,
      age int,
      fav array<string>,
      addr struct<stu_addr:string,work_addr:string>
      )
      row format delimited fields terminated by ',' 
      collection items terminated by '-' 
      map keys terminated by ':' 
      lines terminated by '\n'
      location '/ronnie/ex';
      

      Exporting Data

      • Export query results to the local file system

        • insert overwrite local directory '/root/t11' select * from t1;
      • Export query results to the local file system with formatting

        • insert overwrite local directory '/root/t12'
          ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' select * from t1;

      • Export query results to HDFS

        • insert overwrite directory '/ronnie/t13' select * from t1;
      • Export data with export/import

        • export table t1 to '/ronnie/hive/t1';
        • import from '/ronnie/hive/t1';
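import can also restore the dump under a different table name (t1_copy here is illustrative):

```sql
-- Recreate the exported table under a new name from the export directory.
import table t1_copy from '/ronnie/hive/t1';
```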

