Hive CREATE TABLE statements, managed vs external tables, and loading data

Reposted article. Published 2022-02-18 23:37. Category: Hive

I. Viewing the SQL execution plan in Hive
II. Hive CREATE TABLE statements
III. Hive managed (internal) tables vs external tables
IV. Loading data into Hive

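I. Viewing the SQL execution plan in Hive

-- explain: prefix a query with explain to see how Hive parses and plans it
-- extended: optional keyword that prints more detail; omit it for the shorter plan
explain extended select  a.id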
        ,a.name
        ,a.clazz
        ,t1.sum_score
from(
    select  id
            ,sum(score) as sum_score
    from score 
    group by id
)t1 right join (
    select  id
            ,name
            ,'文科一班' as clazz
    from students
    where clazz = '文科一班'
) a
on t1.id = a.id
order by t1.sum_score desc
limit 10;

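II. Hive CREATE TABLE statements

-- EXTERNAL: makes the table an external table
-- Table name
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name

  -- Column names and types, with optional column comments
  [(col_name data_type [COMMENT col_comment], ...)]
  
  -- Table comment
  [COMMENT table_comment]
  
  -- Partitioning (partition columns are extra columns, declared here rather than in the column list)
  [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
  
  -- Bucketing (bucket columns are chosen from the columns already declared above)
  [CLUSTERED BY (col_name, col_name, ...) 
   
  -- Sort columns within each bucket, ascending or descending
  -- num_buckets BUCKETS: the number of buckets. A row's bucket is the hash of the bucket column modulo num_buckets,
  -- and the number of buckets equals the number of reduce tasks used when the table is populated
  [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
  
  [
    -- Row and column delimiters
    -- The row delimiter is usually left at its default, the newline \n; the column delimiter usually has to be set to match the data (the default is \001, Ctrl-A)
   [ROW FORMAT row_format]
      
   -- Hive storage format, for example textFile, rcFile, SequenceFile
   -- If omitted, the default is textFile
   [STORED AS file_format]
  ]
  
  -- Storage location
  -- Normally given for external tables (although the syntax does not require it); optional for managed tables, and usually omitted there
  [LOCATION hdfs_path]
  -- Often used together with external tables, e.g. to map an HBase table so that its data can be queried with HQL, although such queries are relatively slow
  [TBLPROPERTIES (property_name=property_value, ...)]  (Note:  only available starting with 0.6.0)
  [AS select_statement]  (Note: this feature is only available starting with 0.5.0.)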
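
The template above lists partitioning and bucketing, but none of the numbered examples that follow uses them. A minimal sketch combining both is shown here; the table name students_pt_demo, the choice of clazz as the partition column, and the bucket count 4 are assumptions made for illustration, not part of the original article.

```sql
-- Hypothetical demo table: partitioned by class, bucketed by student id.
CREATE TABLE IF NOT EXISTS students_pt_demo
(
    id bigint COMMENT 'student id',
    name string,
    age int,
    gender string
)
COMMENT 'demo table for partitioning and bucketing'
-- The partition column is declared only here, not in the column list above.
PARTITIONED BY (clazz string)
-- The bucket column must be one of the columns declared above.
CLUSTERED BY (id) SORTED BY (id ASC) INTO 4 BUCKETS
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS textfile;
```

Each distinct clazz value gets its own subdirectory under the table directory, and within a partition the rows are spread over 4 bucket files according to the hash of id.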

Table creation 1: use the defaults for everything

create table students
(
    id bigint,
    name string,
    age int,
    gender string,
    clazz string
)
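ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- Sets the column delimiter. Needed for comma-separated data, because Hive's default column delimiter is \001 rather than ','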

Table creation 2: specify location (this form is also fairly common)

create table students2
(
    id bigint,
    name string,
    age int,
    gender string,
    clazz string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/input1'; 
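-- LOCATION sets where the table's data is stored. It is typically used when the data has already been uploaded to HDFS and should be queried in place. LOCATION usually goes together with external tables; a managed table has its own default location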

Table creation 3: specify the storage format

create table students3
(
    id bigint,
    name string,
    age int,
    gender string,
    clazz string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
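STORED AS rcfile;
-- Stores the table as RCFile (inputFormat: RCFileInputFormat, outputFormat: RCFileOutputFormat). Without STORED AS, the default is textfile.
-- Note: a plain text file cannot be loaded directly into a table stored in any format other than textfile; such a table has to be populated from another table.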
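
Populating a non-textfile table from another table can be sketched as follows. This assumes the textfile table students from the first example already contains data; Hive then rewrites the rows in RCFile format as it inserts them.

```sql
-- Assumption: students (stored as textfile) already holds data,
-- and students3 is the rcfile table created above with the same columns.
insert into table students3
select * from students;
```

The same pattern applies to any other non-text format: load the raw text into a textfile staging table first, then insert ... select into the target table.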

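Table creation 4: create table xxxx as select_statement (a SQL query); this form is used fairly often

-- Uses the output of select * from students2 as the data for the new table students4; the two parts are joined by the keyword as
-- The new table contains data, identical to the output of select * from students2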
create table students4 as select * from students2;

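Table creation 5: create table xxxx like table_name (creates the table only, without loading any data)

-- Creates a table with the same structure as students; the new table is empty. The keyword here is like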
create table students5 like students;

III. Hive managed (internal) tables vs external tables

Creating the tables:

-- Create a managed (internal) table
create table students_internal
(
    id bigint,
    name string,
    age int,
    gender string,
    clazz string
)
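ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/input2';
-- The data is comma-delimited; LOCATION places this managed table in the HDFS directory /input2

-- Create an external table; the only difference from the managed table is the extra keyword external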
create external table students_external
(
    id bigint,
    name string,
    age int,
    gender string,
    clazz string
)
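ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/input3';
-- The data is comma-delimited; LOCATION places this external table in the HDFS directory /input3

(1) Tables are usually created as managed tables without a location. A directory named after the table
then appears under the HDFS directory of the database in which the table was created.
(2) If a managed table is created with a location, for example LOCATION '/input2',
the table lives in the /input2 directory on HDFS. No entry with the table name appears inside /input2;
the /input2 directory itself is the table.
(3) An external table is normally created with a location, for example LOCATION '/input3'.
In the same way, the external table lives in /input3 on HDFS, and no entry with the table name appears inside /input3;
the /input3 directory itself is the external table.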
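
Because the table name does not show up in HDFS when a location is given, one way to confirm a table's type and directory is to ask the metastore. A small sketch, assuming the two tables created above exist in the current database:

```sql
-- The output includes a "Table Type" row (MANAGED_TABLE or EXTERNAL_TABLE)
-- and a "Location" row with the table's HDFS directory.
describe formatted students_internal;
describe formatted students_external;
```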

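Loading data (uploading data to HDFS):

-- Upload into the managed table (a file placed under /input2/ becomes data of the students_internal table)
hive> dfs -put /usr/local/soft/data/students.txt /input2/;

-- Upload into the external table (a file placed under /input3/ becomes data of the students_external table)
hive> dfs -put /usr/local/soft/data/students.txt /input3/;
-- After an external table is created its name does not show up in HDFS, but the table does exist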

Dropping tables:

hive> drop table students_internal;
Moved: 'hdfs://master:9000/input2' to trash at: hdfs://master:9000/user/root/.Trash/Current
OK
Time taken: 0.474 seconds
-- When a managed table is dropped, Hive reports that its data was moved to the trash

hive> drop table students_external;
OK
Time taken: 0.09 seconds
hive> 
-- When an external table is dropped, no message about the trash appears

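Dropping a managed table deletes the table's data (the files on HDFS) together with its metadata;
(/input2, the table name, and the uploaded students.txt data file are all gone)

Dropping an external table deletes only the table's metadata; the table's data (the files on HDFS) is kept
(/input3 and the uploaded students.txt data file are still there; only the table name is gone)

In companies external tables are generally used more often, because the data may need to be shared by several programs and this avoids accidental deletion. External tables are usually combined with location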
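
If a table was created as managed and its files later turn out to be shared, the table type can usually be switched through a table property instead of recreating the table. The statement below is a sketch under that assumption; exact behavior depends on the Hive version (for example, transactional managed tables in newer releases may not be convertible), so check it on a test table first.

```sql
-- Mark an existing managed table as external, so that drop table
-- removes only the metadata. The value 'TRUE' is written in upper case
-- because some Hive versions treat it as case-sensitive.
alter table students_internal set tblproperties ('EXTERNAL'='TRUE');
```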

External tables can also map data from other data sources into Hive, for example HBase or ElasticSearch
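
A mapping onto HBase might look like the sketch below. It uses Hive's HBase storage handler; the Hive table name hbase_students, the HBase table name students and the column family info are hypothetical, and the statement assumes the HBase table already exists and the Hive HBase handler jars are available to Hive.

```sql
-- Hypothetical names: Hive table hbase_students, HBase table 'students',
-- column family 'info'. The HBase row key is mapped to the id column.
create external table hbase_students
(
    id string,
    name string,
    age int
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name,info:age")
TBLPROPERTIES ("hbase.table.name" = "students");
```

Dropping hbase_students removes only the Hive metadata; the HBase table and its data stay in place.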

External tables were designed to decouple (separate) a table's metadata from its data

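IV. Loading data into Hive

1. Use `hdfs dfs -put <local data path> <HDFS directory of the Hive table>`

2. Use `load data inpath '<HDFS path of the data files>' into table <table name>;`

Only the target table is named; its path does not have to be given. Hive finds the table's directory and stores the data there

The following commands must be run in the hive shell

-- Load the data file under the students table on HDFS into the students2 table
-- This kind of load moves the data (like cut and paste): afterwards the students table is empty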
load data inpath '/user/hive/warehouse/test1.db/students/student.txt' into table students2;

-- With the local keyword, a file from the local Linux filesystem is loaded into the HDFS directory of the Hive table; the original file is not deleted
load data local inpath '/usr/local/soft/data/student.txt' into table students;

-- overwrite: the loaded file replaces the table's existing data
load data local inpath '/usr/local/soft/data/student.txt' overwrite into table students;

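3. create table <table name> as <SQL query> is also a way of loading data

-- Loads the output of select * from students2 into the table students4, joined by the keyword as (effectively a copy)
-- students4 is a newly created table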
create table students4 as select * from students2;

4. insert into table <table name> <SQL query> (without as)

-- Loads the output of select * from students into the table students2, appending to the existing data (effectively a copy)
insert into table students2 select * from students;

-- Overwriting insert: replace into with overwrite
-- Loads the output of select * from students into the table students2, replacing its existing data
insert overwrite table students2 select * from students;
