Hive - Create Table & Drop Table & ALTER Table (Part 1)


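Foreword: I originally planned to cover creating, dropping, and altering tables in a single post. It turned out there is quite a lot of material, and much of it covers frequently used operations, so I am splitting it up for now.

Special reminder: in day-to-day work, whether you are creating a database or table, altering columns, or dropping something, I recommend always adding the [IF NOT EXISTS] | [IF EXISTS] option. It is optional, but it pays to be careful: if you forget the database name and run the statement against the wrong target, there is no undoing it.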
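One way to follow this advice in scripts is to generate the guarded, database-qualified form of each statement, so that neither the guard nor the database name can be left out by accident. The helper below is a hypothetical sketch: GuardedDdl and its method names are not part of Hive, and userdb and employee are placeholder names.

```java
final class GuardedDdl {

    // Reject missing names so a statement can never silently hit the current database.
    private static String requireName(String kind, String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException(kind + " name must not be empty");
        }
        return value.trim();
    }

    // CREATE DATABASE with the IF NOT EXISTS guard always present.
    static String createDatabase(String db) {
        return "CREATE DATABASE IF NOT EXISTS " + requireName("database", db);
    }

    // DROP TABLE with the IF EXISTS guard and a database-qualified table name.
    static String dropTable(String db, String table) {
        return "DROP TABLE IF EXISTS "
            + requireName("database", db) + "." + requireName("table", table);
    }

    public static void main(String[] args) {
        System.out.println(createDatabase("userdb"));
        System.out.println(dropTable("userdb", "employee"));
    }
}
```

The helper only builds the statement text; it does no quoting or escaping, so it is meant for names you control yourself.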

 

This chapter explains how to create a table and how to insert data into it. The conventions for creating a table in Hive are quite similar to those for creating a table in SQL.

Create Table Statement

Create Table is a statement used to create a table in Hive. The syntax and example are as follows:

Syntax

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]

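Translator's note: Hive currently supports the following file_format values for STORED AS (see http://blog.csdn.net/yfkiss/article/details/7787742):

  • TEXTFILE: the default format. Data is not compressed, so both disk overhead and parsing overhead are high. It can be combined with Gzip or Bzip2 (the system detects this automatically and decompresses when a query runs), but in that case Hive does not split the data and therefore cannot process it in parallel.
  • SEQUENCEFILE: SequenceFile is a binary file format provided by the Hadoop API. It is easy to use, splittable, and compressible. SequenceFile supports three compression options: NONE, RECORD, and BLOCK. RECORD has a low compression ratio, so BLOCK compression is generally recommended.
  • RCFILE: RCFILE combines row-based and column-based storage. First, it partitions the data into blocks by row, which guarantees that a record lives in a single block and avoids reading multiple blocks to fetch one record. Second, it stores the data inside each block by column, which helps compression and fast column access.
  • Custom format: when the current Hive cannot recognize the format of your data files, you can define a custom file format.
    You can customize the input and output formats by implementing InputFormat and OutputFormat. For reference code, see:
    .\hive-0.8.1\src\contrib\src\java\org\apache\hadoop\hive\contrib\fileformat\base64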
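To make the STORED AS choice concrete, here is a small Java sketch that assembles a CREATE TABLE statement for each of the three built-in formats listed above. The class and method names (StoredAsDdl, createTable) and the table demo_table are invented for illustration; only the HiveQL keywords come from the syntax section.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class StoredAsDdl {

    // The built-in formats named in the note above.
    enum FileFormat { TEXTFILE, SEQUENCEFILE, RCFILE }

    // Builds: CREATE TABLE IF NOT EXISTS <table> (<col> <type>, ...) STORED AS <format>
    static String createTable(String table, Map<String, String> columns, FileFormat format) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> column : columns.entrySet()) {
            cols.add(column.getKey() + " " + column.getValue());
        }
        return "CREATE TABLE IF NOT EXISTS " + table + " " + cols + " STORED AS " + format;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in declaration order.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "INT");
        columns.put("name", "STRING");
        for (FileFormat format : FileFormat.values()) {
            System.out.println(createTable("demo_table", columns, format));
        }
    }
}
```

Only the STORED AS clause changes between the generated statements, which is what makes it cheap to try a different format on a new table.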

Example

Let us assume you need to create a table named employee using the CREATE TABLE statement. The following table lists the fields of the employee table and their data types:

Sr.No   Field Name    Data Type
1       Eid           int
2       Name          String
3       Salary        Float
4       Designation   String

The table also needs a comment, row format settings (field terminator and line terminator), and a stored file type:

COMMENT 'Employee details'
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE

 

The following query creates a table named employee using the above data.

hive> CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary Float, designation String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

 

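Translator's note: below is a sample of the script I currently use. It partitions the table by the dt column. A later post covers partitioning specifically; in the meantime you can read the English version at https://www.tutorialspoint.com/hive/hive_partitioning.htm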

CREATE TABLE IF NOT EXISTS `snapshot_task_sub` (
  `task_sub_id` INT COMMENT 'Task extension sub-table ID',
  `task_id` INT COMMENT 'Task ID',
  `car_series` INT COMMENT 'Car series ID',
  `series_name` STRING COMMENT 'Car series name',
  `purchase_amount` INT COMMENT 'Purchase quantity',
  `price` DOUBLE COMMENT 'Latest placement unit price',
  `published_price` DOUBLE COMMENT 'Published (rate card) price',
  `state` TINYINT COMMENT 'Status: 0 = normal, 2 = deleted',
  `create_time` STRING COMMENT 'Creation time',
  `edit_time` STRING COMMENT 'Modification time',
  `snap_time` STRING COMMENT 'Snapshot time'
)
COMMENT 'Daily snapshot of the task sub-table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

 

If you add the IF NOT EXISTS option, Hive ignores the statement when the table already exists.

When the table is created successfully, Hive prints the following response:

OK
Time taken: 5.905 seconds
hive>

JDBC Program

The JDBC program to create a table is given below.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveCreateTable {
   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
   
   public static void main(String[] args) throws SQLException, ClassNotFoundException {
   
      // Register driver and create driver instance
      Class.forName(driverName);
      
      // get connection
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");
      
      // create statement
      Statement stmt = con.createStatement();
      
      // execute statement: execute() is used because DDL returns no result set;
      // \\t and \\n are escaped so that Hive receives the literals '\t' and '\n';
      // the statement is sent without a trailing semicolon
      stmt.execute("CREATE TABLE IF NOT EXISTS "
         +" employee ( eid int, name String, "
         +" salary Float, designation String)"
         +" COMMENT 'Employee details'"
         +" ROW FORMAT DELIMITED"
         +" FIELDS TERMINATED BY '\\t'"
         +" LINES TERMINATED BY '\\n'"
         +" STORED AS TEXTFILE");
         
      System.out.println("Table employee created.");
      stmt.close();
      con.close();
   }
}

 

Save the program in a file named HiveCreateTable.java. The following commands are used to compile and execute this program.

$ javac HiveCreateTable.java
$ java HiveCreateTable

Output

Table employee created.
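The program above uses the driver class org.apache.hadoop.hive.jdbc.HiveDriver and a jdbc:hive:// URL, which belong to the original HiveServer. On installations that run HiveServer2, the driver class is org.apache.hive.jdbc.HiveDriver and the URL scheme is jdbc:hive2://. The following is a minimal sketch under that assumption. It also assumes the hive-jdbc jar is on the classpath and a server is listening on localhost:10000 with a database named userdb. The class name HiveServer2CreateTable and the jdbcUrl helper are made up for this example, and the column types follow the field table above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

final class HiveServer2CreateTable {

    // Assumption: HiveServer2 driver, provided by the hive-jdbc jar.
    static final String DRIVER = "org.apache.hive.jdbc.HiveDriver";

    // Backslashes are doubled so Hive receives the literals '\t' and '\n'.
    static final String DDL =
        "CREATE TABLE IF NOT EXISTS employee"
        + " (eid INT, name STRING, salary FLOAT, designation STRING)"
        + " COMMENT 'Employee details'"
        + " ROW FORMAT DELIMITED"
        + " FIELDS TERMINATED BY '\\t'"
        + " LINES TERMINATED BY '\\n'"
        + " STORED AS TEXTFILE";

    // Hypothetical helper that assembles a HiveServer2 JDBC URL.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) throws ClassNotFoundException, SQLException {
        Class.forName(DRIVER);
        // try-with-resources closes the statement and connection even on failure.
        try (Connection con = DriverManager.getConnection(jdbcUrl("localhost", 10000, "userdb"), "", "");
             Statement stmt = con.createStatement()) {
            stmt.execute(DDL); // DDL produces no result set, so execute() rather than executeQuery()
            System.out.println("Table employee created.");
        }
    }
}
```

Empty user name and password are carried over from the program above; a secured cluster would need real credentials.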

Load Data Statement

In SQL, you typically populate a table after creating it by using the INSERT statement. In Hive, you can load data with the LOAD DATA statement.

When inserting bulk records into Hive, LOAD DATA is the better choice. There are two ways to load data: from the local file system or from the Hadoop file system.

Syntax

The syntax for load data is as follows:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename 
[PARTITION (partcol1=val1, partcol2=val2 ...)]

 

  • LOCAL is an optional identifier that specifies a path on the local file system.
  • OVERWRITE is optional. If specified, it replaces all data already in the table, so use it with care.
  • PARTITION is optional.
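The effect of the three optional parts is easiest to see by assembling the statement piece by piece. The Java sketch below follows the syntax line above. The class LoadDataStatement, its build method, and the path /data/snapshot.txt with the partition value dt='2018-01-01' are assumptions made for illustration.

```java
final class LoadDataStatement {

    // Follows: LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (...)]
    // partitionSpec may be null or empty when the table is not partitioned.
    static String build(String path, boolean local, boolean overwrite,
                        String table, String partitionSpec) {
        StringBuilder sql = new StringBuilder("LOAD DATA ");
        if (local) {
            sql.append("LOCAL ");          // read from the local file system
        }
        sql.append("INPATH '").append(path).append("' ");
        if (overwrite) {
            sql.append("OVERWRITE ");      // replace existing data instead of appending
        }
        sql.append("INTO TABLE ").append(table);
        if (partitionSpec != null && !partitionSpec.isEmpty()) {
            sql.append(" PARTITION (").append(partitionSpec).append(")");
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        // Local file, overwriting an unpartitioned table.
        System.out.println(build("/home/user/sample.txt", true, true, "employee", null));
        // File already in the Hadoop file system, appended to one partition (hypothetical values).
        System.out.println(build("/data/snapshot.txt", false, false, "snapshot_task_sub", "dt='2018-01-01'"));
    }
}
```

The builder performs no escaping, so it is only suitable for paths and names that you control.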

Example

We will insert the following data into the table. It is a text file named sample.txt in the /home/user directory.

1201  Gopal       45000    Technical manager
1202  Manisha     45000    Proof reader
1203  Masthanvali 40000    Technical writer
1204  Kiran       40000    Hr Admin
1205  Kranthi     30000    Op Admin
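The listing above is shown with aligned columns, but the employee table is declared with FIELDS TERMINATED BY '\t' and LINES TERMINATED BY '\n', so the file itself needs a tab between fields and a newline after each record. The sketch below writes the same five records in that layout. SampleFileWriter is a hypothetical helper, and it writes to a temporary file unless a target path is passed as the first argument.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SampleFileWriter {

    // The five records from the listing above, one array per row.
    static final String[][] ROWS = {
        {"1201", "Gopal", "45000", "Technical manager"},
        {"1202", "Manisha", "45000", "Proof reader"},
        {"1203", "Masthanvali", "40000", "Technical writer"},
        {"1204", "Kiran", "40000", "Hr Admin"},
        {"1205", "Kranthi", "30000", "Op Admin"},
    };

    // Writes tab-separated fields and '\n'-terminated lines, matching the table's row format.
    static Path write(Path target) throws IOException {
        StringBuilder content = new StringBuilder();
        for (String[] row : ROWS) {
            content.append(String.join("\t", row)).append('\n');
        }
        return Files.write(target, content.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path target = args.length > 0
            ? Paths.get(args[0])
            : Files.createTempFile("sample", ".txt");
        System.out.println("Wrote " + write(target));
    }
}
```

Building the content by hand, rather than relying on the platform line separator, keeps the line terminator at '\n' on every operating system.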

 

The following query loads the given text into the table.

hive> LOAD DATA LOCAL INPATH '/home/user/sample.txt'
OVERWRITE INTO TABLE employee;

 

When the load succeeds, Hive prints the following response:

OK
Time taken: 15.905 seconds
hive>

JDBC Program

Given below is the JDBC program to load the given data into the table.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveLoadData {

   private static String driverName = "org.apache.hadoop.hive.jdbc.HiveDriver";
   
   public static void main(String[] args) throws SQLException, ClassNotFoundException {
   
      // Register driver and create driver instance
      Class.forName(driverName);
      
      // get connection
      Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/userdb", "", "");
      
      // create statement
      Statement stmt = con.createStatement();
      
      // execute statement: LOAD DATA returns no result set, so execute() is used;
      // note the space before OVERWRITE and the absence of a trailing semicolon
      stmt.execute("LOAD DATA LOCAL INPATH '/home/user/sample.txt'" + " OVERWRITE INTO TABLE employee");
      System.out.println("Load Data into employee successful");
      
      stmt.close();
      con.close();
   }
}

 

Save the program in a file named HiveLoadData.java. Use the following commands to compile and execute this program.

$ javac HiveLoadData.java
$ java HiveLoadData

 

Output:

Load Data into employee successful


-------------
Original English article: https://www.tutorialspoint.com/hive/hive_create_table.htm

