Creating tables in Hive and operating on table data

Reposted article, 2020-03-31 18:02

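I. Hive has two kinds of tables

　　1. Internal tables (managed tables):

　　　　　　Dropping the table also deletes its data on HDFS.

　　2. External tables

　　　　　　Dropping the table does not delete its data on HDFS.

　　　　　　The data behind an external table is usually supplied from outside Hive by someone else, so Hive does not consider itself the exclusive owner of that data. That is why dropping an external table removes only the table's metadata and leaves the data files in place. (Hive does not actually forbid insert on an external table; the point is that such a table is normally fed from outside.)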
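The difference in drop behaviour can be tried out with a small sketch like the following. The table names and the path are hypothetical and are not part of the original notes.

```sql
-- Hypothetical table names and path, for illustration only.
create table if not exists demo_managed(id int, name string);
create external table if not exists demo_external(id int, name string)
location '/tmp/demo_external';

-- Removes the metadata and the table's data directory on HDFS.
drop table demo_managed;

-- Removes only the metadata; files under /tmp/demo_external stay on HDFS.
drop table demo_external;
```

Re-creating demo_external with the same location should make the old files queryable again, which is the practical meaning of "Hive does not own the data".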

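II. Operations on Hive tables and their data

　　1. insert into: inserting data this way is strongly discouraged, because each insert produces small files on HDFS, and large numbers of small files burden HDFS metadata management (the NameNode).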
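A rough sketch of the pattern to avoid and one common alternative. It assumes a Hive version that supports insert ... values (0.14 or later) and a hypothetical staging table stu2_staging with the same columns as the stu2 table created further down in these notes.

```sql
-- Discouraged: every statement like this writes at least one new small file.
insert into table stu2 values (1, 'zhangsan');
insert into table stu2 values (2, 'lisi');

-- Usually preferable: write many rows in a single statement.
-- Assumption: stu2_staging is a hypothetical table with the same columns as stu2.
insert into table stu2 select id, name from stu2_staging;
```

Loading a prepared file with load data, covered later in these notes, avoids the problem in the same way.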

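　　2. If no delimiter is specified when a table is created, Hive uses \001 as the default field delimiter. This is an ASCII control character (Ctrl-A), which is non-printing.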
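As a minimal sketch, writing the default out explicitly should behave the same as omitting the row format clause. The table name stu_default is hypothetical.

```sql
-- stu_default is a hypothetical table name used only for illustration.
create table if not exists stu_default(id int, name string)
row format delimited fields terminated by '\001'
stored as textfile;
```

Because \001 is invisible in most editors, a data file for such a table tends to look as if its columns were simply run together.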

　　3. Specifying the delimiter when creating a table

　　　　Create an internal table

　　　　create table if not exists stu2(id int,name string) row format delimited fields terminated by '\t' stored as textfile location '/user/hive/warehouse/myhive/stu2';

　　　　Create an external table

　　　　create external table if not exists student(s_id string,s_name string) row format delimited fields terminated by '\t' stored as textfile location '/user/hive/warehouse/myhive/student';

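　　4. Create a table from a query result and store the result data in the new table

　　　　　　create table stu3 as select * from stu2; -- this form is used quite often

　　　　　　Create a table from the structure of an existing table; this copies only the table structure, not the data:

　　　　　　create table stu4 like stu2;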

　　5. Check the type of a table:

　　　　desc formatted stu2;
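What to look for in the output, plus a sketch of switching a table between the two types. The alter statements are an addition to the original notes; the property name and value are written in upper case here because the setting is commonly reported to be case-sensitive.

```sql
-- Look for the "Table Type:" row in the output:
-- MANAGED_TABLE for internal tables, EXTERNAL_TABLE for external ones.
desc formatted stu2;

-- Internal -> external.
alter table stu2 set tblproperties('EXTERNAL'='TRUE');

-- External -> internal.
alter table stu2 set tblproperties('EXTERNAL'='FALSE');
```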

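　　6. How do you load data into an external table?

　　　1. Load data into the table from the local file system

　　　　　load data local inpath '/export/servers/hivedatas/student.csv' into table student;

　　　　　Load data and overwrite the existing data

　　　　　load data local inpath '/export/servers/hivedatas/student.csv' overwrite into table student;

　　　2. Load data into the table from HDFS (the data must be uploaded to HDFS beforehand; the load is really just a file move)

　　　　load data inpath '/hivedatas/techer.csv' into table techer;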
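The "file move" behaviour can be observed from inside the Hive CLI, which accepts dfs commands. This is a sketch: the local location of techer.csv is an assumption, chosen to match the directory used for student.csv.

```sql
-- Assumption: techer.csv sits in the same local directory as student.csv.
dfs -mkdir -p /hivedatas;
dfs -put /export/servers/hivedatas/techer.csv /hivedatas/;

-- The file is listed here before the load.
dfs -ls /hivedatas;

load data inpath '/hivedatas/techer.csv' into table techer;

-- After the load the file should no longer be listed, because it was moved
-- into the table's directory rather than copied.
dfs -ls /hivedatas;
```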

　　7. Partitioned tables

　　　　Partitioning is generally combined with internal or external tables, giving for example an internal partitioned table or an external partitioned table.

　　　　Syntax for creating a partitioned table:

　　　　create table score(s_id string,c_id string, s_score int) partitioned by (month string) row format delimited fields terminated by '\t' stored as textfile;

　　　　create table score2(s_id string,c_id string, s_score int) partitioned by (year string,month string,day string) row format delimited fields terminated by '\t' stored as textfile;

　　8. Loading data into a partitioned table:

　　　Load data into a table with a single partition column

　　　load data local inpath '/export/servers/hivedatas/score.csv' into table score partition(month='201806');

　　　Load data into a table with multiple partition columns:

　　　load data local inpath '/export/servers/hivedatas/score.csv' into table score2 partition(year='2018',month='06',day='18');
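Once data is loaded, the partition columns can be used in queries like ordinary columns. A short sketch using the two tables above:

```sql
-- Each partition is stored as its own directory under the table's location,
-- for example .../score/month=201806/ (the parent path depends on the warehouse setup).

-- Filtering on the partition column lets Hive read only the matching directory.
select s_id, c_id, s_score from score where month = '201806';

-- With several partition columns, any subset of them can appear in the filter.
select s_id, c_id, s_score from score2 where year = '2018' and month = '06';
```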

　　9. Viewing partitions

　　show partitions score2;

　　10. Adding and dropping partitions

　　Add a partition

　　alter table score add partition(month='201805');

　　Add a partition to a table with multiple partition columns

　　alter table score2 add partition(year='2018',month='09',day='10');

　　Drop a partition

　　alter table score drop partition(month='201809');
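Several partitions can also be added in one statement by repeating the partition clause. A sketch, with month values chosen only for illustration:

```sql
-- The month values below are illustrative only.
alter table score add if not exists
  partition(month='201804')
  partition(month='201803');

-- Dropping a partition of an internal table also deletes that partition's data.
alter table score drop if exists partition(month='201804');
alter table score drop if exists partition(month='201803');
```

The if not exists and if exists guards keep the statements from failing when they are re-run.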

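　　Repairing a table (this has to be run manually)

　　　　Repairing a table simply means rebuilding the mapping between the table and its data files: partition directories that already exist on HDFS are registered in the table's metadata.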

　　　　msck repair table score4;

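　　11. Bucketed tables

　　　　Bucketing splits the data into several buckets according to a specified column; in other words, rows are distributed by that column's value into multiple files.

　　　　Enable Hive's bucketing support; it is off (false) by default. (This applies to Hive 1.x; from Hive 2.0 on, bucketing is always enforced and the property is no longer needed.)

　　　　set hive.enforce.bucketing=true;

　　　　Set the number of reducers to match the number of buckets; the default is -1, which lets Hive decide

　　　　set mapreduce.job.reduces=3;

　　　　How do you create a bucketed table?

　　　　create table course(c_id string,c_name string,t_id string) clustered by (c_id) into 3 buckets row format delimited fields terminated by '\t' stored as textfile;

　　　　Loading data into a bucketed table: neither putting files with hdfs dfs -put nor load data distributes the rows into buckets, so the data has to be written with insert overwrite, for example:

　　　　insert overwrite table course select * from course_common cluster by(c_id);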
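The notes do not show where course_common comes from. A plausible sketch is an ordinary, non-bucketed staging table with the same columns; the file path below is an assumption. Once course has been filled, a single bucket can be read with tablesample.

```sql
-- Assumption: course_common is a plain staging table with the same columns as course;
-- the file path below is hypothetical.
create table if not exists course_common(c_id string, c_name string, t_id string)
row format delimited fields terminated by '\t';
load data local inpath '/export/servers/hivedatas/course.csv' into table course_common;

-- After the insert overwrite into course, one bucket can be sampled on its own.
select * from course tablesample(bucket 1 out of 3 on c_id);
```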

　　12. Modifying tables in Hive

　　　　　　1. Rename　　alter table old_table_name rename to new_table_name;

　　　　　　2. Add or modify column information

　　　　　　　　(1) Show the table structure

　　　　　　　　　　desc score5;

　　　　　　　　(2) Add columns

　　　　　　　　　　alter table score5 add columns (mycol string, mysco string);

　　　　　　　　(3) Show the table structure

　　　　　　　　　　desc score5;

　　　　　　　　(4) Change a column

　　　　　　　　　　alter table score5 change column mysco mysconew int;

　　　　　　　　(5) Show the table structure

　　　　　　　　　　desc score5;

　　13. Multi-insert mode in Hive (an example)

　　　　Multi-insert is commonly used in production to split one table into two or more parts. First load data into the score table:

　　　　load data local inpath '/export/servers/hivedatas/score.csv' overwrite into table score partition(month='201806');

　　　　Create the table for the first part:

　　　　create table score_first( s_id string,c_id string) partitioned by (month string) row format delimited fields terminated by '\t' ;

　　　　Create the table for the second part:

　　　　create table score_second(c_id string,s_score int) partitioned by (month string) row format delimited fields terminated by '\t';

　　　　Load data into the first and second tables in a single statement:

　　　　from score

　　　　 insert overwrite table score_first partition(month='201806') select s_id,c_id

　　　　 insert overwrite table score_second partition(month = '201806') select c_id,s_score;
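A variation on the statement above, as a sketch: each insert branch may carry its own filter while score is still scanned only once. The threshold of 60 is an arbitrary illustrative value, and running this overwrites the two partitions written by the previous statement.

```sql
-- One scan of score feeds both inserts; each branch may carry its own filter.
-- The threshold of 60 is an arbitrary illustrative value.
from score
insert overwrite table score_first partition(month='201806')
  select s_id, c_id where s_score >= 60
insert overwrite table score_second partition(month='201806')
  select c_id, s_score where s_score < 60;

-- Quick check of what each target received.
select count(*) from score_first where month = '201806';
select count(*) from score_second where month = '201806';
```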

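　　14. A small difference between external tables and external partitioned tables:

　　　When an external table is created, location can point it at the directory that holds the data, and the table can read that data right away.

　　　For an external partitioned table, the data has to be placed under the matching partition path, and the repair command must also be run: msck repair table xxxtb_name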
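The whole flow for an external partitioned table might look like the sketch below. The table name score_ext and the HDFS path /scoredatas are hypothetical; the dfs commands assume the Hive CLI.

```sql
-- Hypothetical table name and HDFS paths, used only for illustration.
create external table if not exists score_ext(s_id string, c_id string, s_score int)
partitioned by (month string)
row format delimited fields terminated by '\t'
location '/scoredatas';

-- Place a file directly under the partition-style directory.
dfs -mkdir -p /scoredatas/month=201806;
dfs -put /export/servers/hivedatas/score.csv /scoredatas/month=201806/;

-- Before the repair, the metastore does not know about the partition.
show partitions score_ext;
msck repair table score_ext;
show partitions score_ext;
```

The second show partitions should list month=201806, after which queries on score_ext can see the file.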
