Hive: Commonly Used Commands


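1. You can usually log in to Hive through beeline, which connects via a proxy (on CDH the default is to run the hive command, which opens the Hive shell directly).

2. Switch to the database abc_hive_db: use abc_hive_db;

3. List the tables in the database: show tables; list only the tables matching a pattern: show tables like '*tb_site*';

4. View the structure of a table (method 1): desc tablename;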

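Method 2: view the table's column information together with its storage location
desc extended table_name;
Method 3: view the table's column information together with its storage location
desc formatted table_name;
 
Note: when you need the table's storage path, method 3 is recommended because its output is laid out more clearly.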
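
Items 1 to 4 can also be run non-interactively. As a minimal sketch, the helper below only assembles a beeline command line and prints it, so it can be reviewed before being run. The host, port, and user name are hypothetical placeholders (10000 is the usual HiveServer2 default port, but your cluster may differ, and a Kerberos-secured cluster needs extra URL parameters). The options -u, -n, and -e are beeline's flags for the JDBC URL, the user name, and a query string.

```shell
# Build (but do not run) a beeline command line for a HiveServer2 endpoint.
# Every argument value used below is a hypothetical placeholder.
beeline_cmd() {
  host=$1; port=$2; db=$3; user=$4; query=$5
  printf 'beeline -u "jdbc:hive2://%s:%s/%s" -n %s -e "%s"\n' \
    "$host" "$port" "$db" "$user" "$query"
}

# Print the command for inspecting a table's columns and storage location.
beeline_cmd hs2.example.com 10000 abc_hive_db etl_user "desc formatted table_name"
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without a Hive client installed.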

 

II. Checking table size
Method 1: To get the total size of a Hive table's files (in bytes), a one-line script is enough:
--# Size of a non-partitioned table
$ hadoop fs -ls  /user/hive/warehouse/table_name|awk -F ' ' '{print $5}'|awk '{a+=$1}END{print a}'
48
This saves adding up the file sizes by hand. The following command lists the table's files in detail:
$ hadoop fs -ls  /user/hive/warehouse/table_name
--# Size of one partition of a partitioned table (in GB)
$ hadoop fs -ls  /user/hive/warehouse/table_name/yyyymm=201601|awk -F ' ' '{print $5}'|awk '{a+=$1}END{print a/(1024*1024*1024)}'
39.709
Again, this saves adding up the file sizes by hand. The following command lists the partition's files in detail:
$ hadoop fs -ls  /user/hive/warehouse/table_name/yyyymm=201601
Method 2: get the total size of the table, in GB
$ hadoop fs -du /user/hive/warehouse/table_name|awk '{ SUM += $1 } END { print SUM/(1024*1024*1024)}'
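
The one-liners above can be wrapped in a small reusable function. The sketch below reads a listing on standard input and sums field 5, the size column of hadoop fs -ls output; the function name and the sample listing are made up for illustration so that the example runs without a cluster.

```shell
# Sum the size column (field 5) of `hadoop fs -ls` output read from stdin
# and print the total in bytes and in GiB. The "Found N items" header line
# has fewer than 8 fields, so it is skipped.
sum_ls_sizes() {
  awk 'NF >= 8 { total += $5 }
       END { printf "%d bytes (%.3f GiB)\n", total, total / (1024 * 1024 * 1024) }'
}

# Hypothetical sample listing, standing in for:
#   hadoop fs -ls /user/hive/warehouse/table_name
sum_ls_sizes <<'EOF'
Found 2 items
-rwxr-xr-x   3 hive hive  1073741824 2016-11-09 10:00 /user/hive/warehouse/table_name/000000_0
-rwxr-xr-x   3 hive hive   536870912 2016-11-09 10:01 /user/hive/warehouse/table_name/000001_0
EOF
# prints: 1610612736 bytes (1.500 GiB)
```

On a real cluster the listing would be piped in, for example hadoop fs -ls /user/hive/warehouse/table_name | sum_ls_sizes. Note that a non-recursive -ls on the root of a partitioned table shows the partition directories with size 0, so for a whole-table total hadoop fs -du -s -h on the table directory is usually the simpler choice.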

5. Create a table:

--OID,MSISDN,StartTime,EndTime,AP_MAC,ApAliasName,HotSpotName,Longitude,Latitude,Floor
0: jdbc:hive2://xxxxx/> create table tmp_wifi1109(OID string,MSISDN string,StartTime timestamp,
EndTime timestamp,AP_MAC string,ApAliasName string,HotSpotName string,Longitude string,Latitude
string,Floor string) row format delimited fields terminated by ',' stored as textfile;

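Renaming a table and adding, modifying, or dropping columns (general syntax):
ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
Note: the DROP [COLUMN] form may not be accepted by your Hive version; the usual way to remove a column in Hive is REPLACE COLUMNS with the list of columns to keep.
Example session: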
0: jdbc:hive2://10.178.152.162:21066/> alter table res_site_hangzhou add columns (cell_id_x16 string);
No rows affected (1.985 seconds)
0: jdbc:hive2://10.178.152.162:21066/> desc res_site_hangzhou;
+----------------------------+------------+----------+--+
|          col_name          | data_type  | comment  |
+----------------------------+------------+----------+--+
| oid                        | int        |          |
| objectid                   | int        |          |
....
| cell_id_x16                | string     |          |
+----------------------------+------------+----------+--+

0: jdbc:hive2://10.178.152.162:21066/> alter table res_site_hangzhou change cell_id_x16 objectidx16 string;
No rows affected (2.085 seconds)
0: jdbc:hive2://10.178.152.162:21066/> desc res_site_hangzhou;
+----------------------------+------------+----------+--+
|          col_name          | data_type  | comment  |
+----------------------------+------------+----------+--+
| oid                        | int        |          |
| objectid                   | int        |          |
....
| objectidx16                | string     |          |
+----------------------------+------------+----------+--+

 

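6. Load data from an HDFS file into a table:

Note: tmp_wifi1109 must be created as a delimited text table whose field delimiter matches the file:
create table if not exists tmp_wifi1109(id int,name string) row format delimited fields terminated by ',' stored as textfile;
Load statements (in the second form the square brackets mark optional keywords: local reads from the local file system instead of HDFS, and overwrite replaces the table's existing data):
0: jdbc:hive2://xxxx/> load data inpath 'hdfs:/user/xx_xx/dt/wifi_user_list_1109.csv' into table tmp_wifi1109;
0: jdbc:hive2://xxxx/> load data [local] inpath '/wifi_user_list_1109.csv' [overwrite] into table tmp_wifi1109;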
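
Because load data only moves the file into place, a row with the wrong number of fields is not rejected at load time and typically shows up later as NULL columns. As a rough pre-flight check, the sketch below counts comma-separated fields per line; the function name and the sample rows are hypothetical, and like the delimited text format itself it does not understand quoted commas.

```shell
# Report lines of a comma-delimited file whose field count differs from
# the number of columns in the target table. Exits non-zero if any differ.
check_field_count() {
  file=$1; expected=$2
  awk -F ',' -v n="$expected" '
    NF != n { printf "line %d: %d fields\n", NR, NF; bad++ }
    END { exit (bad > 0) }' "$file"
}

# Hypothetical sample data for the two-column table (id int, name string).
tmp=$(mktemp)
printf '1,alice\n2,bob,extra\n3,carol\n' > "$tmp"
check_field_count "$tmp" 2 || echo "fix the file before running load data"
rm -f "$tmp"
```

With the sample rows this reports line 2 as having 3 fields and then prints the reminder.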

7. Store the result of a join between tables in a new table:

create table tmp_mr_s1_mme1109 as 
select distinct b.OID,b.MSISDN,b.StartTime,b.EndTime,b.AP_MAC,b.ApAliasName,b.HotSpotName,b.Longitude,b.Latitude,b.Floor,
a.ues1ap_id,a.cellid
from default.s1mme a join abc_hive_db.tmp_wifi1109 b on a.msisdn=b.MSISDN and a.hour>='20161109' and a.hour<'20161110' where (
  (a.start_time<=b.StartTime and a.end_time>=b.StartTime)
  or (a.start_time<=b.EndTime and a.end_time>=b.EndTime)
  or (a.start_time>=b.StartTime and a.end_time<=b.EndTime)
);
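
The three OR-ed conditions in the where clause together amount to an interval-overlap test between the s1mme record and the wifi session. As an illustration only, the sketch below evaluates the three-branch form next to the compact two-comparison form (a_start <= b_end and a_end >= b_start) on a few made-up integer intervals. It assumes start <= end on every row; the function name and the sample numbers are not from the original post.

```shell
# Compare the three-branch overlap test used in the query with the compact
# form on sample intervals.
# Input columns: a_start a_end b_start b_end (hypothetical integer times).
overlap_check() {
  awk '{
    a1 = $1 + 0; a2 = $2 + 0; b1 = $3 + 0; b2 = $4 + 0
    three = (a1 <= b1 && a2 >= b1) || (a1 <= b2 && a2 >= b2) || (a1 >= b1 && a2 <= b2)
    two   = (a1 <= b2 && a2 >= b1)
    printf "%s %s\n", (three ? "overlap" : "disjoint"), (three == two ? "same" : "differ")
  }'
}

overlap_check <<'EOF'
10 20 15 30
10 20 5 12
12 18 10 20
10 20 21 30
EOF
```

The first three rows print "overlap same" and the last prints "disjoint same", which suggests the two forms agree on these cases; whether to simplify the query is a readability choice rather than a correctness fix.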

8. Export the rows of a table to HDFS:

set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.size.per.task=1000000000;
set hive.merge.smallfiles.avgsize=1000000000;
use abc_hive_db;
insert overwrite directory '/user/dt/dat/1109/' row format delimited fields terminated by '|' select * from tmp_mr_s1_mme1109;

-- If the exported files were not merged, use getmerge to merge them into a single local file.
hdfs dfs -getmerge /user/dt/dat/1109/* mergefile.csv

Specifying the field, collection-item, and map-key delimiters for the exported files:

insert overwrite directory '/user/jr/dt/my_table' 
row format delimited fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
select * from my_table;
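
When the same export is repeated for different tables, the statement can be generated rather than retyped. The sketch below only prints the HiveQL; the function name is hypothetical and the arguments are the example values from item 8. Running the result still requires a Hive client, as indicated in the trailing comments.

```shell
# Print the HiveQL for exporting a table to an HDFS directory with a
# chosen field delimiter. Arguments: database, table, directory, delimiter.
export_sql() {
  db=$1; table=$2; dir=$3; delim=$4
  cat <<EOF
use ${db};
insert overwrite directory '${dir}'
row format delimited fields terminated by '${delim}'
select * from ${table};
EOF
}

export_sql abc_hive_db tmp_mr_s1_mme1109 /user/dt/dat/1109/ '|'
# On a machine with the Hive and Hadoop clients installed (not run here),
# the output could be saved to a file and executed, then merged locally:
#   beeline -u "<jdbc url>" -f export.sql
#   hdfs dfs -getmerge /user/dt/dat/1109/ mergefile.csv
```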

 

9. View a table's partitions:

0: jdbc:hive2://xxx/> show partitions default.s1_mme;
+------------------------------------+--+
|             partition              |
+------------------------------------+--+
| hour=2016110214                  |
| hour=2016110215                  |
| hour=2016110216                  |
...
+------------------------------------+--+
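If a table has more than one partition column (for example, default.s1_mme partitioned by both p_city and p_day), you can list which day partitions exist for a given city with the following command: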
0: jdbc:hive2://xxx/> show partitions default.s1_mme partition(p_city='wuhan');
 
Load data into a specific partition:
load data local inpath '/sd/dataext' into table testPar partition(dt='20180117');
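
For a table with several partition columns, show partitions prints each partition as slash-separated key=value pairs, for example p_city=wuhan/p_day=20180117. As a small sketch, the helper below turns one such line into the partition(...) clause used by show partitions and load data. The function name is hypothetical, every value is quoted as a string, and values containing = or / are not handled.

```shell
# Convert one line of `show partitions` output, such as
#   p_city=wuhan/p_day=20180117
# into a HiveQL partition clause.
partition_clause() {
  printf '%s\n' "$1" | awk -F '/' -v q="'" '{
    out = ""
    for (i = 1; i <= NF; i++) {
      split($i, kv, "=")
      out = out (i > 1 ? "," : "") kv[1] "=" q kv[2] q
    }
    print "partition(" out ")"
  }'
}

partition_clause "p_city=wuhan/p_day=20180117"
# prints: partition(p_city='wuhan',p_day='20180117')
```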

References:

Working with Hive partitions: http://blog.sina.com.cn/s/blog_9f48885501016hn5.html

http://blog.sina.com.cn/s/blog_9f48885501016k5m.html

