1. Log in to Hive, typically through beeline (on CDH the default client is hive, which drops you straight into the Hive shell) or through a proxy connection;
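A typical beeline connection looks like the sketch below; the host, port, and user name are placeholders, so substitute your own HiveServer2 address:
$ beeline -u 'jdbc:hive2://hiveserver2-host:10000/default' -n your_user
0: jdbc:hive2://hiveserver2-host:10000/default>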
2. Switch to the target database abc_hive_db: use abc_hive_db;
3. List the tables in the database: show tables; list only tables matching a pattern: show tables like '*tb_site*';
4. View a table's structure: desc tablename;
Method 2: view column details plus the metadata storage path: desc extended table_name;
Method 3: view column details plus the metadata storage path: desc formatted table_name;
Note: when you need a table's metadata storage path, Method 3 is recommended; its output is laid out much more clearly.
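As a quick alternative, show create table prints the table's full DDL, including its LOCATION; a minimal sketch, reusing the placeholder table_name:
0: jdbc:hive2://xxxx/> show create table table_name;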
II. Checking table size
Method 1: To get the total file size of a Hive table (in bytes), one line of shell is enough:
-- # size of a regular (non-partitioned) table
$ hadoop fs -ls /user/hive/warehouse/table_name | awk -F ' ' '{print $5}' | awk '{a+=$1} END {print a}'
48
This saves you from adding the numbers up yourself; the command below lists the table's files in detail:
$ hadoop fs -ls /user/hive/warehouse/table_name
-- # size of one partition of a partitioned table (here converted to GB)
$ hadoop fs -ls /user/hive/warehouse/table_name/yyyymm=201601 | awk -F ' ' '{print $5}' | awk '{a+=$1} END {print a/(1024*1024*1024)}'
39.709
Again, this avoids manual addition; the command below lists the partition's files in detail:
$ hadoop fs -ls /user/hive/warehouse/table_name/yyyymm=201601
Method 2: total size of the table, in GB:
$ hadoop fs -du /user/hive/warehouse/table_name | awk '{ SUM += $1 } END { print SUM/(1024*1024*1024)}'
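If table statistics are up to date, the size can also be read from Hive itself; a sketch, assuming you are allowed to run a statistics job on the table (partitioned tables may need a partition spec on older Hive versions):
0: jdbc:hive2://xxxx/> analyze table table_name compute statistics;
0: jdbc:hive2://xxxx/> desc formatted table_name;
-- look for totalSize (in bytes) under Table Parameters in the output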
5. Creating a table:
-- columns: OID,MSISDN,StartTime,EndTime,AP_MAC,ApAliasName,HotSpotName,Longitude,Latitude,Floor
0: jdbc:hive2://xxxxx/> create table tmp_wifi1109(OID string, MSISDN string, StartTime timestamp,
EndTime timestamp, AP_MAC string, ApAliasName string, HotSpotName string, Longitude string,
Latitude string, Floor string) row format delimited fields terminated by ',' stored as textfile;
Adding, renaming, and changing table columns:
ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
18 rows selected (3.608 seconds)
0: jdbc:hive2://10.178.152.162:21066/> alter table res_site_hangzhou add columns (cell_id_x16 string);
No rows affected (1.985 seconds)
0: jdbc:hive2://10.178.152.162:21066/> desc res_site_hangzhou;
+----------------------------+------------+----------+--+
|          col_name          | data_type  | comment  |
+----------------------------+------------+----------+--+
| oid                        | int        |          |
| objectid                   | int        |          |
| ....                       |            |          |
| cell_id_x16                | string     |          |
+----------------------------+------------+----------+--+
0: jdbc:hive2://10.178.152.162:21066/> alter table res_site_hangzhou change cell_id_x16 objectidx16 string;
No rows affected (2.085 seconds)
0: jdbc:hive2://10.178.152.162:21066/> desc res_site_hangzhou;
+----------------------------+------------+----------+--+
|          col_name          | data_type  | comment  |
+----------------------------+------------+----------+--+
| oid                        | int        |          |
| objectid                   | int        |          |
| ....                       |            |          |
| objectidx16                | string     |          |
+----------------------------+------------+----------+--+
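One caveat: many Hive versions do not actually support DROP COLUMN; the usual workaround is REPLACE COLUMNS, re-listing every column you want to keep. The column list below is purely illustrative, so include all columns of your table except the one being dropped:
0: jdbc:hive2://10.178.152.162:21066/> alter table res_site_hangzhou replace columns (oid int, objectid int, objectidx16 string);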
6. Loading data into a table from a file on HDFS:
Note: tmp_wifi1109 must be created with row format settings that match the file:
create table if not exists tmp_wifi1109(id int,name string) row format delimited fields terminated by ',' stored as textfile;
Load statements:
0: jdbc:hive2://xxxx/> load data inpath 'hdfs:/user/xx_xx/dt/wifi_user_list_1109.csv' into table tmp_wifi1109;
0: jdbc:hive2://xxxx/> load data [local] inpath '/wifi_user_list_1109.csv' [overwrite] into table tmp_wifi1109;
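Regarding the bracketed options above: LOCAL makes Hive read the file from the client's local filesystem instead of HDFS, and OVERWRITE replaces the table's current contents instead of appending. A concrete sketch, assuming the file sits under /tmp on the client:
0: jdbc:hive2://xxxx/> load data local inpath '/tmp/wifi_user_list_1109.csv' overwrite into table tmp_wifi1109;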
7. Storing the result of a join between tables into a new table:
create table tmp_mr_s1_mme1109 as select distinct b.OID,b.MSISDN,b.StartTime,b.EndTime,b.AP_MAC,b.ApAliasName,b.HotSpotName,b.Longitude,b.Latitude,b.Floor,
a.ues1ap_id,a.cellid from default.s1mme a join abc_hive_db.tmp_wifi1109 b on a.msisdn=b.MSISDN and a.hour>='20161109' and a.hour<'20161110' where (
(a.start_time<=b.StartTime and a.end_time>=b.StartTime)
or (a.start_time<=b.EndTime and a.end_time>=b.EndTime)
or (a.start_time>=b.StartTime and a.end_time<=b.EndTime)
)
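The three OR branches in the WHERE clause enumerate every way two time ranges can intersect; assuming start_time <= end_time on both sides, they reduce to the standard interval-overlap test, sketched here on the same tables and columns:
create table tmp_mr_s1_mme1109 as select distinct b.OID,b.MSISDN,b.StartTime,b.EndTime,b.AP_MAC,b.ApAliasName,b.HotSpotName,b.Longitude,b.Latitude,b.Floor,
a.ues1ap_id,a.cellid from default.s1mme a join abc_hive_db.tmp_wifi1109 b on a.msisdn=b.MSISDN and a.hour>='20161109' and a.hour<'20161110'
where a.start_time<=b.EndTime and a.end_time>=b.StartTime;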
8. Exporting a table's records to HDFS:
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.merge.size.per.task= 1000000000;
set hive.merge.smallfiles.avgsize=1000000000;
use abc_hive_db;
insert overwrite directory '/user/dt/dat/1109/' row format delimited fields terminated by '|' select * from tmp_mr_s1_mme1109;
-- If the exported files have not been merged at this point, you can use getmerge to combine them:
hdfs dfs -getmerge /user/dt/dat/1109/* mergefile.csv
Specifying the column delimiters when exporting:
insert overwrite directory '/user/jr/dt/my_table' row format delimited fields terminated by '|' collection items terminated by ',' map keys terminated by ':' select * from my_table;
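To land the result on the client's local filesystem instead of HDFS, the same statement works with the LOCAL keyword; a sketch with an assumed local path:
insert overwrite local directory '/tmp/my_table_export' row format delimited fields terminated by '|' select * from my_table;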
9. Viewing a table's partitions:
0: jdbc:hive2://xxx/> show partitions default.s1_mme;
+------------------------------------+--+
|             partition              |
+------------------------------------+--+
| hour=2016110214                    |
| hour=2016110215                    |
| hour=2016110216                    |
| ...                                |
+------------------------------------+--+
If a table has more than one partition column (for example, default.s1_mme is partitioned by (p_city, p_day)), you can list which days have data for a given city with:
0: jdbc:hive2://xxx/> show partitions default.s1_mme partition(p_city='wuhan');
Loading data into a specific partition:
load data local inpath '/sd/dataext' into table testPar partition(dt='20180117');
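To verify the load, or to register a partition ahead of the data arriving, the following sketch reuses the testPar table; the dt value is illustrative:
0: jdbc:hive2://xxx/> alter table testPar add if not exists partition (dt='20180118');
0: jdbc:hive2://xxx/> show partitions testPar;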
References:
Hive partition handling: http://blog.sina.com.cn/s/blog_9f48885501016hn5.html
http://blog.sina.com.cn/s/blog_9f48885501016k5m.html