How to read Hive tables stored in different file formats
stored as textfile
View the file directly on HDFS:
hadoop fs -text
hive> create table test_txt(name string,val string) stored as textfile;
stored as sequencefile
hadoop fs -text
hive> create table test_seq(name string,val string) stored as sequencefile;
stored as rcfile
hive --service rcfilecat path
hive> create table test_rc(name string,val string) stored as rcfile;
stored as inputformat 'class' outputformat 'class' (user-defined classes)
Basic steps:
1. Write the custom class.
2. Package it into a jar.
3. Add the jar file: hive> add jar /***/***/***.jar
(effective for the current session only), or copy the jar into the lib directory of the Hive installation and restart the client (effective permanently).
4. Create the table, specifying the custom class.
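Put together, a session following the steps above might look like this (the jar path and class names below are hypothetical placeholders, shown only to illustrate the syntax):

```sql
-- Hypothetical jar path and class names, for illustration only.
add jar /path/to/my-custom-format.jar;
create table test_custom(name string, val string)
stored as
  inputformat 'com.example.MyInputFormat'
  outputformat 'com.example.MyOutputFormat';
```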
Using SerDes in Hive
SerDe is short for "Serializer" and "Deserializer".
Hive uses a SerDe (together with a FileFormat) to read and write table rows.
Data is read and written in the following order:
HDFS文件-->InputFileFormat--><key,value>-->Deserializer-->Row對象
Row對象-->Serializer--><key,value>-->OutputFileFormat-->HDFS文件
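As a rough analogy (this is plain Python, not Hive code), the read path above can be sketched like this: the InputFormat hands Hive a <key,value> pair, and the deserializer turns the value into a row of column values.

```python
# A toy analogy of Hive's read path: file line -> (key, value) -> row.
# This is NOT Hive code; it only illustrates the pipeline above.

def input_format(line):
    # For a text InputFormat, the key is a byte offset and the
    # value is the raw line; we fake the offset as 0 here.
    return (0, line)

def deserialize(value, delimiter="\t"):
    # A delimited-text deserializer splits the value into columns.
    return value.split(delimiter)

key, value = input_format("tom\t100")
row = deserialize(value)
print(row)  # ['tom', '100']
```

The write path is simply the mirror image: a serializer joins the row back into a value, and the OutputFormat writes the <key,value> pair to HDFS.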
Hive ships with built-in serializers and deserializers.
Of course, we can also implement our own custom serialization and deserialization.
Steps to implement a custom Hive SerDe
1. Implement the SerDe interface or extend the AbstractSerDe abstract class.
2. Override its methods.
Demo:
Create the table:
drop table apachelog;
create table apachelog(
  host string,
  identity string,
  user string,
  time string,
  request string,
  status string,
  size string,
  referer string,
  agent string
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties(
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([0-9]*) ([0-9]*) ([^ ]*) ([^ ]*)"
)
stored as textfile;
cat serdedata
110.52.250.126 test user - GET 200 1292 refer agent
27.19.74.143 test root - GET 200 680 refer agent
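To see how the input.regex above splits these sample lines into columns, here is a small standalone check using the same pattern (plain Python, not Hive, purely for illustration):

```python
import re

# The same pattern used as "input.regex" in the RegexSerDe table definition:
# nine capture groups, one per column, separated by single spaces.
pattern = re.compile(
    r"([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([0-9]*) ([0-9]*) ([^ ]*) ([^ ]*)"
)

line = "110.52.250.126 test user - GET 200 1292 refer agent"
m = pattern.match(line)

# Map each capture group to its column name from the DDL.
columns = ["host", "identity", "user", "time", "request",
           "status", "size", "referer", "agent"]
row = dict(zip(columns, m.groups()))
print(row["host"])    # 110.52.250.126
print(row["status"])  # 200
```

Lines that do not match the regex would produce NULL columns in Hive; here the match succeeds because each sample line has exactly nine space-separated fields.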
Load the data:
load data local inpath '/liguodong/hivedata/serdedata' overwrite into table apachelog;
Query the contents:
select * from apachelog;
select host from apachelog;