How to read Hive tables stored in different file formats
stored as textfile
View the file directly on HDFS:
hadoop fs -text
hive> create table test_txt(name string,val string) stored as textfile;
stored as sequencefile
hadoop fs -text
hive> create table test_seq(name string,val string) stored as sequencefile;
stored as rcfile
hive --service rcfilecat path
hive> create table test_rc(name string,val string) stored as rcfile;
stored as inputformat 'class' outputformat 'class' (user-defined classes)
Basic steps:
1. Write the custom class.
2. Package it into a jar.
3. Add the jar file: hive> add jar /***/***/***.jar
(effective for the current session only), or copy the jar into the lib directory of the Hive installation and restart the client (effective permanently).
4. Create the table, specifying the custom class.
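Put together, a session following the steps above might look like this (the jar path and class names below are hypothetical placeholders, shown only to illustrate the syntax):

```sql
-- Hypothetical jar path and class names, for illustration only.
add jar /path/to/my-custom-format.jar;
create table test_custom(name string, val string)
stored as
  inputformat 'com.example.MyInputFormat'
  outputformat 'com.example.MyOutputFormat';
```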
Using SerDes in Hive
SerDe is short for "Serializer" and "Deserializer".
Hive uses a SerDe (together with a FileFormat) to read and write table rows.
Data is read and written in the following order:
HDFS文件-->InputFileFormat--><key,value>-->Deserializer-->Row對象
Row對象-->Serializer--><key,value>-->OutputFileFormat-->HDFS文件
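As a rough analogy (this is plain Python, not Hive code), the read path above can be sketched like this: the InputFormat hands Hive a <key,value> pair, and the deserializer turns the value into a row of column values.

```python
# A toy analogy of Hive's read path: file line -> (key, value) -> row.
# This is NOT Hive code; it only illustrates the pipeline above.

def input_format(line):
    # For a text InputFormat, the key is a byte offset and the
    # value is the raw line; we fake the offset as 0 here.
    return (0, line)

def deserialize(value, delimiter="\t"):
    # A delimited-text deserializer splits the value into columns.
    return value.split(delimiter)

key, value = input_format("tom\t100")
row = deserialize(value)
print(row)  # ['tom', '100']
```

The write path is simply the mirror image: a serializer joins the row back into a value, and the OutputFormat writes the <key,value> pair to HDFS.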
Hive ships with built-in serializers and deserializers.
Of course, we can also implement our own custom serialization and deserialization.
Steps to implement a custom Hive SerDe
1. Implement the SerDe interface or extend the AbstractSerDe abstract class.
2. Override its methods.
Demo:
Create the table:
drop table apachelog;
create table apachelog(
  host string,
  identity string,
  user string,
  time string,
  request string,
  status string,
  size string,
  referer string,
  agent string
)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties(
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([0-9]*) ([0-9]*) ([^ ]*) ([^ ]*)"
)
stored as textfile;
cat serdedata
110.52.250.126 test user - GET 200 1292 refer agent
27.19.74.143 test root - GET 200 680 refer agent
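To see how the input.regex above splits these sample lines into columns, here is a small standalone check using the same pattern (plain Python, not Hive, purely for illustration):

```python
import re

# The same pattern used as "input.regex" in the RegexSerDe table definition:
# nine capture groups, one per column, separated by single spaces.
pattern = re.compile(
    r"([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([0-9]*) ([0-9]*) ([^ ]*) ([^ ]*)"
)

line = "110.52.250.126 test user - GET 200 1292 refer agent"
m = pattern.match(line)

# Map each capture group to its column name from the DDL.
columns = ["host", "identity", "user", "time", "request",
           "status", "size", "referer", "agent"]
row = dict(zip(columns, m.groups()))
print(row["host"])    # 110.52.250.126
print(row["status"])  # 200
```

Lines that do not match the regex would produce NULL columns in Hive; here the match succeeds because each sample line has exactly nine space-separated fields.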
Load the data:
load data local inpath '/liguodong/hivedata/serdedata' overwrite into table apachelog;
Query the contents:
select * from apachelog;
select host from apachelog;