Hive Data Storage Formats

Reposted article. Originally published 2018-08-14 13:13. Category: Hive

1. Default storage format: plain text

    stored as textfile;

2. Binary storage formats

    SequenceFile, Avro, Parquet, RCFile, and ORC.

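3. Converting to Parquet

    hive> create table hive.stocks_parquet stored as parquet as select * from stocks;

        Note: the source table stocks holds about 400,000 rows and occupies 21M. After conversion to Parquet, the data files on HDFS total 6M, a compression ratio of roughly 3.5x (21M / 6M).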

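4. Converting to RCFile

    hive> create table hive.stocks_rcfile stored as rcfile as select * from stocks ;

        Note: the source table stocks holds about 400,000 rows and occupies 21M. After conversion to RCFile, the data files on HDFS total 16M, so the output is about 0.76 times the original size, a compression ratio of only about 1.3x (21M / 16M).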

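5. Converting to ORC

    hive> create table hive.stocks_orcfile stored as orcfile as select * from stocks ;

        Note: the source table stocks holds about 400,000 rows and occupies 21M. After conversion to ORC, the data files on HDFS total 5M, a compression ratio of roughly 4.2x (21M / 5M).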
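The compression ratios in sections 3 to 5 come from dividing the original size by the stored size. The following is a minimal sketch of that arithmetic, using the sizes reported in the notes. The class name `CompressionRatio` and the method `ratio` are hypothetical names chosen for illustration and are not part of Hive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reproduces the compression-ratio arithmetic from the notes.
final class CompressionRatio {
    // ratio = original size / stored size; a value above 1 means the data shrank.
    static double ratio(double originalMb, double storedMb) {
        return originalMb / storedMb;
    }

    public static void main(String[] args) {
        double originalMb = 21.0; // size of the source stocks table

        // Stored sizes on HDFS after each conversion, as reported in the notes.
        Map<String, Double> storedMb = new LinkedHashMap<>();
        storedMb.put("parquet", 6.0);
        storedMb.put("rcfile", 16.0);
        storedMb.put("orcfile", 5.0);

        for (Map.Entry<String, Double> entry : storedMb.entrySet()) {
            double r = ratio(originalMb, entry.getValue());
            System.out.println(entry.getKey() + ": " + String.format("%.2f", r) + "x");
        }
    }
}
```

With these inputs the ratios are about 3.5 for Parquet, 1.3 for RCFile, and 4.2 for ORC, so on this data set the two columnar formats Parquet and ORC compress noticeably better than RCFile.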

6. Execution time test
    hive> select count(*) from stocks ;
        Execution time: exec/fetch time: 0.227/1.580 sec
    hive> select count(*) from hive.stocks_parquet ;
        Execution time: exec/fetch time: 0.144/2.846 sec
    hive> select count(*) from hive.stocks_rcfile ;
        Execution time: exec/fetch time: 0.114/1.238 sec
    hive> select count(*) from hive.stocks_orcfile ;
        Execution time: exec/fetch time: 0.129/2.027 sec
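To compare the four runs on a single number, the exec and fetch times can be added together. This is a minimal sketch using the figures above. Treating exec plus fetch as the end-to-end latency is an assumption made here, and `QueryTiming` and `totalSeconds` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: sums the exec and fetch times reported for each table.
final class QueryTiming {
    // Assumed end-to-end latency: execution time plus result-fetch time.
    static double totalSeconds(double execSec, double fetchSec) {
        return execSec + fetchSec;
    }

    public static void main(String[] args) {
        // {exec, fetch} in seconds, copied from the measurements above.
        Map<String, double[]> runs = new LinkedHashMap<>();
        runs.put("stocks", new double[] {0.227, 1.580});
        runs.put("hive.stocks_parquet", new double[] {0.144, 2.846});
        runs.put("hive.stocks_rcfile", new double[] {0.114, 1.238});
        runs.put("hive.stocks_orcfile", new double[] {0.129, 2.027});

        for (Map.Entry<String, double[]> run : runs.entrySet()) {
            double total = totalSeconds(run.getValue()[0], run.getValue()[1]);
            System.out.println(run.getKey() + ": " + String.format("%.3f", total) + " sec");
        }
    }
}
```

The totals are about 1.807, 2.990, 1.352, and 2.156 seconds, so in this single run RCFile was the fastest and Parquet the slowest. The differences are small, and each query was measured only once.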

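User-defined functions (UDF)
    1. Create a Java class that extends the UDF class (org.apache.hadoop.hive.ql.exec.UDF).
    2. Implement one or more evaluate() methods. The UDF base class does not declare evaluate(), so this is not an override in the Java sense; Hive finds the method by reflection.
    3. Package the class into a jar.
    4. Load the jar and register the function:
        hive> add jar /home/hyxy/XXX.jar ;
        hive> create temporary function {function_name} as 'com.hyxy.hive.udf.xxx' ;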

    5. Types of user-defined functions
        a. UDF: one row in --> one row out
        b. UDAF: multiple rows in --> one row out
        c. UDTF: one row in --> multiple rows out
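Steps 1 and 2 might look roughly like the sketch below. The class name `TrimUpper` and its behaviour (trim, then upper-case) are hypothetical choices made for illustration. So that the sketch compiles with the JDK alone, the Hive import and the `extends UDF` clause are shown only in comments. A real build would need the hive-exec jar on the classpath and would restore them.

```java
// Assumption: in a real Hive build this file would also contain
//   import org.apache.hadoop.hive.ql.exec.UDF;
// and the class would be declared as
//   public class TrimUpper extends UDF
// Both are omitted here so the sketch compiles without the Hive jars.
final class TrimUpper {
    // Hive would call evaluate() once per input row: one row in, one row out.
    public String evaluate(String input) {
        if (input == null) {
            return null; // pass SQL NULL through instead of throwing
        }
        return input.trim().toUpperCase(java.util.Locale.ROOT);
    }

    public static void main(String[] args) {
        TrimUpper udf = new TrimUpper();
        System.out.println(udf.evaluate("  aapl ")); // prints AAPL
    }
}
```

Once packaged and registered as in step 4, such a function would be called like any built-in function inside a select statement, under whatever name was given in create temporary function.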
