Hive: A Transform Usage Example


Contents of the data file

steven:100;steven:90;steven:99^567^22
ray:90;ray:98^456^30
Tom:81^222^33

The data should end up in the database in the following format:

steven    100    567    22
steven    90     567    22
steven    99     567    22
ray       90     456    30
ray       98     456    30
Tom       81     222    33

Specifically, if you want to return a different number of columns, or a different number of rows for a given input row, then you need to perform what Hive calls a transform.

 

1. Create a table to store the raw data

create table u_data(col1 string, code int, age int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '^' STORED AS TEXTFILE;
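
To see how the '^' delimiter maps one raw line onto the three columns of u_data, the split can be mimicked in plain Python. This is only an illustrative sketch: the function name parse_raw_line is hypothetical, and in Hive the parsing is done by the table's row format, not by user code.

```python
def parse_raw_line(line):
    """Split one raw line on '^' into (col1, code, age), mirroring the table layout."""
    col1, code, age = line.rstrip("\n").split("^")
    # code and age are declared as int in the table definition
    return col1, int(code), int(age)


if __name__ == "__main__":
    print(parse_raw_line("steven:100;steven:90;steven:99^567^22"))
```

Note that col1 still contains the whole ';'-separated list at this point; breaking it into separate rows is the job of the transform script.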

2. Load the data

load data local inpath '/home/stevenxia/data1' overwrite into table u_data;

3. Write the transform script

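#!/usr/bin/python
import sys

# Hive passes each row on stdin as one line of tab-separated columns.
for line in sys.stdin:
    values = line.strip().split('\t')
    # values[0] holds ';'-separated name:score pairs, e.g. steven:100;steven:90
    for kv in values[0].split(';'):
        k, v = kv.split(':')
        # Emit one tab-separated output row per pair: name, score, code, age
        print('\t'.join([k, v, values[1], values[2]]))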
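
The script's logic can be checked locally before involving Hive. The sketch below assumes Python 3 and wraps the same splitting logic in a hypothetical generator named transform, feeding it an in-memory stream in place of stdin. By default Hive sends the selected columns to the script as one tab-separated line per row, which is what the fake input imitates.

```python
import io


def transform(stream):
    """Yield one tab-separated output line per name:score pair in each input row."""
    for line in stream:
        col1, code, age = line.rstrip("\n").split("\t")
        for pair in col1.split(";"):
            name, score = pair.split(":")
            yield "\t".join([name, score, code, age])


if __name__ == "__main__":
    # Two input rows, formatted the way Hive would hand them to the script.
    fake_stdin = io.StringIO("ray:90;ray:98\t456\t30\nTom:81\t222\t33\n")
    for out in transform(fake_stdin):
        print(out)
```

Two input rows produce three output rows here, which is the "different number of rows" case that calls for a transform.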

4. Deploy the script to the nodes, at /home/stevenxia/u.py

5. Hive can now use the script

select transform(u.col1, u.code, u.age) using '/home/stevenxia/u.py' as (col1, col2, col3, col4) from (select * from u_data) as u;
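
The AS (col1, col2, col3, col4) clause names the fields that the script writes out. As a rough illustration only, the hypothetical helper label_columns below pairs each tab-separated field of one output line with those names. The values stay strings, which matches Hive's default treatment of script output when no types are given.

```python
def label_columns(output_line, names=("col1", "col2", "col3", "col4")):
    """Pair each tab-separated field of a script output line with a column name."""
    fields = output_line.rstrip("\n").split("\t")
    return dict(zip(names, fields))


if __name__ == "__main__":
    print(label_columns("steven\t100\t567\t22"))
```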

Result of the run

 

