Usage is as follows:
set mapred.reduce.tasks = 300;
add file /home/work/process.py;
insert overwrite directory '/mydir/'
select * from (
    from (
        select id, name from hive_table_one where name = '張三'
    ) one
    join (
        select id, name from hive_table_two where name = '李四'
    ) two
    on one.id = two.id
    reduce one.id, one.name, two.id, two.name
    using '/home/sharelib/python/bin/python process.py'
    as id, name
) redall
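Take care when the process.py script handles NULL values from the Hive table.
# Inside the script, a NULL name arrives as the two-character string \N (a backslash followed by N).
# The backslash is escaped here because '\N' on its own is an invalid escape sequence in Python 3.
if name == '\\N':
# If the query result is first saved as text and then processed, NULL appears as the literal string NULL instead.
if name == 'NULL':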
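The NULL handling described above can be sketched as a small process.py. This is only an illustrative outline, not the original script: it assumes the default streaming format (one row per line, tab-separated columns, NULL written as \N) and four input columns in the order given in the reduce clause. The helper names `parse_line` and `format_row`, and the rule of falling back to the second table's name, are hypothetical.

```python
import sys

# Assumption: Hive streaming writes NULL as backslash + N (raw string avoids escape issues).
HIVE_NULL = r"\N"


def parse_line(line):
    # Split one tab-separated row and map Hive's NULL marker to None.
    fields = line.rstrip("\n").split("\t")
    return [None if f == HIVE_NULL else f for f in fields]


def format_row(values):
    # Join values with tabs, writing None back out as Hive's NULL marker.
    return "\t".join(HIVE_NULL if v is None else str(v) for v in values)


def main(stdin=sys.stdin, stdout=sys.stdout):
    for line in stdin:
        # Column order follows the reduce clause: one.id, one.name, two.id, two.name.
        one_id, one_name, two_id, two_name = parse_line(line)
        # Hypothetical rule: use the name from table one, else the name from table two.
        name = one_name if one_name is not None else two_name
        # Emit the two output columns declared by "as id, name".
        stdout.write(format_row([one_id, name]) + "\n")
```

When deployed as the streaming script, the file would end with a call to `main()` so that it reads rows from standard input. Converting \N to None once at the boundary keeps the rest of the logic free of string comparisons against the marker.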
