Original post: http://blog.csdn.net/longzilong216/article/details/23921235 (for now). Thanks!
Templates I use when writing my own code.
UDF steps:
1. The class must extend org.apache.hadoop.hive.ql.exec.UDF.
2. It must implement an evaluate function; evaluate supports overloading.
package com.alibaba.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

public class helloword extends UDF {

    public String evaluate() {
        return "hello world!";
    }

    public String evaluate(String str) {
        return "hello world: " + str;
    }
}
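Because evaluate is resolved by ordinary Java method overloading, the contract can be checked without a Hive cluster. Below is a minimal sketch in which HelloWord is a plain class standing in for the UDF (no Hive dependency; the class and method names are illustrative):

```java
// Plain-Java sketch of the overloaded-evaluate contract; HelloWord is an
// ordinary class standing in for the Hive UDF (no Hive jars needed).
class HelloWord {
    public String evaluate() {
        return "hello world!";
    }

    public String evaluate(String str) {
        return "hello world: " + str;
    }
}

public class Main {
    public static void main(String[] args) {
        HelloWord udf = new HelloWord();
        // Hive picks the overload that matches the call site's argument list.
        System.out.println(udf.evaluate());        // hello world!
        System.out.println(udf.evaluate("hive"));  // hello world: hive
    }
}
```

Once the real class is packaged into a jar, it is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, after which it can be called like a built-in function.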
UDAF steps:
1. The top-level class must extend org.apache.hadoop.hive.ql.exec.UDAF, and the inner Evaluator class must implement org.apache.hadoop.hive.ql.exec.UDAFEvaluator.
2. The Evaluator must implement the functions init, iterate, terminatePartial, merge, and terminate:
   - init(): similar to a constructor; initializes the UDAF state.
   - iterate(): receives the input arguments and folds them into the internal state; its return type is boolean.
   - terminatePartial(): takes no arguments; called when a round of iterate calls finishes, it returns the partial aggregation state. iterate and terminatePartial play roles similar to a Hadoop Combiner (iterate ~ mapper; terminatePartial ~ reducer).
   - merge(): receives the result of terminatePartial and merges it into the current state; its return type is boolean.
   - terminate(): returns the final aggregate result.
package com.alibaba.hive;

import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

public class myAVG extends UDAF {

    public static class avgScore {
        private double pSum;   // running sum of the inputs
        private long   pCount; // number of non-null inputs
    }

    public static class AvgEvaluator extends UDAFEvaluator {
        avgScore score;

        public AvgEvaluator() {
            score = new avgScore();
            init();
        }

        /*
         * init is similar to a constructor; it initializes the UDAF state.
         */
        public void init() {
            score.pSum = 0;
            score.pCount = 0;
        }

        /*
         * iterate receives the input value and folds it into the internal
         * state; its return type is boolean.
         * Plays the mapper-like role of a Combiner.
         */
        public boolean iterate(Double in) {
            if (in != null) {
                score.pSum += in;
                score.pCount++;
            }
            return true;
        }

        /*
         * terminatePartial takes no arguments; called when a round of iterate
         * calls finishes, it returns the partial aggregation state.
         * Plays the reducer-like role of a Combiner.
         */
        public avgScore terminatePartial() {
            return score.pCount == 0 ? null : score;
        }

        /*
         * merge receives the result of terminatePartial and merges it into
         * the current state; its return type is boolean.
         */
        public boolean merge(avgScore in) {
            if (in != null) {
                score.pSum += in.pSum;
                score.pCount += in.pCount;
            }
            return true;
        }

        /*
         * terminate returns the final aggregate result.
         */
        public Double terminate() {
            return score.pCount == 0 ? null : Double.valueOf(score.pSum / score.pCount);
        }
    }
}
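The init / iterate / terminatePartial / merge / terminate flow above can be exercised without a Hive cluster. The sketch below re-creates the same average logic as plain Java (no UDAF/UDAFEvaluator base classes; the class names are illustrative) and drives it the way Hive would: two map-side partial aggregations, then a merge on the reduce side:

```java
// Hive-free sketch of the UDAF lifecycle: two "map-side" evaluators each
// aggregate a slice of the data, and a final evaluator merges their partial
// states, mirroring iterate -> terminatePartial -> merge -> terminate.
import java.util.Arrays;
import java.util.List;

public class AvgLifecycle {
    static class AvgState {
        double pSum;   // running sum
        long   pCount; // number of non-null values
    }

    static class AvgEvaluator {
        AvgState score = new AvgState();

        void init() { score.pSum = 0; score.pCount = 0; }

        boolean iterate(Double in) {
            if (in != null) { score.pSum += in; score.pCount++; }
            return true;
        }

        AvgState terminatePartial() {
            return score.pCount == 0 ? null : score;
        }

        boolean merge(AvgState in) {
            if (in != null) { score.pSum += in.pSum; score.pCount += in.pCount; }
            return true;
        }

        Double terminate() {
            return score.pCount == 0 ? null : score.pSum / score.pCount;
        }
    }

    // One map-side pass over a slice of the input: iterate, then hand back
    // the partial state, as terminatePartial would.
    static AvgState partial(List<Double> slice) {
        AvgEvaluator e = new AvgEvaluator();
        e.init();
        for (Double d : slice) e.iterate(d);
        return e.terminatePartial();
    }

    public static void main(String[] args) {
        AvgEvaluator reducer = new AvgEvaluator();
        reducer.init();
        reducer.merge(partial(Arrays.asList(1.0, 2.0, null))); // slice 1
        reducer.merge(partial(Arrays.asList(3.0, 4.0)));       // slice 2
        System.out.println(reducer.terminate()); // (1+2+3+4)/4 = 2.5
    }
}
```

Note that nulls are skipped during iterate, so they affect neither the sum nor the count, matching the behavior of Hive's built-in avg().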
UDTF steps:
1. The class must extend org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.
2. Implement the three methods initialize, process, and close.
3. Hive drives a UDTF as follows:
   a. initialize is called first; it returns the shape of the UDTF's output rows (number and types of columns).
   b. After initialization, process is called for each input; results are emitted via forward().
   c. Finally close() is called so the UDTF can release anything that needs cleaning up.
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;

public class GenericUDTFExplode extends GenericUDTF {
    private ListObjectInspector listOI = null;

    @Override
    public void close() throws HiveException {
        // Nothing to clean up.
    }

    @Override
    public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentException("explode() takes only one argument");
        }
        if (args[0].getCategory() != ObjectInspector.Category.LIST) {
            throw new UDFArgumentException("explode() takes an array as a parameter");
        }
        listOI = (ListObjectInspector) args[0];

        // One output column, named "col", typed after the array's element type.
        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("col");
        fieldOIs.add(listOI.getListElementObjectInspector());
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    private final Object[] forwardObj = new Object[1];

    @Override
    public void process(Object[] o) throws HiveException {
        List<?> list = listOI.getList(o[0]);
        if (list == null) {
            return;
        }
        // Emit one output row per array element.
        for (Object r : list) {
            forwardObj[0] = r;
            forward(forwardObj);
        }
    }

    @Override
    public String toString() {
        return "explode";
    }
}
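The process/forward fan-out can likewise be sketched without the Hive classes. In the illustrative sketch below, forward() simply appends to a list so the one-row-per-element behavior of explode is visible (the class and method names are stand-ins, not Hive APIs):

```java
// Hive-free sketch of a UDTF's process/forward fan-out: one input array
// becomes one output row per element, just as explode() does.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExplodeSketch {
    private final List<Object> rows = new ArrayList<>();

    // Stand-in for GenericUDTF.forward(): emit one output row.
    private void forward(Object row) {
        rows.add(row);
    }

    // Stand-in for process(): fan one array argument out to many rows.
    public void process(List<?> list) {
        if (list == null) {
            return; // a NULL array produces no rows
        }
        for (Object r : list) {
            forward(r);
        }
    }

    public List<Object> rows() {
        return rows;
    }

    public static void main(String[] args) {
        ExplodeSketch udtf = new ExplodeSketch();
        udtf.process(Arrays.asList("a", "b", "c"));
        System.out.println(udtf.rows()); // [a, b, c]
    }
}
```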
