Hive UDTF Functions

Reposted article, 2016-05-13 11:23. Tags: HIVE / UDTF / Hadoop

An earlier post covered writing and using Hive UDFs (User-Defined Functions); this one looks at writing and using UDTFs.

1. Introduction to UDTFs

A UDTF (User-Defined Table-Generating Function) addresses the need to turn one input row into multiple output rows (one-to-many mapping).

2. Writing your own UDTF

Extend org.apache.hadoop.hive.ql.udf.generic.GenericUDTF and implement three methods: initialize, process, and close.

Hive first calls initialize, which returns information about the rows the UDTF produces (the number of columns and their types).

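Once initialization is done, process is called. The actual work happens in process: each call to forward() produces one output row. To produce multiple columns, put the column values into an array and pass that array to forward().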

Finally, close() is called so that anything that needs cleaning up can be released.
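The call order described above can be illustrated without Hive. The following is only a rough sketch in plain Java: UdtfLifecycleSketch and its methods are hypothetical stand-ins, not Hive classes, and the comma-separated input format is an assumption made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature of the UDTF life cycle; none of this is Hive API.
class UdtfLifecycleSketch {
    final List<String> calls = new ArrayList<>();  // records the call order
    final List<Object[]> rows = new ArrayList<>(); // rows passed to forward()

    // Describes the output rows: here two columns, named "item" and "length".
    String[] initialize() {
        calls.add("initialize");
        return new String[] {"item", "length"};
    }

    // Called once per input row; may emit any number of output rows.
    void process(String input) {
        calls.add("process");
        for (String item : input.split(",")) {
            // Several columns are packed into one array per forward() call.
            forward(new Object[] {item, item.length()});
        }
    }

    // Each call adds exactly one output row.
    void forward(Object[] row) {
        rows.add(row);
    }

    void close() {
        calls.add("close");
    }

    // Drives the life cycle in the order the text describes.
    static UdtfLifecycleSketch run(List<String> inputs) {
        UdtfLifecycleSketch udtf = new UdtfLifecycleSketch();
        udtf.initialize();
        for (String input : inputs) {
            udtf.process(input);
        }
        udtf.close();
        return udtf;
    }

    public static void main(String[] args) {
        UdtfLifecycleSketch udtf = run(Arrays.asList("ab,cde", "f"));
        System.out.println(udtf.calls);
        for (Object[] row : udtf.rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In this sketch two input rows yield three output rows, which is the one-to-many behavior a UDTF is meant to provide.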

Below is one I wrote that splits strings of the form "key:value;key:value;" and returns the result as two fields, key and value. It is provided for reference:

package com.hadoop.hive.udtf;

import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class UDTFExplode extends GenericUDTF {

    @Override
    public void close() throws HiveException {
        // Nothing to clean up for this UDTF.
    }

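    @Override
    public void process(Object[] args) throws HiveException {
        // A NULL input produces no output rows.
        if (args[0] == null) {
            return;
        }
        String input = args[0].toString();
        // Split the input into "key:value" pairs.
        String[] pairs = input.split(";");
        for (String pair : pairs) {
            // Split each pair into key and value; skip malformed pairs.
            String[] result = pair.split(":");
            if (result.length != 2) {
                continue;
            }
            // Each forward() call emits one output row (col1, col2).
            forward(result);
        }
    }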

    @Override
    public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        if (args.length != 1) {
            throw new UDFArgumentLengthException("UDTFExplode takes exactly one argument");
        }
        if (args[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("UDTFExplode takes a string as its parameter");
        }

        ArrayList<String> fieldNames = new ArrayList<String>();
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("col1");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        fieldNames.add("col2");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

}
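The splitting logic can be tried out without a Hive installation. Here is a minimal sketch, assuming a well-formed pair contains exactly one colon; KeyValueSplitSketch and splitPairs are hypothetical names used only for this illustration, and the returned list stands in for the forward() calls.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone helper; it mirrors the splitting done in process()
// but collects the rows in a list instead of calling forward().
class KeyValueSplitSketch {

    static List<String[]> splitPairs(String input) {
        List<String[]> rows = new ArrayList<>();
        if (input == null) {
            return rows; // no input, no rows
        }
        for (String pair : input.split(";")) {
            String[] kv = pair.split(":");
            if (kv.length != 2) {
                continue; // assumption: malformed pairs are skipped
            }
            rows.add(kv); // in the UDTF this would be forward(kv)
        }
        return rows;
    }

    public static void main(String[] args) {
        // Prints one "key<TAB>value" line per well-formed pair.
        for (String[] row : splitPairs("asd:123;rtrt:3445;vbvx:6787")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```

Note that String.split drops trailing empty strings, so a trailing ";" in the input does not produce an extra empty pair.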

3. Usage

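Package the program as a JAR and upload it to the server. Then add the JAR and register the class as a function (for example with ADD JAR and CREATE TEMPORARY FUNCTION); the queries below assume it is registered under the name split_test.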

A UDTF can be used in two ways: directly after select, or together with lateral view.

1: Directly in select

select split_test('asd:123\;rtrt:3445\;vbvx:6787') as (col1,col2) from finance.dual;

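Note that when a UDTF is used this way, no other columns can be selected alongside it, it cannot be nested, and it cannot be combined with group by / cluster by / distribute by / sort by.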

2: Together with lateral view

select '1', mytable.col1, mytable.col2 from finance.dual lateral view split_test('asd:123\;rtrt:3445\;vbvx:6787') mytable as col1, col2;

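With lateral view, the UDTF is applied to each row of the base table and the generated rows are joined back to that input row. The effect is as if the extraction had been run separately and the results then combined into one table.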
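To make the lateral view behavior concrete, here is a rough sketch in plain Java rather than Hive code: each source row is paired with every row the table-generating function produces for it. The names LateralViewSketch, explode, and lateralView are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of lateral view semantics; not Hive code.
class LateralViewSketch {

    // Stand-in for the UDTF: one input string yields zero or more (key, value) rows.
    static List<String[]> explode(String input) {
        List<String[]> rows = new ArrayList<>();
        for (String pair : input.split(";")) {
            String[] kv = pair.split(":");
            if (kv.length == 2) {
                rows.add(kv);
            }
        }
        return rows;
    }

    // Stand-in for lateral view: join each source row with its generated rows.
    // Each source row is {id, "key:value;key:value"}.
    static List<String> lateralView(List<String[]> sourceRows) {
        List<String> out = new ArrayList<>();
        for (String[] source : sourceRows) {
            String id = source[0];
            for (String[] kv : explode(source[1])) {
                out.add(id + "," + kv[0] + "," + kv[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> table = new ArrayList<>();
        table.add(new String[] {"1", "asd:123;rtrt:3445"});
        table.add(new String[] {"2", "vbvx:6787"});
        lateralView(table).forEach(System.out::println);
    }
}
```

Unlike the plain select form, the source column (the id here) travels with every generated row, which is why lateral view allows other columns to be selected next to the UDTF output.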
