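If your function reads and returns only primitive data types (the basic Hadoop and Hive Writable types such as Text, IntWritable, LongWritable, DoubleWritable, and so on), the simple API (org.apache.hadoop.hive.ql.exec.UDF) is sufficient.
However, if you want to write a UDF that operates on nested data structures such as Map, List, and Set, you need to become familiar with the org.apache.hadoop.hive.ql.udf.generic.GenericUDF API.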
Simple API: org.apache.hadoop.hive.ql.exec.UDF
Complex API: org.apache.hadoop.hive.ql.udf.generic.GenericUDF
Next, I will build a UDF against the APIs above through an example, and provide the code and a test for it.
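Note:
The simple UDF API does not check for null arguments on your behalf. Nulls are very common in large datasets, so be rigorous here: the example below adds an explicit null check.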
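To make the null handling concrete, here is a minimal sketch of the guard pattern that runs outside Hive. It is an illustration under stated assumptions, not the demo's actual code: it uses the JDK's java.net.URLDecoder instead of the jodd decoder, and the class name NullSafeDecode is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical stand-alone class; it does not extend Hive's UDF so it runs with the JDK alone.
public class NullSafeDecode {

    // Mirrors the shape of a simple-UDF evaluate(): null in, null out.
    public static String evaluate(String input) {
        if (input == null) {
            return null; // a NULL column value arrives as a Java null
        }
        try {
            // Assumption: the JDK decoder stands in for jodd.util.URLDecoder here.
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException | IllegalArgumentException e) {
            // A malformed escape (for example a trailing '%') yields null
            // instead of throwing out of evaluate().
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(null));             // null
        System.out.println(evaluate("wifi%7Candroid")); // wifi|android
        System.out.println(evaluate("bad%"));           // null
    }
}
```

Returning null for both missing and undecodable input keeps one bad row from aborting a whole query, at the cost of hiding the reason for the failure. Whether that trade-off is acceptable depends on the dataset.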
Reference pom file:
<dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>2.1.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.1</version>
    </dependency>
    <!-- <dependency>-->
    <!--     <groupId>com.aliyun.odps</groupId>-->
    <!--     <artifactId>odps-sdk-udf</artifactId>-->
    <!--     <version>0.29.10-public</version>-->
    <!-- </dependency>-->
</dependencies>
<build>
    <pluginManagement>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.0.0</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <artifactSet>
                            </artifactSet>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </pluginManagement>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>6</source>
                <target>6</target>
            </configuration>
        </plugin>
    </plugins>
</build>
DEMO:
package udf;

import jodd.util.URLDecoder;
import org.apache.hadoop.hive.ql.exec.UDF;

import java.io.UnsupportedEncodingException;

public class TestDecodeX extends UDF {

    // Standalone helper: prints the decoded form of s.
    public static void decodeX(String s) throws UnsupportedEncodingException {
        // The regex "\\\\x" matches a literal backslash followed by 'x'.
        String s1 = s.replaceAll("\\\\x", "%");
        String decode = URLDecoder.decode(s1, "utf-8");
        System.out.println(decode);
    }

    // Hive calls evaluate() once per row.
    public String evaluate(String input) throws Exception {
        // The simple UDF API does not check for null arguments, and nulls are
        // very common in large datasets, so guard against them explicitly.
        if (input == null) return null;
        String decode = null;
        try {
            String s1 = input.replaceAll("\\\\x", "%");
            decode = URLDecoder.decode(s1, "utf-8");
        } catch (Exception e) {
            // Input that cannot be decoded leaves decode as null.
        }
        // Debug output for the local main() run below.
        System.out.println(decode);
        return decode;
    }

    public static void main(String[] args) throws Exception {
        // Percent-encoded sample; this is the one evaluated below.
        String s1 = "G977N%7C7.1.2%7Cwifi%7C%7Cgamepubgoogle%7CGetHashed%7Ccom.gamepub.ft2.g%7Candroid%7C%7C%7C1.0.2%7Csamsung%7C1547548%7C1%7CAsia%2FSeoul%7CARM%7C%7C19d1b5cdf01341e99c670f254765148d%22%5D";
        // \x-escaped sample; defined for reference but not evaluated in this run.
        String s = "172.31.35.210|21/04/2021:10:59:01|[\\x22TakeSample|0bb9f14b1041a8d9|32550283-4DF6-4CC5-9922-E4F9CFAFD7FD|iPhone13,1|14.2.1|wifi||gamepubappstore|GetHashed|com.gamepub.fr2|ios|BAB3A467-A4D0-4900-80F7-BCB9D53757B1||0.26.87|\\xE8\\x8B\\xB9\\xE6\\x9E\\x9C|3.63|0|Asia/Seoul|ARM64||\\x22]\n";
        TestDecodeX t = new TestDecodeX();
        t.evaluate(s1);
    }
}
Sample output:
G977N|7.1.2|wifi||gamepubgoogle|GetHashed|com.gamepub.ft2.g|android|||1.0.2|samsung|1547548|1|Asia/Seoul|ARM||19d1b5cdf01341e99c670f254765148d"]
Process finished with exit code 0
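The run above only exercises the percent-encoded sample s1. The sketch below illustrates the other path, where literal \xNN escapes are rewritten to %NN before decoding. It assumes the JDK's java.net.URLDecoder rather than the jodd decoder, the class name HexEscapeDemo is hypothetical, and the input is a shortened excerpt of the iOS log line from the demo.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical stand-alone class that needs only the JDK.
public class HexEscapeDemo {

    // Rewrites literal "\xNN" escapes as "%NN", then percent-decodes the result as UTF-8.
    public static String decodeX(String s) {
        // String.replace matches the two characters '\' and 'x' literally (no regex involved).
        String percentEncoded = s.replace("\\x", "%");
        try {
            // Assumption: the JDK decoder stands in for jodd.util.URLDecoder.
            return URLDecoder.decode(percentEncoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM must support, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // In Java source, "\\x" is a single backslash followed by 'x'.
        String raw = "[\\x22TakeSample|iPhone13,1|\\xE8\\x8B\\xB9\\xE6\\x9E\\x9C|Asia/Seoul|\\x22]";
        System.out.println(decodeX(raw));
    }
}
```

On a UTF-8 console this prints ["TakeSample|iPhone13,1|苹果|Asia/Seoul|"]: \x22 becomes a double quote and the six escaped bytes decode to two Chinese characters. Note that the JDK decoder also turns '+' into a space and rejects a stray '%', so fields that contain those characters literally would need extra handling.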
In the Hive client:
hive> ADD JAR target/hive-extensions-1.0-SNAPSHOT-jar-with-dependencies.jar;
hive> CREATE TEMPORARY FUNCTION decodeX as 'udf.TestDecodeX';
Reference:
Hive UDF Development Guide