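Impala supports two kinds of UDF:
Native Impala UDF: written in C++ and very fast; official benchmarks put it at close to 10 times the speed of the second kind.
Hive UDF: a UDF from Hive, loaded directly into Impala. The advantage is that it needs no changes at all and is used exactly as it is in Hive.
For the first approach, see the article I reposted: [Repost] Installing a JSON-parsing UDF plugin in Impala.
This post covers the second approach: loading a Hive UDF directly into Impala.
For example, Hive has a UDF named get_json_object for parsing JSON, but Impala (in the CDH 5.13.0 release used here) has no comparable function.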
1. The function lives in the jar /usr/lib/hive/lib/hive-exec-1.1.0-cdh5.13.0.jar, which can be confirmed by listing the jar's contents:
[cloudera@quickstart lib]$ jar tf hive-exec-1.1.0-cdh5.13.0.jar|grep UDFJson
org/apache/hadoop/hive/ql/udf/UDFJson$AddingList.class
org/apache/hadoop/hive/ql/udf/UDFJson.class
org/apache/hadoop/hive/ql/udf/UDFJson$HashCache.class
org/apache/hadoop/hive/ql/udf/UDFJson$1.class
2. Upload the jar to a directory in HDFS, as follows:
hdfs dfs -put /usr/lib/hive/lib/hive-exec-1.1.0-cdh5.13.0.jar /user/cloudera/lib/hive-udf.jar
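The put command above assumes that the target directory /user/cloudera/lib already exists in HDFS. As a rough sketch, the upload can be wrapped so that the parent directory is created first and the result is checked afterwards. The helper name upload_jar is hypothetical and not part of any tool, and the script simply reports "skipped" on a machine without the hdfs client.

```shell
# Hypothetical helper (illustration only): upload the jar and confirm it landed in HDFS.
LOCAL_JAR=/usr/lib/hive/lib/hive-exec-1.1.0-cdh5.13.0.jar
HDFS_JAR=/user/cloudera/lib/hive-udf.jar

upload_jar() {
  # Skip gracefully when the hdfs client is not on PATH.
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "skipped"
    return 0
  fi
  hdfs dfs -mkdir -p "$(dirname "$2")" || return 1   # create the parent directory if missing
  hdfs dfs -put -f "$1" "$2" || return 1             # -f overwrites an older copy
  hdfs dfs -test -e "$2" && echo "uploaded"          # -test -e succeeds when the path exists
}

status=$(upload_jar "$LOCAL_JAR" "$HDFS_JAR") || status="failed"
echo "jar upload: $status"
```

On the quickstart VM this should print "jar upload: uploaded"; any other value means the jar is not where the CREATE FUNCTION statement expects it.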
3. Create the function in the Impala shell. SYMBOL points to the name of the implementing class (source: https://github.com/apache/hive/blob/branch-0.14/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFJson.java):
create function if not exists get_json_object(String,String) returns String location "/user/cloudera/lib/hive-udf.jar" SYMBOL="org.apache.hadoop.hive.ql.udf.UDFJson";
[quickstart.cloudera:21000] > show functions;
Query: show functions
+-------------+---------------------------------+-------------+---------------+
| return type | signature                       | binary type | is persistent |
+-------------+---------------------------------+-------------+---------------+
| STRING      | get_json_object(STRING, STRING) | JAVA        | false         |
+-------------+---------------------------------+-------------+---------------+
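The "is persistent" column above reads false, which suggests that a function created with an explicit signature has to be recreated after the catalog service restarts. Impala 2.5 and later also accept a CREATE FUNCTION form for Java UDFs that omits the signature and return type, and functions created that way are stored persistently. The sketch below only builds and prints such a statement; the helper name build_create_stmt is hypothetical, and the statement has not been run against the setup in this post. If the non-persistent function already exists, it would probably need to be dropped first with DROP FUNCTION IF EXISTS get_json_object(STRING, STRING).

```shell
# Hypothetical helper (illustration only): build the signature-less CREATE FUNCTION
# statement, which Impala 2.5+ uses for persistent Java UDFs.
build_create_stmt() {
  # $1 = function name, $2 = HDFS path to the jar, $3 = fully qualified class name
  printf 'CREATE FUNCTION IF NOT EXISTS %s LOCATION "%s" SYMBOL="%s";\n' "$1" "$2" "$3"
}

stmt=$(build_create_stmt get_json_object \
  /user/cloudera/lib/hive-udf.jar \
  org.apache.hadoop.hive.ql.udf.UDFJson)

# Print the statement for review before running it.
echo "$stmt"

# To execute it, pass the statement to impala-shell:
#   impala-shell -q "$stmt"
```

Printing the statement first keeps the example side-effect free; running show functions afterwards is the way to check whether the "is persistent" column has changed.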
4. Use the function in the Impala shell:
[quickstart.cloudera:21000] > select get_json_object(test1.content,'$.userId') from test1;
Query: select get_json_object(test1.content,'$.userId') from test1
Query submitted at: 2018-06-28 04:19:44 (Coordinator: http://quickstart.cloudera:25000)
Query progress can be monitored at: http://quickstart.cloudera:25000/query_plan?query_id=4241f9deab0498e2:ab9c00fd00000000
+--------------------------------------------------------------------+
| get_json_object(report_data.content, '$.userid')                   |
+--------------------------------------------------------------------+
| 16                                                                 |
| 15                                                                 |
| 8                                                                  |
+--------------------------------------------------------------------+
This confirms that the Hive UDF works in Impala.