從impala中創建kudu表之后,如果想從hive或spark sql直接讀取,會報錯:
Caused by: java.lang.ClassNotFoundException: com.cloudera.kudu.hive.KuduStorageHandler at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.hadoop.hive.ql.metadata.HiveUtils.getStorageHandler(HiveUtils.java:309)
官方的解釋是:
You will encounter this exception when you try to access a Kudu table using Hive. This is not a case of a missing jar, but simply that Impala stores Kudu metadata in Hive in a format that is unreadable to other tools, including Hive itself. and Spark. Currently, there is no workaround for Hive users. Spark users can work around this by creating temporary tables.
所以不能直接從hive或spark sql讀取impala創建的kudu表,但是spark有個稍微簡單的方法是
spark.read.format("kudu").options(Map("kudu.master" -> kuduMaster, "kudu.table" -> kuduTableName)).load.createOrReplaceTempView("tmp_kudu_table") spark.sql("select * from tmp_kudu_table limit 5")
參考:
https://www.cloudera.com/documentation/enterprise/5-14-x/topics/kudu_troubleshooting.html