Problems Encountered When Accessing Hive from Spark SQL, and How to Fix Them


First, copy Hadoop's core-site.xml and Hive's hive-site.xml into the project (in a Maven project they typically go under src/main/resources so they end up on the classpath; an alternative that skips hive-site.xml is sketched right after the test code).

Test code:
package homework0522

import org.apache.spark.sql.SparkSession

object OverwriteTopN {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .appName("TopNApp")
      .master("local[2]")
      .enableHiveSupport()
      .getOrCreate()

    val userClickDF = spark.table("user_click")
    userClickDF.show(10)

    spark.stop()
  }
}
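As an aside, if you would rather not copy hive-site.xml into the project, the metastore address can also be set directly on the builder. A minimal sketch, with thrift://your-hive-host:9083 standing in for the real metastore endpoint:

import org.apache.spark.sql.SparkSession

// Variant of the builder above: point Spark at the metastore explicitly
// instead of relying on a hive-site.xml on the classpath.
val spark = SparkSession
  .builder()
  .appName("TopNApp")
  .master("local[2]")
  .config("hive.metastore.uris", "thrift://your-hive-host:9083")
  .enableHiveSupport()
  .getOrCreate()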
 
The error:
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:869)
at homework0522.OverwriteTopN$.main(OverwriteTopN.scala:12)
at homework0522.OverwriteTopN.main(OverwriteTopN.scala)
 
Looking at the source code (SparkSession.scala):
/**
 * Enables Hive support, including connectivity to a persistent Hive metastore, support for
 * Hive serdes, and Hive user-defined functions.
 *
 * @since 2.0.0
 */
def enableHiveSupport(): Builder = synchronized {
  // the check below fails because the Hive classes cannot be found
  if (hiveClassesArePresent) {
    config(CATALOG_IMPLEMENTATION.key, "hive")
  } else {
    throw new IllegalArgumentException(
      "Unable to instantiate SparkSession with Hive support because " +
        "Hive classes are not found.")
  }
}

/**
 * @return true if Hive classes can be loaded, otherwise false.
 */
private[spark] def hiveClassesArePresent: Boolean = {
  try {
    // Class.forName is used to look up the two classes below; the first one is already missing
    Utils.classForName(HIVE_SESSION_STATE_BUILDER_CLASS_NAME)
    Utils.classForName("org.apache.hadoop.hive.conf.HiveConf")
    true
  } catch {
    case _: ClassNotFoundException | _: NoClassDefFoundError => false
  }
}

"發現找不到HiveSessionStateBuilder"
private val HIVE_SESSION_STATE_BUILDER_CLASS_NAME =
"org.apache.spark.sql.hive.HiveSessionStateBuilder"
 
Solution

Add spark-hive_2.11-2.4.2.jar and spark-hive-thriftserver_2.11-2.4.2.jar from $HIVE_HOME/lib to the project (or declare the dependency in the build, as sketched below).
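If the project is managed with Maven (the pom.xml edits later in this post suggest it is), the same classes can be pulled in by declaring Spark's Hive module as a dependency instead of copying jars by hand. A sketch assuming Spark 2.4.2 built for Scala 2.11:

<!-- provides org.apache.spark.sql.hive.*, which enableHiveSupport() requires on the classpath -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.11</artifactId>
  <version>2.4.2</version>
</dependency>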

Another error follows:
Exception in thread "main" java.lang.NoSuchFieldError: METASTORE_CLIENT_SOCKET_LIFETIME
at org.apache.spark.sql.hive.HiveUtils$.formatTimeVarsForHiveClient(HiveUtils.scala:194)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:285)
at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
 
Looking at the source code (HiveUtils.scala):
/**
 * Change time configurations needed to create a [[HiveClient]] into unified [[Long]] format.
 */
private[hive] def formatTimeVarsForHiveClient(hadoopConf: Configuration): Map[String, String] = {
  // Hive 0.14.0 introduces timeout operations in HiveConf, and changes default values of a bunch
  // of time `ConfVar`s by adding time suffixes (`s`, `ms`, and `d` etc.). This breaks backwards-
  // compatibility when users are trying to connecting to a Hive metastore of lower version,
  // because these options are expected to be integral values in lower versions of Hive.
  //
  // Here we enumerate all time `ConfVar`s and convert their values to numeric strings according
  // to their output time units.
  Seq(
    ConfVars.METASTORE_CLIENT_CONNECT_RETRY_DELAY -> TimeUnit.SECONDS,
    ConfVars.METASTORE_CLIENT_SOCKET_TIMEOUT -> TimeUnit.SECONDS,
    // this field cannot be resolved at runtime, hence the NoSuchFieldError
    ConfVars.METASTORE_CLIENT_SOCKET_LIFETIME -> TimeUnit.SECONDS,
    ...
  ).map { case (confVar, unit) =>
    confVar.varname -> HiveConf.getTimeVar(hadoopConf, confVar, unit).toString
  }.toMap
}
 
Stepping into ConfVars (HiveConf.java):

public static enum ConfVars {
  SCRIPTWRAPPER("hive.exec.script.wrapper", (Object)null, ""),
  PLAN("hive.exec.plan", "", ""),
  ...
}
 
The ConfVars enum here does not define METASTORE_CLIENT_SOCKET_LIFETIME, and this HiveConf.java comes from hive-exec-1.1.0-cdh5.7.0.jar, which shows that Hive 1.1.0 had not yet added this parameter even though the Spark version in use expects it.
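To double-check which version of HiveConf actually ends up on the classpath, a quick reflection-based probe can help. A minimal sketch (HiveConfCheck is a made-up name used only for illustration):

// Reports which jar HiveConf was loaded from and whether the ConfVars enum in that
// jar defines METASTORE_CLIENT_SOCKET_LIFETIME.
object HiveConfCheck {
  def main(args: Array[String]): Unit = {
    val hiveConf = Class.forName("org.apache.hadoop.hive.conf.HiveConf")
    println("HiveConf loaded from: " + hiveConf.getProtectionDomain.getCodeSource.getLocation)

    val confVars = Class.forName("org.apache.hadoop.hive.conf.HiveConf$ConfVars")
    val present = confVars.getEnumConstants
      .exists(_.asInstanceOf[java.lang.Enum[_]].name == "METASTORE_CLIENT_SOCKET_LIFETIME")
    println("METASTORE_CLIENT_SOCKET_LIFETIME present: " + present)
  }
}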

Solution

Switch the Hive dependency to 1.2.1:

<properties>
  ...
  <!-- <hive.version>1.1.0-cdh5.7.0</hive.version> -->
  <hive.version>1.2.1</hive.version>
</properties>

...

<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>${hive.version}</version>
</dependency>
 
And one more error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Caused by: java.lang.reflect.InvocationTargetException
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused: connect
Caused by: java.net.ConnectException: Connection refused: connect
 
Solution

This happens because the Hive metastore service is not running on the remote host. Configure the metastore and start the service:

$HIVE_HOME/bin/hive --service metastore &
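The address the application tries (and fails) to reach comes from hive.metastore.uris in the hive-site.xml copied into the project; "Connection refused" means nothing is listening there yet. For reference, a minimal sketch of that property (the hostname is a placeholder; 9083 is the metastore's default port):

<!-- hive-site.xml: the thrift endpoint of the metastore service started above -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://your-hive-host:9083</value>
</property>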

Copyright notice: this is an original article by CSDN blogger 「小朋友2D」, licensed under CC 4.0 BY-SA. Please include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/ct2020129/article/details/90695033

