[Original] Troubleshooting Notes (2): Spark jobs intermittently fail with java.lang.NoSuchFieldError: HIVE_MOVE_FILES_THREAD_COUNT


Recently, Spark jobs submitted in yarn-cluster mode would sometimes fail, roughly 40% of the time, with the following error:

18/03/15 21:50:36 116 ERROR ApplicationMaster91: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.NoSuchFieldError: HIVE_MOVE_FILES_THREAD_COUNT;
org.apache.spark.sql.AnalysisException: java.lang.NoSuchFieldError: HIVE_MOVE_FILES_THREAD_COUNT;
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
    at org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:374)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:221)
    at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
    at scala.util.control.Breaks.breakable(Breaks.scala:38)
    at app.package.APPClass$.main(APPClass.scala:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: java.lang.NoSuchFieldError: HIVE_MOVE_FILES_THREAD_COUNT
    at org.apache.hadoop.hive.ql.metadata.Hive.trashFilesUnderDir(Hive.java:1389)
    at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:2873)
    at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:728)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:676)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:676)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:676)
    at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:279)
    at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:226)
    at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:225)
    at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:268)
    at org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:675)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:768)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:766)
    at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:766)
    at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
    ... 25 more

The rough flow: when Spark SQL executes InsertIntoHiveTable, it calls loadTable, which ultimately reaches Hive's loadTable method via reflection:

  1. org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:407)
  2. org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:766)
  3. org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:728)
  4. java.lang.reflect.Method.invoke(Method.java:497)
  5. org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1621)
  6. org.apache.hadoop.hive.ql.metadata.Hive.trashFilesUnderDir(Hive.java:1389)

The error java.lang.NoSuchFieldError: HIVE_MOVE_FILES_THREAD_COUNT is thrown at step 6.
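A side note on why the error surfaces wrapped in an AnalysisException: a Throwable raised inside a method called via java.lang.reflect.Method.invoke reaches the caller as the cause of an InvocationTargetException, which Spark's Hive client layer then rewraps. A minimal sketch with a hypothetical stand-in class (not Spark's actual shim code):

    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;

    // Hypothetical stand-in for Hive's loadTable, throwing the same Error type.
    class FakeHive {
        public void loadTable() {
            throw new NoSuchFieldError("HIVE_MOVE_FILES_THREAD_COUNT");
        }
    }

    public class ShimSketch {
        public static void main(String[] args) throws Exception {
            Method m = FakeHive.class.getMethod("loadTable");
            try {
                m.invoke(new FakeHive());
            } catch (InvocationTargetException e) {
                // The original NoSuchFieldError survives as the cause,
                // which is why it appears under "Caused by:" in the trace.
                System.out.println("cause: " + e.getCause());
            }
        }
    }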

 

This error is commonly blamed on a missing setting in hive-site.xml:

  <property>
    <name>hive.mv.files.thread</name>
    <value>15</value>
  </property>

However, inspecting the code shows that Spark 2.1.1 depends on Hive 1.2.1, and hive.mv.files.thread does not exist in Hive 1.2.1; the setting was only introduced in Hive 2. Moreover, the failing class org.apache.hadoop.hive.ql.metadata.Hive differs completely between Hive 1.2.1 and Hive 2 in the relevant code. The details follow.
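As a quick check, HiveConf can be asked directly whether a given build knows a setting; a sketch, assuming hive-common is on the classpath (HiveConf.getConfVars(String) exists in the 1.x line as far as I can verify):

    import org.apache.hadoop.hive.conf.HiveConf;

    public class ConfVarCheck {
        public static void main(String[] args) {
            // Against hive-common-1.2.1.jar this prints null, because
            // hive.mv.files.thread only exists from Hive 2 onwards.
            System.out.println(HiveConf.getConfVars("hive.mv.files.thread"));
        }
    }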

 

In Hive 1.2.1, the relevant code is (trashFilesUnderDir is a method of the FileUtils class):

            if (FileUtils.isSubDir(oldPath, destf, fs2)) {
              FileUtils.trashFilesUnderDir(fs2, oldPath, conf);
            }

In Hive 2, it is (trashFilesUnderDir is now a method of the Hive class itself):

  private boolean trashFilesUnderDir(final FileSystem fs, Path f, final Configuration conf)
      throws IOException {
    FileStatus[] statuses = fs.listStatus(f, FileUtils.HIDDEN_FILES_PATH_FILTER);
    boolean result = true;
    final List<Future<Boolean>> futures = new LinkedList<>();
    final ExecutorService pool = conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 25) > 0 ?
        Executors.newFixedThreadPool(conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 25),
        new ThreadFactoryBuilder().setDaemon(true).setNameFormat("Delete-Thread-%d").build()) : null;

 

So the failure at step 6 must be executing Hive 2 code (note that ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname is exactly hive.mv.files.thread, tying the missing field to the setting above). Two hypotheses:

1) Jar pollution: the JVM classpath contains both Hive 1 and Hive 2 jars at startup, so classes are sometimes loaded from the Hive 1 jar (runs fine) and sometimes from the Hive 2 jar (fails);

2) Inconsistent server environments: some servers in the cluster have no Hive 2 jar on the classpath (runs fine), while others do (may fail).

 

Comparing the environment configuration and launch commands of healthy and failing servers showed they were identical, with no Hive 2 jar anywhere.

Adding -verbose:class when launching the job showed that, in both the healthy and the failing case, the Hive class was loaded from the Hive 1.2.1 jar:

[Loaded org.apache.hadoop.hive.ql.metadata.Hive from file:/export/Data/tmp/hadoop-tmp/nm-local-dir/filecache/98/hive-exec-1.2.1.spark2.jar]
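For reference, in yarn-cluster mode the flag can be passed through Spark's standard JVM-option properties:

    spark.driver.extraJavaOptions=-verbose:class
    spark.executor.extraJavaOptions=-verbose:class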

This ruled out both hypotheses.

 

Examining the submit command showed that spark.yarn.jars was set, so that Spark's jars would not be uploaded on every submission; these jars are cached by each NodeManager as a filecache under yarn.nodemanager.local-dirs.
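For context, spark.yarn.jars is typically set in spark-defaults.conf to a directory of jars on HDFS; a sketch (path assumed):

    spark.yarn.jars hdfs:///spark/jars/*.jar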

Decompiling hive-exec-1.2.1.spark2.jar from the filecache on a healthy server and on a failing server finally revealed the problem.

On the healthy servers, the Hive class reads:

    if (FileUtils.isSubDir(oldPath, destf, fs2))
        FileUtils.trashFilesUnderDir(fs2, oldPath, conf);

On the failing servers, the Hive class reads:

    private static boolean trashFilesUnderDir(final FileSystem fs, Path f, final Configuration conf) throws IOException {
        FileStatus[] statuses = fs.listStatus(f, FileUtils.HIDDEN_FILES_PATH_FILTER);
        boolean result = true;
        List<Future<Boolean>> futures = new LinkedList();
        ExecutorService pool = conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 25) > 0
                ? Executors.newFixedThreadPool(conf.getInt(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT.varname, 25),
                        (new ThreadFactoryBuilder()).setDaemon(true).setNameFormat("Delete-Thread-%d").build())
                : null;

The Hive class on the failing servers references ConfVars.HIVE_MOVE_FILES_THREAD_COUNT, but the ConfVars class in hive-common-1.2.1.jar has no such field, hence java.lang.NoSuchFieldError.
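This is easy to reproduce in isolation. Hive's ConfVars is an enum, and enum constants are ordinary static fields to the JVM, resolved lazily at first use rather than at compile time. A minimal sketch with hypothetical stand-in classes: compile both files, then swap in a ConfVars build that lacks the constant (without recompiling Caller) and run again.

    // ConfVars.java -- the version Caller is compiled against
    public enum ConfVars {
        HIVE_MOVE_FILES_THREAD_COUNT  // absent from the "old" build
    }

    // Caller.java -- analogous to the broken hive-exec jar
    public class Caller {
        public static void main(String[] args) {
            // Compiles to a getstatic on ConfVars; if the ConfVars class
            // found at run time has no such field, the JVM throws
            // java.lang.NoSuchFieldError: HIVE_MOVE_FILES_THREAD_COUNT here.
            System.out.println(ConfVars.HIVE_MOVE_FILES_THREAD_COUNT);
        }
    }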

So the likely sequence of events: the hive-exec-1.2.1.spark2.jar on HDFS was correct at first, and every NodeManager downloaded it into its local filecache; later the jar was replaced with a broken build (a Spark build compiled against Hive 2), so NodeManagers added afterwards localized the broken jar instead. The net result: jobs succeed on some servers and fail on others.

 

YARN's filecache cleanup is governed by two settings:

yarn.nodemanager.localizer.cache.cleanup.interval-ms: 600000 (Interval in between cache cleanups.)
yarn.nodemanager.localizer.cache.target-size-mb: 10240 (Target size of localizer cache in MB, per local directory.)

Every cleanup.interval-ms, the NodeManager checks whether the local filecache exceeds target-size-mb; it cleans up only if the limit is exceeded, otherwise the cached files stay in use indefinitely (which is why the stale jars were never refreshed).
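The same two settings expressed in yarn-site.xml form (the values above are the defaults):

  <property>
    <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
    <value>600000</value>
  </property>
  <property>
    <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
    <value>10240</value>
  </property>

Judging by this incident, a cached jar below the size threshold is reused indefinitely, so publishing a fixed jar under a new HDFS path (or clearing the filecache by hand) is a safer way to force re-localization than overwriting the jar in place.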

 

