Migrating HiveSQL to Spark 3.0: SparkSQL fails with "Cannot safely cast 'field': StringType to IntegerType"


1. Problem

The SQL runs fine under Hive, but Spark 3.0 fails with an error like: Cannot safely cast 'field': StringType to IntegerType.
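As context, a minimal sketch that reproduces the failure (the table name `target`, column `id`, and parquet format here are illustrative, not from the original job):

  import org.apache.spark.sql.SparkSession

  object CastRepro {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder()
        .appName("store-assignment-repro")
        .master("local[*]")
        .getOrCreate()

      // Target column is INT, but the inserted column is STRING.
      spark.sql("CREATE TABLE target (id INT) USING parquet")

      // Hive casts the string implicitly; Spark 3.0 rejects this at analysis
      // time with: Cannot safely cast 'id': StringType to IntegerType
      spark.sql("INSERT OVERWRITE TABLE target SELECT '42' AS id")
    }
  }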

 

Checking the Spark 3.0 source code, a new configuration entry was added:

  val STORE_ASSIGNMENT_POLICY =
    buildConf("spark.sql.storeAssignmentPolicy")
      .doc("When inserting a value into a column with different data type, Spark will perform " +
        "type coercion. Currently, we support 3 policies for the type coercion rules: ANSI, " +
        "legacy and strict. With ANSI policy, Spark performs the type coercion as per ANSI SQL. " +
        "In practice, the behavior is mostly the same as PostgreSQL. " +
        "It disallows certain unreasonable type conversions such as converting " +
        "`string` to `int` or `double` to `boolean`. " +
        "With legacy policy, Spark allows the type coercion as long as it is a valid `Cast`, " +
        "which is very loose. e.g. converting `string` to `int` or `double` to `boolean` is " +
        "allowed. It is also the only behavior in Spark 2.x and it is compatible with Hive. " +
        "With strict policy, Spark doesn't allow any possible precision loss or data truncation " +
        "in type coercion, e.g. converting `double` to `int` or `decimal` to `double` is " +
        "not allowed."
      )
      .stringConf
      .transform(_.toUpperCase(Locale.ROOT))
      .checkValues(StoreAssignmentPolicy.values.map(_.toString))
      .createWithDefault(StoreAssignmentPolicy.ANSI.toString)

The configuration has three possible values:

  object StoreAssignmentPolicy extends Enumeration {
    val ANSI, LEGACY, STRICT = Value
  }

With the ANSI policy, Spark performs type coercion per ANSI SQL; in practice the behavior is mostly the same as PostgreSQL.

It disallows certain unreasonable type conversions, such as `string` to `int` or `double` to `boolean`.

With the LEGACY policy, Spark allows the type coercion as long as it is a valid `Cast`. This is also the only behavior in Spark 2.x, and it is compatible with Hive.

With the STRICT policy, Spark does not allow any possible precision loss or data truncation, e.g. `double` to `int` or `decimal` to `double` is not allowed.
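Taking the doc string at face value, a rough sketch of how the same string-to-int INSERT behaves under each policy (reusing the hypothetical `spark` session and `target` table from the sketch above):

  import scala.util.Try

  // Run the same unsafe INSERT under each of the three policies.
  Seq("ANSI", "LEGACY", "STRICT").foreach { policy =>
    spark.conf.set("spark.sql.storeAssignmentPolicy", policy)
    val ok = Try(spark.sql("INSERT OVERWRITE TABLE target SELECT '42' AS id")).isSuccess
    // Expected per the doc string: LEGACY succeeds (Hive-compatible Cast),
    // while ANSI and STRICT reject the string-to-int assignment.
    println(s"$policy -> ${if (ok) "ok" else "rejected"}")
  }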

So we add the configuration:

spark.sql.storeAssignmentPolicy=LEGACY

After adding it, the job runs normally.
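The setting can be applied in several standard ways; for example (assuming an existing `spark` session):

  // 1. Programmatically, before running the INSERT:
  spark.conf.set("spark.sql.storeAssignmentPolicy", "LEGACY")

  // 2. Per-session in SQL:
  spark.sql("SET spark.sql.storeAssignmentPolicy=LEGACY")

  // 3. At submit time:
  //   spark-submit --conf spark.sql.storeAssignmentPolicy=LEGACY ...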

