Spark 2.0 and later: converting an RDD to a DataFrame and processing the data; ERROR Executor:91 - Exception in task 1.0 in stage 0.0 (TID 1) java.lang.NumberFormatException: empty String


1. Configuration file

package config
import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}
case object conf {
   private val master = "local[*]"
   val confs: SparkConf = new SparkConf().setMaster(master).setAppName("jobs")
//   val confs: SparkConf = new SparkConf().setMaster("http://laptop-2up1s8pr:4040/").setAppName("jobs")
   val sc = new SparkContext(confs)
   sc.setLogLevel("ERROR")
   val spark_session: SparkSession = SparkSession.builder()
    .appName("jobs").config(confs).getOrCreate()

//   Enable Cartesian products (cross joins); needed on Spark 2.0
   spark_session.conf.set("spark.sql.crossJoin.enabled",true)
}
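
In Spark 2.x the SparkSession is the main entry point, and it exposes its own SparkContext, so the two do not have to be constructed separately. The following is only a minimal sketch of that alternative, not the configuration used in this post; the object name ConfSketch is hypothetical.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

// Hypothetical alternative to `conf` above: build only the SparkSession
// and take the SparkContext from it.
object ConfSketch {
  val spark_session: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("jobs")
    // Same cross-join setting as above, passed at build time.
    .config("spark.sql.crossJoin.enabled", "true")
    .getOrCreate()

  // The session owns the context; no separate `new SparkContext` is needed.
  val sc: SparkContext = spark_session.sparkContext
  sc.setLogLevel("ERROR")
}
```

With this layout, code that imports sc and spark_session would import them from ConfSketch instead of config.conf.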

  

2. Reading an RDD, converting it to a DataFrame, and saving a DataFrame as a CSV file in Spark 2.0

package sparkDataMange
import config.conf.{sc,spark_session}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SaveMode}
import config.conf.spark_session.implicits._

object irisDataMange {

  def main(args: Array[String]): Unit = {
    val path:String = "data/iris.data"
    val irisData: RDD[String] = sc.textFile(path)

//    case class irsModel(ft1:String,ft2:String,ft3:String,ft4:String,label:String)

    val rdd1: RDD[Array[String]] = irisData.map(lines => {lines.split(",")})
    val df: RDD[(Double, Double, Double, Double, Double)] = rdd1.map(line => {

      (line(0).toDouble, line(1).toDouble, line(2).toDouble, line(3).toDouble,
        if (line(4) == "Iris-setosa") {
          1D
        }
        else if (line(4) == "Iris-versicolor") {
          2D
        }
        else {
          3D
        })
    })
    val df1: DataFrame = df.toDF("ft1","ft2","ft3","ft4","label")

    println(df1.count())

    // Create a temporary view
    df1.createOrReplaceTempView("iris")
    spark_session.sql("select * from iris").show(150)

    // Save as CSV
    df1.coalesce(1).write.format("csv").save("data/irsdf")
    sc.stop()
  }
}
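
The save call above uses the writer's default mode, which fails if the output directory already exists, so a second run of the job stops with a "path already exists" error. It also writes no header row. The sketch below shows one possible way to handle both; the helper name saveAsSingleCsv, the object name SaveCsvSketch, and the output directory data/irsdf_sketch are assumptions made for illustration, not part of the original job.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object SaveCsvSketch {
  // Hypothetical helper: write a DataFrame as a single CSV part file with a
  // header row, replacing any previous output at `path`.
  def saveAsSingleCsv(df: DataFrame, path: String): Unit =
    df.coalesce(1)
      .write
      .mode(SaveMode.Overwrite)   // replace existing output instead of failing
      .option("header", "true")   // write column names as the first row
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("save-csv-sketch")
      .getOrCreate()
    import spark.implicits._

    // Two sample rows in the same shape as df1 above.
    val df = Seq((5.1, 3.5, 1.4, 0.2, 1.0), (7.0, 3.2, 4.7, 1.4, 2.0))
      .toDF("ft1", "ft2", "ft3", "ft4", "label")

    saveAsSingleCsv(df, "data/irsdf_sketch") // hypothetical output directory
    spark.stop()
  }
}
```

Note that SaveMode.Overwrite deletes whatever is already at the target path, so it should only point at a directory that holds this job's output.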

  

3. Error to watch for:

  ERROR Executor:91 - Exception in task 1.0 in stage 0.0 (TID 1) java.lang.NumberFormatException: empty String

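  Remove the extra blank lines from the data file so that only standard CSV rows remain; otherwise the conversion to a DataFrame fails. A blank line splits into a single empty field, and calling toDouble on an empty string throws this NumberFormatException.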
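
If editing the data file is not convenient, the blank lines can also be skipped in code. The sketch below first reproduces the cause in plain Scala and then shows a tolerant parser; parseIrisLine and IrisParsing are hypothetical names introduced here, and the label mapping simply mirrors the if/else chain from section 2.

```scala
import scala.util.Try

object IrisParsing {
  // Hypothetical helper (not in the original job): parse one line of iris.data,
  // returning None for blank or malformed lines instead of throwing.
  def parseIrisLine(line: String): Option[(Double, Double, Double, Double, Double)] = {
    val fields = line.trim.split(",")
    if (fields.length != 5) None
    else Try {
      val label = fields(4) match {
        case "Iris-setosa"     => 1D
        case "Iris-versicolor" => 2D
        case _                 => 3D // same fallback as the original code
      }
      (fields(0).toDouble, fields(1).toDouble, fields(2).toDouble, fields(3).toDouble, label)
    }.toOption
  }

  def main(args: Array[String]): Unit = {
    // A blank line yields one empty field, and "".toDouble is what throws.
    println("".split(",").length)       // prints 1
    println(Try("".toDouble).isFailure) // prints true

    val lines = Seq("5.1,3.5,1.4,0.2,Iris-setosa", "", "7.0,3.2,4.7,1.4,Iris-versicolor")
    // The blank line is dropped; the two valid rows are kept.
    println(lines.flatMap(l => parseIrisLine(l)).length) // prints 2
  }
}
```

In the job from section 2, the same idea would presumably be applied as irisData.flatMap(line => parseIrisLine(line)) before calling toDF, or more simply as irisData.filter(_.trim.nonEmpty) ahead of the existing map. Unlike cleaning the file, the parser also silently drops malformed rows, which may or may not be what you want.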

 

