Reading a CSV file with Spark Scala


Save the following content as small_zipcode.csv:

id,zipcode,type,city,state,population
1,704,STANDARD,,PR,30100
2,704,,PASEO COSTA DEL SUR,PR,
3,709,,BDA SAN LUIS,PR,3700
4,76166,UNIQUE,CINGULAR WIRELESS,TX,84000
5,76177,STANDARD,,TX,
,,,,,
7,76179,STANDARD,,TX,
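
If you would rather create the file from code than from a text editor, the following is a minimal sketch using only the Java standard library. The object name WriteSmallZipcode is made up for this illustration, and the file is written to the current working directory.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object WriteSmallZipcode {
  // The sample rows exactly as listed above; empty fields stay empty.
  val csvLines: Seq[String] = Seq(
    "id,zipcode,type,city,state,population",
    "1,704,STANDARD,,PR,30100",
    "2,704,,PASEO COSTA DEL SUR,PR,",
    "3,709,,BDA SAN LUIS,PR,3700",
    "4,76166,UNIQUE,CINGULAR WIRELESS,TX,84000",
    "5,76177,STANDARD,,TX,",
    ",,,,,",
    "7,76179,STANDARD,,TX,"
  )

  def main(args: Array[String]): Unit = {
    val path = Paths.get("small_zipcode.csv")
    // Join the lines with newlines and write them as UTF-8 bytes.
    Files.write(path, csvLines.mkString("\n").getBytes(StandardCharsets.UTF_8))
    println(s"Wrote ${csvLines.size} lines to ${path.toAbsolutePath}")
  }
}
```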

Open the spark-shell interactive command line:

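val filePath = "small_zipcode.csv"
// header: treat the first line as column names.
// inferSchema: let Spark detect the column types from the data.
val df = spark.read.options(
  Map("inferSchema" -> "true", "delimiter" -> ",", "header" -> "true")).csv(filePath)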

scala> df.show
+----+-------+--------+-------------------+-----+----------+
|  id|zipcode|    type|               city|state|population|
+----+-------+--------+-------------------+-----+----------+
|   1|    704|STANDARD|               null|   PR|     30100|
|   2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
|   3|    709|    null|       BDA SAN LUIS|   PR|      3700|
|   4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
|   5|  76177|STANDARD|               null|   TX|      null|
|null|   null|    null|               null| null|      null|
|   7|  76179|STANDARD|               null|   TX|      null|
+----+-------+--------+-------------------+-----+----------+
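
As an alternative to inferSchema, the column types can be declared up front. The sketch below is one way this might look as a standalone program; the object name ReadZipcodesWithSchema and the choice of IntegerType for the numeric columns are assumptions made for this example, not something the original post specifies. It expects small_zipcode.csv in the working directory.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ReadZipcodesWithSchema {
  // Column names mirror the header of small_zipcode.csv; the types are assumed.
  val zipcodeSchema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("zipcode", IntegerType, nullable = true),
    StructField("type", StringType, nullable = true),
    StructField("city", StringType, nullable = true),
    StructField("state", StringType, nullable = true),
    StructField("population", IntegerType, nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    // spark-shell already provides `spark`; a standalone program builds its own.
    val spark = SparkSession.builder()
      .appName("ReadZipcodesWithSchema")
      .master("local[*]")
      .getOrCreate()

    val df = spark.read
      .option("header", "true")   // skip the header line when reading rows
      .schema(zipcodeSchema)      // use the declared types instead of inferring them
      .csv("small_zipcode.csv")

    df.printSchema()
    df.show()
    spark.stop()
  }
}
```

Declaring the schema avoids the extra pass over the file that type inference needs, at the cost of keeping the schema in sync with the data by hand.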

Drop only the rows in which every column is null:

scala> df.na.drop("all").show()
+---+-------+--------+-------------------+-----+----------+
| id|zipcode|    type|               city|state|population|
+---+-------+--------+-------------------+-----+----------+
|  1|    704|STANDARD|               null|   PR|     30100|
|  2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
|  3|    709|    null|       BDA SAN LUIS|   PR|      3700|
|  4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
|  5|  76177|STANDARD|               null|   TX|      null|
|  7|  76179|STANDARD|               null|   TX|      null|
+---+-------+--------+-------------------+-----+----------+
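
To make the "all" condition concrete, here is a sketch that expresses it as an explicit filter: a row is kept when at least one of its columns is non-null. The object and helper names (DropAllNullRows, keepRowsWithAnyValue, sampleRows) are made up for this illustration, and the three in-memory rows are a reduced stand-in for the CSV file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DropAllNullRows {
  // Keep a row when at least one of its columns is non-null.
  // This is the condition that na.drop("all") applies.
  def keepRowsWithAnyValue(df: DataFrame): DataFrame =
    df.filter(df.columns.map(name => col(name).isNotNull).reduce(_ || _))

  // Three rows modelled on small_zipcode.csv: complete, partly empty, fully empty.
  def sampleRows(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val rows: Seq[(Option[Int], Option[String], Option[Int])] = Seq(
      (Some(4), Some("CINGULAR WIRELESS"), Some(84000)),
      (Some(5), None, None),
      (None, None, None)
    )
    rows.toDF("id", "city", "population")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DropAllNullRows").master("local[*]").getOrCreate()
    val df = sampleRows(spark)
    keepRowsWithAnyValue(df).show() // keeps the rows with id 4 and 5
    df.na.drop("all").show()        // keeps the same two rows
    spark.stop()
  }
}
```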


Drop every row that contains a null in any column (the default, equivalent to drop("any")):

scala> df.na.drop().show()
+---+-------+------+-----------------+-----+----------+
| id|zipcode|  type|             city|state|population|
+---+-------+------+-----------------+-----+----------+
|  4|  76166|UNIQUE|CINGULAR WIRELESS|   TX|     84000|
+---+-------+------+-----------------+-----+----------+
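
The null check can also be limited to selected columns by passing their names to drop. The following sketch assumes an in-memory copy of the sample data so that it runs without the file; the object name DropNullsInSelectedColumns, the ZipRow alias, and the zipcodes helper are made up for this example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DropNullsInSelectedColumns {
  type ZipRow =
    (Option[Int], Option[Int], Option[String], Option[String], Option[String], Option[Int])

  // In-memory copy of small_zipcode.csv; None stands for an empty field.
  def zipcodes(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val rows: Seq[ZipRow] = Seq(
      (Some(1), Some(704), Some("STANDARD"), None, Some("PR"), Some(30100)),
      (Some(2), Some(704), None, Some("PASEO COSTA DEL SUR"), Some("PR"), None),
      (Some(3), Some(709), None, Some("BDA SAN LUIS"), Some("PR"), Some(3700)),
      (Some(4), Some(76166), Some("UNIQUE"), Some("CINGULAR WIRELESS"), Some("TX"), Some(84000)),
      (Some(5), Some(76177), Some("STANDARD"), None, Some("TX"), None),
      (None, None, None, None, None, None),
      (Some(7), Some(76179), Some("STANDARD"), None, Some("TX"), None)
    )
    rows.toDF("id", "zipcode", "type", "city", "state", "population")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DropNullsInSelectedColumns").master("local[*]").getOrCreate()
    val df = zipcodes(spark)

    // Drop rows whose population is null: ids 1, 3 and 4 remain.
    df.na.drop(Seq("population")).show()

    // Drop rows where city and population are both null: ids 1, 2, 3 and 4 remain.
    df.na.drop("all", Seq("city", "population")).show()

    spark.stop()
  }
}
```

In spark-shell the same two calls can be run directly on the df loaded from the CSV file.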

Reference:
Many more Spark usage examples: https://sparkbyexamples.com/spark/spark-dataframe-drop-rows-with-null-values/

