What a pitfall this was...
As the title says: when writing Spark SQL code in Eclipse on Windows, the following code threw a pile of NullPointerExceptions the moment it ran:
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    // First, create the SparkConf as usual
    SparkConf conf = new SparkConf()
            .setMaster("local")
            .setAppName("HiveDataSource");
    // Create the JavaSparkContext
    JavaSparkContext sc = new JavaSparkContext(conf);
    SQLContext sqlContext = new SQLContext(sc);
    // Reading from HDFS works:
    // DataFrame usersDF = sqlContext.read().parquet("hdfs://spark2:9000/francis/spark-core/users.parquet");
    // Reading the local file is what triggers the NullPointerExceptions:
    DataFrame usersDF = sqlContext.read().parquet("users.parquet");
It was maddening...
Later I found that saving the data to HDFS worked fine, so I mistakenly assumed data simply couldn't be saved locally. But after googling around, I saw that plenty of demos save data to local parquet files, so that guess was ruled out.
I eventually found the answer here: http://stackoverflow.com/questions/25505365/parquet-file-in-spark-sql
The reply is as follows:
Spark is compatible with Windows. You can run your program in a spark-shell session in Windows or you can run it using spark-submit with necessary arguments such as "-master" (again, in Windows or other OS). You cannot just run your Spark program as an ordinary Java program in Eclipse without properly setting up the Spark environment and so on. Your problem has nothing to do with Windows.
Later I also verified it in spark-shell on Linux: saving locally does work!
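For reference, the verification amounted to a spark-shell session along the lines of the sketch below (spark-shell runs Scala and pre-creates sqlContext for you; the file names here are placeholders, not my actual data):

    // sqlContext is already available in spark-shell
    // Reading a parquet file from a local path works here:
    val usersDF = sqlContext.read.parquet("users.parquet")
    usersDF.show()
    // Writing back to a local path works as well:
    usersDF.write.parquet("users_copy.parquet")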
So: if you want to save to a local path, run the job with spark-submit instead of launching it directly from Eclipse.
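For example, after packaging the program into a jar, it can be submitted like this (the class name and jar name below are placeholders for your own):

    spark-submit \
      --class com.example.HiveDataSource \
      --master local \
      hive-data-source.jar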