Usage of the load and save methods
DataFrame usersDF = sqlContext.read().load(
    "hdfs://spark1:9000/users.parquet");
usersDF.select("name", "favorite_color").write()
    .save("hdfs://spark1:9000/namesAndFavColors.parquet");
// load/save with an explicitly specified file format
DataFrame peopleDF = sqlContext.read().format("json")
    .load("hdfs://spark1:9000/people.json");
peopleDF.select("name").write().format("parquet")
.save("hdfs://spark1:9000/peopleName_java");
Parquet data source:
-》Loading parquet data
DataFrame usersDF = sqlContext.read().parquet("hdfs://spark1:9000/spark-study/users.parquet");
-》Automatic partition discovery for parquet
Store a users.parquet file containing only two fields under the directory /users/gender=male/country=us/ (as below);
after loading users.parquet with the following code, the resulting usersDF will have 4 fields:
DataFrame usersDF = sqlContext.read().parquet("hdfs://spark1:9000/spark-study/users/gender=male/country=us/users.parquet");
where the gender field has the value "male" and the country field has the value "us".
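Partition discovery works by parsing `key=value` directory segments in the file path into extra columns. A minimal plain-Java sketch of that parsing step (illustrative only, not Spark's actual implementation; `inferPartitions` is a hypothetical helper):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionInference {
    // Parse "key=value" directory segments from a partitioned path,
    // the way Spark's partition discovery turns them into columns.
    static Map<String, String> inferPartitions(String path) {
        Map<String, String> cols = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                cols.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        Map<String, String> cols =
            inferPartitions("/users/gender=male/country=us/users.parquet");
        System.out.println(cols); // {gender=male, country=us}
    }
}
```

Each matching segment becomes one extra column, which is why the two-field users.parquet above comes back as a four-field DataFrame.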
-》Schema merging
Merging parquet schemas: http://www.cnblogs.com/key1309/p/5332089.html
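Parquet schema merging takes the union of the fields found across files, provided fields shared by name have compatible types; in Spark it is enabled per-read with sqlContext.read().option("mergeSchema", "true").parquet(path). A plain-Java sketch of the union logic (conceptual only; `SchemaMerge` is a hypothetical class, with schemas modeled as field-name-to-type maps):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaMerge {
    // Merge two schemas (field name -> type) by taking the union of
    // fields; conflicting types for the same field name are an error.
    static Map<String, String> merge(Map<String, String> a, Map<String, String> b) {
        Map<String, String> merged = new LinkedHashMap<>(a);
        for (Map.Entry<String, String> e : b.entrySet()) {
            String existing = merged.get(e.getKey());
            if (existing != null && !existing.equals(e.getValue())) {
                throw new IllegalArgumentException(
                    "incompatible types for field " + e.getKey());
            }
            merged.put(e.getKey(), e.getValue());
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> v1 = new LinkedHashMap<>();
        v1.put("name", "string");
        v1.put("age", "int");
        Map<String, String> v2 = new LinkedHashMap<>();
        v2.put("name", "string");
        v2.put("grade", "string");
        System.out.println(merge(v1, v2)); // {name=string, age=int, grade=string}
    }
}
```

Fields missing from one file simply come back as null for that file's rows, which is what lets a parquet table's schema evolve over time.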
JSON data source:
DataFrame studentScoresDF = sqlContext.read().json(
"hdfs://spark1:9000/spark-study/students.json");
// Format requirement for the JSON data source: each line must be a complete,
// self-contained JSON object (JSON Lines), not one pretty-printed multi-line document.
Hive data source
// to be continued...
JDBC data source:
http://www.cnblogs.com/key1309/p/5350179.html