Interactive programming with spark-shell


Dataset (one record per line, in the form student,course,score):

Tom,DataBase,80

Tom,Algorithm,50

Tom,DataStructure,60

Jim,DataBase,90

Jim,Algorithm,60

Jim,DataStructure,80

……
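Every snippet in this post splits each line on commas and picks fields by index. As a minimal sketch of the record format, the lines can also be parsed into a small typed record. The names `Score` and `parseScore` are hypothetical helpers introduced here for illustration; they are not part of the original exercise.

```scala
import scala.util.Try

// Hypothetical record type for one line of the data file: student,course,score
final case class Score(student: String, course: String, score: Int)

// Parse one comma-separated line. Returns None when the line does not have
// exactly three fields or when the score field is not an integer.
def parseScore(line: String): Option[Score] =
  line.split(",").map(_.trim) match {
    case Array(student, course, score) =>
      Try(score.toInt).toOption.map(s => Score(student, course, s))
    case _ => None
  }

// Two of the sample rows shown above
val parsedSample = Seq("Tom,DataBase,80", "Jim,Algorithm,60").flatMap(parseScore)
```

In spark-shell the same function could be applied with `lines.flatMap(parseScore(_))`, which silently skips malformed lines instead of failing on them.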

Using the given experimental data, write code in spark-shell to compute the following:

(1) How many students are there in the department in total:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt")
val par = lines.map(row => row.split(",")(0))  // keep only the student name
val distinct_par = par.distinct()  // remove duplicates
distinct_par.count  // total number of distinct students

The answer is 265 students.

(2) How many courses does the department offer:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt") 
val par = lines.map(row=>row.split(",")(1))  
val distinct_par = par.distinct()  
distinct_par.count

The answer is 8 courses.
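Questions (1) and (2) use the same pattern: project one column, remove duplicates, count. As an illustration only, the sketch below applies that pattern to plain Scala collections, using the six sample rows listed in the dataset section rather than the full Data01.txt. The helper name `distinctCount` is an assumption made for this example.

```scala
// Count the distinct values in one comma-separated column
// (0 = student, 1 = course).
def distinctCount(rows: Seq[String], column: Int): Int =
  rows.map(_.split(",")(column)).distinct.size

// The six sample rows from the dataset section, not the full file
val sampleRows = Seq(
  "Tom,DataBase,80",
  "Tom,Algorithm,50",
  "Tom,DataStructure,60",
  "Jim,DataBase,90",
  "Jim,Algorithm,60",
  "Jim,DataStructure,80"
)

val sampleStudentCount = distinctCount(sampleRows, 0) // 2 (Tom and Jim)
val sampleCourseCount  = distinctCount(sampleRows, 1) // 3
```

On an RDD the equivalent chain is `map`, then `distinct()`, then `count`, exactly as in the two snippets above.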

(3) What is Tom's average score across all of his courses:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt")
val pare = lines.filter(row => row.split(",")(0) == "Tom")  // keep only Tom's records
pare.foreach(println)
//Tom,DataBase,26
//Tom,Algorithm,12
//Tom,OperatingSystem,16
//Tom,Python,40
//Tom,Software,60
// Pair each score with a count of 1, sum both per key, then divide (integer division)
pare.map(row => (row.split(",")(0), row.split(",")(2).toInt)).mapValues(x => (x, 1)).reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2)).mapValues(x => x._1 / x._2).collect()
//res9: Array[(String, Int)] = Array((Tom,30))

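Tom's average score is 30. The division is on Int values, so the exact mean of the five scores listed (154 / 5 = 30.8) is truncated.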
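The average for Tom is computed with Int division, which drops the fractional part. The sketch below uses the five scores printed for Tom to show the difference between the truncated and the exact mean. It runs on a plain Scala `Seq`, and the helper name `mean` is an assumption made for this example.

```scala
// Arithmetic mean as a Double. For an empty sequence this yields NaN.
def mean(scores: Seq[Int]): Double =
  scores.sum.toDouble / scores.size

// Tom's five scores as printed by pare.foreach(println)
val tomScores = Seq(26, 12, 16, 40, 60)

val truncatedMean = tomScores.sum / tomScores.size // Int division: 154 / 5 = 30
val exactMean     = mean(tomScores)                // 30.8
```

In the RDD version, converting the sum with `x._1.toDouble / x._2` in the final `mapValues` should give the fractional result in the same way.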

(4) The number of courses each student has taken:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt")
val pare = lines.map(row => (row.split(",")(0), 1))  // (student, 1) for every record
pare.reduceByKey(_ + _).foreach(println)  // sum the 1s per student

The output has 265 lines, one per student.
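Counting records per student is a count-by-key. As a rough illustration on plain Scala collections, the sketch below groups the six sample rows from the dataset section by student name and counts each group. The helper name `countByStudent` is hypothetical, and the result applies to the sample rows only, not to the full file.

```scala
// Count how many records each student has (one record per course taken).
def countByStudent(rows: Seq[String]): Map[String, Int] =
  rows
    .map(_.split(",")(0))                            // keep the student name
    .groupBy(identity)                               // name -> all its occurrences
    .map { case (name, hits) => (name, hits.size) }  // name -> record count

// The six sample rows from the dataset section
val sampleRecords = Seq(
  "Tom,DataBase,80",
  "Tom,Algorithm,50",
  "Tom,DataStructure,60",
  "Jim,DataBase,90",
  "Jim,Algorithm,60",
  "Jim,DataStructure,80"
)

val coursesPerStudent = countByStudent(sampleRecords) // Tom -> 3, Jim -> 3
```

On an RDD, `reduceByKey` is generally preferred over a group-then-count approach because it combines counts within each partition before shuffling.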

(5) How many students in the department take the DataBase course:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt") 
val pare = lines.filter(row=>row.split(",")(1)=="DataBase") 
pare.count 
//res1: Long = 126

The answer is 126 students.

(6) The average score of each course:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt") 
val pare = lines.map(row=>(row.split(",")(1),row.split(",")(2).toInt)) 
// Build (sum, count) per course, then divide; Int division truncates each average
pare.mapValues(x => (x, 1)).reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2)).mapValues(x => x._1 / x._2).collect()
//res0: Array[(String, Int)] = Array((Python,57), (OperatingSystem,54), (CLanguage,50), (Software,50), (Algorithm,48), (DataStructure,47), (DataBase,50), (ComputerNetwork,51))

The answer is: (CLanguage,50) (Python,57) (Software,50) (OperatingSystem,54) (Algorithm,48) (DataStructure,47) (DataBase,50) (ComputerNetwork,51)
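The per-course average relies on carrying a (sum, count) pair per key and dividing at the end. The sketch below reproduces that idea on plain Scala collections with a fold, and divides as Double so the fraction is kept. The helper name `averageByKey` is an assumption for this example, and the input is the six sample rows from the dataset section, not the full file.

```scala
// Average per key, accumulating a running (sum, count) pair for each key.
def averageByKey(pairs: Seq[(String, Int)]): Map[String, Double] = {
  val sumsAndCounts = pairs.foldLeft(Map.empty[String, (Long, Int)]) {
    case (acc, (key, value)) =>
      val (sum, count) = acc.getOrElse(key, (0L, 0))
      acc.updated(key, (sum + value, count + 1))
  }
  sumsAndCounts.map { case (key, (sum, count)) => (key, sum.toDouble / count) }
}

// (course, score) pairs taken from the six sample rows
val sampleCourseScores = Seq(
  ("DataBase", 80), ("Algorithm", 50), ("DataStructure", 60),
  ("DataBase", 90), ("Algorithm", 60), ("DataStructure", 80)
)

val sampleAverages = averageByKey(sampleCourseScores)
// DataBase -> 85.0, Algorithm -> 55.0, DataStructure -> 70.0
```

The fold plays the role of `reduceByKey` here: each step merges one value into the running pair for its key.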

(7) Use an accumulator to count how many students take the DataBase course:

val lines = sc.textFile("file:///usr/local/spark/sparksqldata/Data01.txt") 
val pare = lines.filter(row=>row.split(",")(1)=="DataBase").map(row=>(row.split(",")(1),1)) 
val accum = sc.longAccumulator("My Accumulator")  // named accumulator, starts at 0
pare.values.foreach(x => accum.add(x))  // add 1 for every DataBase record
accum.value
//res19: Long = 126

The answer is 126 students.

 

