Original: Spark2 Dataset: Views and SQL

Creating a view: data.createOrReplaceTempView("Affairs"); val df = spark.sql("SELECT ... FROM Affairs WHERE age BETWEEN ... AND ...") returns df: org.apache.spark.sql.DataFrame = [affairs: double, gender: string ... more fields]. Subquery: val df = spa ...
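The excerpt above lost its punctuation and literal values, so here is a minimal sketch of the same idea rather than the original post's code. The sample rows, the bounds 20 and 30, and the names `inRange` and `aboveAvg` are assumptions made for illustration; only `createOrReplaceTempView`, `spark.sql`, and the view name `Affairs` come from the excerpt.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("views-and-sql-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the post's `data` (the real Affairs data has more fields).
val data = Seq(
  (0.0, "male", 37.0),
  (0.0, "female", 27.0),
  (3.0, "male", 22.0),
  (7.0, "female", 42.0)
).toDF("affairs", "gender", "age")

// Register the DataFrame as a session-scoped temporary view.
data.createOrReplaceTempView("Affairs")

// Range filter; the bounds are illustrative, not taken from the original post.
val inRange = spark.sql("SELECT * FROM Affairs WHERE age BETWEEN 20 AND 30")

// Uncorrelated scalar subquery: rows whose age is above the average age.
val aboveAvg = spark.sql(
  "SELECT * FROM Affairs WHERE age > (SELECT avg(age) FROM Affairs)")

inRange.show()
aboveAvg.show()
```

A temporary view lives only as long as the `SparkSession` that created it, which is why the sketch registers the view and queries it in the same session.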

2016-11-25 17:01 0 2006

Spark2 Dataset Aggregation Operations

data.groupBy("gender").agg(count($"age"),max($"age").as("maxAge"), avg($"age").as("avgAge")).show ...

Sat Nov 26 00:56:00 CST 2016 0 3666
Spark2 DataSet: Creating New Rows with flatMap

val dfList = List(("Hadoop", "Java,SQL,Hive,HBase,MySQL"), ("Spark", "Scala,SQL,DataSet,MLlib,GraphX")) dfList: List[(String, String)] = List ...
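The excerpt stops before the `flatMap` call itself, so the following is one plausible way to turn each comma-separated list into one row per item, not necessarily what the original post does. The column names `name` and `skill` and the value `exploded` are assumptions; `dfList` is copied from the excerpt.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("flatmap-sketch")
  .getOrCreate()
import spark.implicits._

val dfList = List(
  ("Hadoop", "Java,SQL,Hive,HBase,MySQL"),
  ("Spark", "Scala,SQL,DataSet,MLlib,GraphX"))

// Typed Dataset[(String, String)] built from the local list.
val ds = dfList.toDS()

// flatMap emits zero or more output rows per input row:
// here, one (name, skill) pair for every comma-separated entry.
val exploded = ds.flatMap { case (name, skills) =>
  skills.split(",").toSeq.map(skill => (name, skill))
}.toDF("name", "skill")

exploded.show()
```

Each input row carries five entries, so the two input rows expand to ten output rows.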

Tue Nov 29 03:05:00 CST 2016 0 4532
Spark2 Dataset: Deduplication, Difference, and Intersection

import org.apache.spark.sql.functions._ // Deduplicate across the whole DataFrame data.distinct() data.dropDuplicates() // Deduplicate on specified columns val colArray=Array ...
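As a rough sketch of the operations named in this title, the example below uses two small hypothetical DataFrames, `a` and `b`, that are not from the original post. It assumes the post's difference and intersection correspond to `Dataset.except` and `Dataset.intersect`, which the excerpt does not show.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("dedup-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data; `a` contains one exact duplicate row.
val a = Seq(("male", 1), ("male", 1), ("female", 2), ("male", 3))
  .toDF("gender", "children")
val b = Seq(("male", 1), ("female", 5)).toDF("gender", "children")

// Whole-row deduplication: distinct() and dropDuplicates() behave the same here.
val wholeRow = a.distinct()

// Deduplicate on selected columns only; one arbitrary row per gender is kept.
val colArray = Array("gender")
val byGender = a.dropDuplicates(colArray)

// Set difference: distinct rows of `a` that do not appear in `b`.
val difference = a.except(b)

// Set intersection: distinct rows present in both `a` and `b`.
val common = a.intersect(b)

difference.show()
common.show()
```

Which row survives `dropDuplicates(colArray)` is not guaranteed, so code that depends on a particular survivor would need an explicit ordering or aggregation instead.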

Sat Nov 26 00:20:00 CST 2016 0 13165
Spark2 Dataset Persistence and StorageLevel

import org.apache.spark.storage.StorageLevel // Persist the data in memory // data.cache() data.persist() // Set the storage level data.persist(StorageLevel.DISK_ONLY) // Clear the cache ...
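A small sketch of the persist and unpersist cycle hinted at above, using a hypothetical two-row `data` DataFrame. It reads the level back with `Dataset.storageLevel`, which as far as I know is only available from Spark 2.1 onward; the excerpt itself does not use that call.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("persist-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the post's `data`.
val data = Seq(("male", 37.0), ("female", 27.0)).toDF("gender", "age")

// Mark the Dataset for caching on disk only; nothing is computed yet.
data.persist(StorageLevel.DISK_ONLY)

// The first action materializes the cached data.
val rows = data.count()
val levelWhileCached = data.storageLevel

// Drop the cached data and wait until the blocks are removed.
data.unpersist(blocking = true)
val levelAfter = data.storageLevel
```

The sketch calls `persist` once with an explicit level, because requesting a different level on an already cached Dataset is typically ignored until the Dataset has been unpersisted.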

Fri Nov 25 23:40:00 CST 2016 0 6230
Spark2 Dataset Multidimensional Statistics with cube and rollup

val df6 = spark.sql("select gender,children,max(age),avg(age),count(age) from Affairs group by Cube(gender,children) order by 1,2") df6.show +------+--------+--------+--------+----------+ ...

Sat Nov 26 02:23:00 CST 2016 1 2709
Spark2 Dataset Row and Column Operations and Execution Plans

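  A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations. Each Dataset also has an untyped view called a DataFrame, which is a Dataset of Row, i.e. Dataset[Row]. Datasets are "lazy": computation is triggered only when an action is invoked ...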
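To make the typed and untyped views concrete, here is a minimal sketch. The case class `Person` and its sample rows are hypothetical and not part of the original post.

```scala
import org.apache.spark.sql.{Dataset, Row, SparkSession}

// Hypothetical domain type used only for this illustration.
case class Person(name: String, age: Int)

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("dataset-sketch")
  .getOrCreate()
import spark.implicits._

// Strongly typed Dataset: each element is a Person.
val people: Dataset[Person] =
  Seq(Person("Ann", 41), Person("Bob", 25), Person("Cid", 33)).toDS()

// Functional transformation; this only builds a plan, no job runs yet.
val adults: Dataset[Person] = people.filter(_.age > 30)

// Untyped view of the same data: DataFrame is an alias for Dataset[Row].
val rows: Dataset[Row] = adults.toDF()

// Actions such as collect() or count() trigger the actual computation.
val names = adults.map(_.name).collect().sorted
val rowCount = rows.count()
```

Until the final two lines run, Spark has only recorded the `filter` and `map` steps in a logical plan, which is the laziness the excerpt refers to.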

Fri Nov 25 22:21:00 CST 2016 0 15584
 