Original article: Spark2 DataSet: Creating New Rows with flatMap

val dfList = List(("Hadoop", "Java,SQL,Hive,HBase,MySQL"), ("Spark", "Scala,SQL,DataSet,MLlib,GraphX"))
dfList: List[(String, String)] = List((Hadoop,Java,SQL,Hive,HBase,MySQL), (Spark,Scala,SQL,DataSet,MLlib,GraphX))
ca ...
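
The excerpt above is cut off before the flatMap call, so the following is only a minimal sketch of how such a list could be expanded into one row per keyword. The case class `Tech`, the object `FlatMapSketch`, the helper `explode`, and the local `SparkSession` settings are assumptions made for illustration and are not taken from the original article.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type, declared at top level so Spark can derive an encoder for it.
case class Tech(name: String, keywords: String)

object FlatMapSketch {
  // Emits one (name, keyword) row for every comma-separated keyword of every input row.
  def explode(spark: SparkSession, ds: Dataset[Tech]): Dataset[(String, String)] = {
    import spark.implicits._
    ds.flatMap(t => t.keywords.split(",").toSeq.map(k => (t.name, k)))
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session for experimentation only.
    val spark = SparkSession.builder().master("local[*]").appName("flatMap-sketch").getOrCreate()
    import spark.implicits._

    val dfList = List(
      ("Hadoop", "Java,SQL,Hive,HBase,MySQL"),
      ("Spark", "Scala,SQL,DataSet,MLlib,GraphX"))

    val ds = dfList.map { case (name, keywords) => Tech(name, keywords) }.toDS()

    // Two input rows become ten output rows, five per technology.
    explode(spark, ds).toDF("name", "keyword").show()

    spark.stop()
  }
}
```

Because the function passed to flatMap returns a sequence, each input row can yield zero, one, or many output rows, which is what distinguishes it from map.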

2016-11-28 19:05 0 4532

Spark2 Dataset: Views and SQL

// Create a view
data.createOrReplaceTempView("Affairs")
val df1 = spark.sql("SELECT * FROM Affairs WHERE age BETWEEN 20 AND 25")
df1 ...

Sat Nov 26 01:01:00 CST 2016 0 2006
Spark2 Dataset: Aggregation Operations

data.groupBy("gender").agg(count($"age"),max($"age").as("maxAge"), avg($"age").as("avgAge")).show ...

Sat Nov 26 00:56:00 CST 2016 0 3666
Spark2 Dataset: Deduplication, Difference, and Intersection

import org.apache.spark.sql.functions._
// Deduplicate across the whole DataFrame
data.distinct()
data.dropDuplicates()
// Deduplicate on specific columns
val colArray=Array ...

Sat Nov 26 00:20:00 CST 2016 0 13165
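
The deduplication excerpt above stops before the difference and intersection steps named in its title. As a rough sketch of what those operations look like with the Dataset API, assuming two small made-up DataFrames (the sample rows, the column names `gender` and `age`, and the object `SetOpsSketch` are hypothetical):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SetOpsSketch {
  // Hypothetical sample data: `a` contains one fully duplicated row.
  def sample(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._
    val a = Seq(("F", 22), ("F", 22), ("M", 27), ("M", 32)).toDF("gender", "age")
    val b = Seq(("M", 27), ("F", 40)).toDF("gender", "age")
    (a, b)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session for experimentation only.
    val spark = SparkSession.builder().master("local[*]").appName("set-ops-sketch").getOrCreate()
    val (a, b) = sample(spark)

    a.distinct().show()                     // drop rows that are duplicated in every column
    a.dropDuplicates(Seq("gender")).show()  // keep a single row per distinct gender
    a.except(b).show()                      // difference: distinct rows of a that are absent from b
    a.intersect(b).show()                   // intersection: rows present in both a and b

    spark.stop()
  }
}
```

Note that `dropDuplicates` on a subset of columns keeps an arbitrary row from each group, so it should not be relied on to pick a specific one.
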
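Spark2 Dataset: Persistence Storage Levels (StorageLevel)

import org.apache.spark.storage.StorageLevel
// Persist the data at the default storage level (MEMORY_AND_DISK for a Dataset)
//data.cache()
data.persist()
// Set the storage level explicitly
data.persist(StorageLevel.DISK_ONLY)
// Clear the cache ...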

Fri Nov 25 23:40:00 CST 2016 0 6230
Spark2 Dataset: Multidimensional Statistics with cube and rollup

val df6 = spark.sql("select gender,children,max(age),avg(age),count(age) from Affairs group by Cube(gender,children) order by 1,2")
df6.show
+------+--------+--------+--------+----------+ ...

Sat Nov 26 02:23:00 CST 2016 1 2709
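Spark2 Dataset: Row and Column Operations and Execution Plans

  A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational operations. Each Dataset also has an untyped view called a DataFrame, which is a Dataset of Row, i.e. Dataset[Row]. A Dataset is "lazy": computation is triggered only when an action is invoked ...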

Fri Nov 25 22:21:00 CST 2016 0 15584
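
To illustrate the laziness described in the entry above, here is a small sketch under assumed names (`LazySketch`, `pipeline`, and the sample integers are hypothetical). Transformations only build up a plan; `explain` prints that plan, and the computation itself runs only when an action such as `collect` is called.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object LazySketch {
  // Builds a typed pipeline; calling this method does not start any Spark job.
  def pipeline(spark: SparkSession): Dataset[Int] = {
    import spark.implicits._
    val ds = Seq(1, 2, 3, 4, 5).toDS()
    ds.filter(n => n % 2 == 0).map(n => n * 10)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session for experimentation only.
    val spark = SparkSession.builder().master("local[*]").appName("lazy-sketch").getOrCreate()

    val evens = pipeline(spark)

    // The untyped view of the same data: a DataFrame, i.e. Dataset[Row].
    evens.toDF("n").printSchema()

    // Prints the logical and physical plans without computing the result.
    evens.explain(true)

    // collect() is an action, so the filter and map execute here.
    println(evens.collect().mkString(", "))

    spark.stop()
  }
}
```
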
 