Spark2 Dataset之collect_set與collect_list


collect_set去除重復元素;collect_list不去除重復元素
select gender,
       concat_ws(',', collect_set(children)),
       concat_ws(',', collect_list(children))
  from Affairs
 group by gender

 

// 創建視圖 
data.createOrReplaceTempView("Affairs")

val df3= spark.sql("select gender,concat_ws(',',collect_set(children)),concat_ws(',',collect_list(children)) from Affairs group by gender")
df3: org.apache.spark.sql.DataFrame = [gender: string, concat_ws(,, collect_set(children)): string ... 1 more field]

df3.show  // collect_set去除重復元素;collect_list不去除重復元素
+------+-----------------------------------+------------------------------------+
|gender|concat_ws(,, collect_set(children))|concat_ws(,, collect_list(children))|
+------+-----------------------------------+------------------------------------+
|female|                             no,yes|                    no,yes,no,no,yes|
|  male|                             no,yes|                    no,yes,no,yes,no|
+------+-----------------------------------+------------------------------------+

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM