Using Spark Dataset join from Java

Reposted article. 2018-09-03 16:03. Tag: spark

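import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import scala.collection.JavaConverters;

// df1 and df2 (and post_one_cat, categories, cat used below) are assumed to be already-loaded Dataset<Row> instances.
Dataset<Row> df1, df2, df3;

// Approach 1: inner join on the shared column name. This runs successfully.
df3 = df1.join(df2, "post_id")  // inner join
        .selectExpr("hostname,request_date,post_id,title,author,name as category".split(","));

// Approach 2: join on column expressions. Columns that have the same name on both sides
// must be renamed first, otherwise Spark reports a duplicate-column error
// (removing the duplicate with drop() had no effect; the reason is unclear).
Dataset<Row> acc = df1.withColumnRenamed("post_id", "post_id_acc");
Dataset<Row> post_categories = acc
        .join(post_one_cat, acc.col("post_id_acc").equalTo(post_one_cat.col("post_id")), "left_outer")
        .join(categories, post_one_cat.col("cate_id").equalTo(categories.col("id")), "left_outer")
        .selectExpr("hostname,request_date,post_id_acc as post_id,title,author,name as category".split(","));
// Commented-out variant that renames columns instead of using selectExpr:
// post_categories = acc.join(post_one_cat,acc.col("post_id_acc").equalTo(post_one_cat.col("post_id")),"left_outer").join(categories, post_one_cat.col("cate_id").equalTo(categories.col("id")),"left_outer").withColumnRenamed("name", "category").withColumnRenamed("post_id_cat", "post_id");

// Approach 3: pass the join columns as a Scala Seq together with the join type. This also runs successfully.
df3 = df1
        .join(df2, JavaConverters.asScalaIteratorConverter(Arrays.asList("post_id").iterator()).asScala().toSeq(), "left_outer")
        .join(cat, JavaConverters.asScalaIteratorConverter(Arrays.asList("cate_id").iterator()).asScala().toSeq(), "left_outer")
        .selectExpr("hostname,request_date,post_id,title,author,name as category".split(","));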
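
Each selectExpr call above builds its argument by splitting one comma-separated string, which yields the String[] that selectExpr(String...) accepts. The following is a minimal plain-Java sketch of that conversion, with no Spark dependency. The class SelectExprColumns and the method columns are hypothetical names introduced for illustration, not part of Spark, and the sketch assumes that no expression contains a comma of its own (a call such as concat(a, b) would be split incorrectly).

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the Spark API.
class SelectExprColumns {

    // Splits a comma-separated select list into one expression per element.
    // Assumption: no individual expression contains a comma.
    public static String[] columns(String selectList) {
        return Arrays.stream(selectList.split(","))
                .map(String::trim)            // tolerate spaces around the commas
                .filter(s -> !s.isEmpty())    // ignore empty entries such as a trailing comma
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] exprs = columns("hostname, request_date, post_id_acc as post_id, title, author, name as category");
        // Each element is a single SQL expression; aliases like "name as category" stay in one piece.
        for (String expr : exprs) {
            System.out.println(expr);
        }
    }
}
```

The resulting array could then be passed directly, for example joined.selectExpr(SelectExprColumns.columns("...")), where joined stands for any Dataset<Row>. When an expression does need a comma, listing the expressions as separate string arguments avoids the split altogether.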
