Using Spark Dataset join in Java

Reposted article, 2018-09-03 16:03. Tag: spark

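```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> df1, df2, df3;

// This approach works: an inner join on the column both sides share, "post_id".
df3 = df1.join(df2, "post_id")
        .selectExpr("hostname,request_date,post_id,title,author,name as category".split(","));
```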
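The first approach can be tried on toy data with a self-contained sketch like the one below. It assumes Spark 2.x/3.x on the classpath and a local session. The class name `JoinByNameSketch`, the method `joinDemo`, and the tiny in-memory tables are made up for illustration. Joining by column name (`join(right, "post_id")`) is an inner join, so a post with no match is dropped, and the key column appears only once in the result.

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class JoinByNameSketch {
    // Hypothetical example data: three posts in the log, categories for only two of them.
    static Dataset<Row> joinDemo(SparkSession spark) {
        Dataset<Row> df1 = spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, "a.example"),
                RowFactory.create(2, "b.example"),
                RowFactory.create(3, "c.example")),
                new StructType()
                        .add("post_id", DataTypes.IntegerType)
                        .add("hostname", DataTypes.StringType));
        Dataset<Row> df2 = spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, "news"),
                RowFactory.create(2, "tech")),
                new StructType()
                        .add("post_id", DataTypes.IntegerType)
                        .add("name", DataTypes.StringType));

        // join(right, "post_id") is an inner join: post 3 has no match and is dropped,
        // and the joined result contains a single post_id column.
        return df1.join(df2, "post_id")
                .selectExpr("hostname", "post_id", "name as category");
    }

    public static void main(String[] args) {
        // Local, single-threaded session for trying the sketch out.
        SparkSession spark = SparkSession.builder().master("local[1]").getOrCreate();
        joinDemo(spark).show();
        spark.stop();
    }
}
```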
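```java
// With this approach, joining on same-named columns requires renaming one of them first.
// Otherwise both post_id columns survive the join and later references fail as ambiguous.
// (Removing the duplicate with drop() did not work for the author; the cause is unclear.)
// post_one_cat and categories are Dataset<Row>s defined elsewhere in the original.
Dataset<Row> acc = df1.withColumnRenamed("post_id", "post_id_acc");
Dataset<Row> post_categories = acc
        .join(post_one_cat, acc.col("post_id_acc").equalTo(post_one_cat.col("post_id")), "left_outer")
        .join(categories, post_one_cat.col("cate_id").equalTo(categories.col("id")), "left_outer")
        .selectExpr("hostname,request_date,post_id_acc as post_id,title,author,name as category".split(","));
// Variant from the original (commented out). No column named post_id_cat is created above,
// and withColumnRenamed is a no-op for a missing column:
// post_categories = acc.join(post_one_cat, acc.col("post_id_acc").equalTo(post_one_cat.col("post_id")), "left_outer").join(categories, post_one_cat.col("cate_id").equalTo(categories.col("id")), "left_outer").withColumnRenamed("name", "category").withColumnRenamed("post_id_cat", "post_id");
```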
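The duplicate-column problem, and two ways around it, can be checked on toy data. This is a sketch under assumptions: Spark 2.x/3.x in local mode, with made-up names (`DuplicateColumnSketch`, `renameFirst`, `dropByColumn`, and the sample tables). `renameFirst` mirrors the workaround above. `dropByColumn` is a different technique, not from the original: it passes the right-hand `Column` object to `drop`, which removes only that attribute. One plausible reason a drop seemed not to work is dropping by the string `"post_id"`, which matches every column with that name and so removes both copies. That is a guess, since the original does not show the call that was tried.

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class DuplicateColumnSketch {
    // Hypothetical left side: posts with post_id and title.
    static Dataset<Row> left(SparkSession spark) {
        StructType schema = new StructType()
                .add("post_id", DataTypes.IntegerType)
                .add("title", DataTypes.StringType);
        return spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, "first"),
                RowFactory.create(2, "second")), schema);
    }

    // Hypothetical right side: post_id to cate_id mapping for one post only.
    static Dataset<Row> right(SparkSession spark) {
        StructType schema = new StructType()
                .add("post_id", DataTypes.IntegerType)
                .add("cate_id", DataTypes.IntegerType);
        return spark.createDataFrame(Arrays.asList(RowFactory.create(1, 10)), schema);
    }

    // The article's workaround: rename the left key first so the names never clash.
    static Dataset<Row> renameFirst(Dataset<Row> l, Dataset<Row> r) {
        Dataset<Row> acc = l.withColumnRenamed("post_id", "post_id_acc");
        return acc.join(r, acc.col("post_id_acc").equalTo(r.col("post_id")), "left_outer")
                .selectExpr("post_id_acc as post_id", "title", "cate_id");
    }

    // Alternative technique: keep both columns, then drop the right-hand copy
    // by its Column reference instead of by the string "post_id".
    static Dataset<Row> dropByColumn(Dataset<Row> l, Dataset<Row> r) {
        return l.join(r, l.col("post_id").equalTo(r.col("post_id")), "left_outer")
                .drop(r.col("post_id"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().master("local[1]").getOrCreate();
        Dataset<Row> l = left(spark);
        Dataset<Row> r = right(spark);
        // A plain expression join keeps two columns named post_id.
        Dataset<Row> raw = l.join(r, l.col("post_id").equalTo(r.col("post_id")), "left_outer");
        System.out.println(String.join(",", raw.columns()));
        System.out.println(String.join(",", renameFirst(l, r).columns()));
        System.out.println(String.join(",", dropByColumn(l, r).columns()));
        spark.stop();
    }
}
```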

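```java
import java.util.Arrays;
import scala.collection.JavaConverters;

// This approach also works: pass the join keys as a Scala Seq<String> ("using" columns),
// which keeps a single copy of each key column, so nothing needs renaming.
df3 = df1.join(df2, JavaConverters.asScalaIteratorConverter(Arrays.asList("post_id").iterator()).asScala().toSeq(), "left_outer")
        .join(cat, JavaConverters.asScalaIteratorConverter(Arrays.asList("cate_id").iterator()).asScala().toSeq(), "left_outer")
        .selectExpr("hostname,request_date,post_id,title,author,name as category".split(","));
```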
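The Seq conversion in the third approach is easier to read when wrapped in a small helper. The sketch below assumes a Spark build for Scala 2.12, where `Dataset.join(Dataset<?>, Seq<String>, String)` takes a `scala.collection.Seq`. The helper `usingCols`, the class name, and the sample tables are hypothetical. Because each join names its key columns, `post_id` and `cate_id` each appear once. The left outer joins also keep the post that has no category, and its `category` comes out null.

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;
import scala.collection.JavaConverters;
import scala.collection.Seq;

public class UsingColumnsSketch {
    // Hypothetical helper: turns Java varargs into the Scala Seq<String> expected by
    // Dataset.join(right, usingColumns, joinType), using the same conversion as the article.
    static Seq<String> usingCols(String... cols) {
        return JavaConverters.asScalaIteratorConverter(Arrays.asList(cols).iterator()).asScala().toSeq();
    }

    static Dataset<Row> categorize(SparkSession spark) {
        // Hypothetical tables: logs (post_id, hostname), postOneCat (post_id, cate_id), cat (cate_id, name).
        Dataset<Row> logs = spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, "a.example"),
                RowFactory.create(2, "b.example")),
                new StructType().add("post_id", DataTypes.IntegerType).add("hostname", DataTypes.StringType));
        Dataset<Row> postOneCat = spark.createDataFrame(Arrays.asList(
                RowFactory.create(1, 10)),
                new StructType().add("post_id", DataTypes.IntegerType).add("cate_id", DataTypes.IntegerType));
        Dataset<Row> cat = spark.createDataFrame(Arrays.asList(
                RowFactory.create(10, "news")),
                new StructType().add("cate_id", DataTypes.IntegerType).add("name", DataTypes.StringType));

        // Each using-column join keeps one post_id / cate_id column, even for left outer joins.
        return logs.join(postOneCat, usingCols("post_id"), "left_outer")
                .join(cat, usingCols("cate_id"), "left_outer")
                .selectExpr("hostname", "post_id", "name as category");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().master("local[1]").getOrCreate();
        categorize(spark).orderBy("post_id").show();
        spark.stop();
    }
}
```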
