Hive: tables A and B are inner joined; if a group in A contains data, use A's rows, otherwise use the data under B's group


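tommyduan_fingerlib: fingerprint library, grid/cell-level data
tommyduan_mr_grid_cell_result_all: statistics, grid/cell-level data
Business requirement:
Treat tommyduan_mr_grid_cell_result_all as the primary source. If a grid (gridid, buildingid, floor) has no cells there, use the cells under the same grid (gridid, buildingid, floor) in the fingerprint library;
otherwise, fill the grid with its cells from tommyduan_mr_grid_cell_result_all.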

Data example:

--fingerprint library
--gridid1,buildingid1,floor1,cell1
--gridid1,buildingid1,floor1,cell2
--gridid1,buildingid1,floor1,cell3

--gridid2,buildingid1,floor1,cell31
--gridid2,buildingid1,floor1,cell298

--statistics result
--gridid1,buildingid1,floor1,cell2222
--gridid1,buildingid1,floor1,cell3333

--merged result:
--gridid1,buildingid1,floor1,cell2222
--gridid1,buildingid1,floor1,cell3333
--gridid2,buildingid1,floor1,cell31
--gridid2,buildingid1,floor1,cell298
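To make the rule concrete, here is a minimal Python sketch (not part of the original Hive solution) that applies it to the example rows above: for every (gridid, buildingid, floor) key, keep the statistics rows if there are any, otherwise fall back to the fingerprint-library rows. The names FINGERLIB, MR_RESULT, group_by_grid and merge_prefer_mr are hypothetical, and the sketch walks the keys of both tables.

```python
from collections import defaultdict

# Example rows from above, as (gridid, buildingid, floor, cell) tuples.
FINGERLIB = [
    ("gridid1", "buildingid1", "floor1", "cell1"),
    ("gridid1", "buildingid1", "floor1", "cell2"),
    ("gridid1", "buildingid1", "floor1", "cell3"),
    ("gridid2", "buildingid1", "floor1", "cell31"),
    ("gridid2", "buildingid1", "floor1", "cell298"),
]
MR_RESULT = [
    ("gridid1", "buildingid1", "floor1", "cell2222"),
    ("gridid1", "buildingid1", "floor1", "cell3333"),
]


def group_by_grid(rows):
    """Group rows by their (gridid, buildingid, floor) key, keeping input order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[:3]].append(row)
    return groups


def merge_prefer_mr(mr_rows, fingerlib_rows):
    """Per grid key, keep the statistics rows if any exist, else the fingerprint rows."""
    mr_groups = group_by_grid(mr_rows)
    fp_groups = group_by_grid(fingerlib_rows)
    merged = []
    # Walk the union of keys from both tables in a stable (sorted) order.
    for key in sorted(set(mr_groups) | set(fp_groups)):
        merged.extend(mr_groups.get(key) or fp_groups[key])
    return merged


for row in merge_prefer_mr(MR_RESULT, FINGERLIB):
    print(",".join(row))
```

The printed lines match the merged result listed above.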

Approach:

First, determine for each (gridid, buildingid, floor) group which table its data should come from.

create table tommyduan_gridcell_group as
select t10.gridid,t10.buildingid,t10.floor,
    (case when isnull(t11.buildingid) then 'fingerlib' else 'mr_grid_cell' end) as datafrom
from (select gridid,buildingid,floor from tommyduan_fingerlib group by gridid,buildingid,floor) t10
left outer join
(select gridid,buildingid,floor from tommyduan_mr_grid_cell_result_all group by gridid,buildingid,floor) t11
on t10.gridid=t11.gridid and t10.buildingid=t11.buildingid and t10.floor=t11.floor;
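The sketch below runs this step-1 query against SQLite through Python's sqlite3 module, as a local stand-in for Hive, on the example data. The tables are reduced to the key columns plus objectid, which is assumed here to hold the cell id; the real tables have more columns. SQLite has no isnull() function like Hive's, so the test is written as `t11.buildingid is null`.

```python
import sqlite3

# Example data reduced to (gridid, buildingid, floor, objectid); objectid is assumed to be the cell.
FINGERLIB = [
    ("gridid1", "buildingid1", "floor1", "cell1"),
    ("gridid1", "buildingid1", "floor1", "cell2"),
    ("gridid1", "buildingid1", "floor1", "cell3"),
    ("gridid2", "buildingid1", "floor1", "cell31"),
    ("gridid2", "buildingid1", "floor1", "cell298"),
]
MR_RESULT = [
    ("gridid1", "buildingid1", "floor1", "cell2222"),
    ("gridid1", "buildingid1", "floor1", "cell3333"),
]

conn = sqlite3.connect(":memory:")
for table, rows in (("tommyduan_fingerlib", FINGERLIB),
                    ("tommyduan_mr_grid_cell_result_all", MR_RESULT)):
    conn.execute(f"create table {table} (gridid text, buildingid text, floor text, objectid text)")
    conn.executemany(f"insert into {table} values (?, ?, ?, ?)", rows)

# Step 1: tag each fingerprint-library group with the table its rows should come from.
# Hive's isnull(x) is written as "x is null" in SQLite.
conn.execute("""
    create table tommyduan_gridcell_group as
    select t10.gridid, t10.buildingid, t10.floor,
           case when t11.buildingid is null then 'fingerlib' else 'mr_grid_cell' end as datafrom
    from (select gridid, buildingid, floor from tommyduan_fingerlib
          group by gridid, buildingid, floor) t10
    left outer join
         (select gridid, buildingid, floor from tommyduan_mr_grid_cell_result_all
          group by gridid, buildingid, floor) t11
      on t10.gridid = t11.gridid and t10.buildingid = t11.buildingid and t10.floor = t11.floor
""")

groups = conn.execute(
    "select gridid, datafrom from tommyduan_gridcell_group order by gridid"
).fetchall()
print(groups)  # [('gridid1', 'mr_grid_cell'), ('gridid2', 'fingerlib')]
```

Because the group list is built from tommyduan_fingerlib and left-joined to the statistics table, a (gridid, buildingid, floor) group that exists only in tommyduan_mr_grid_cell_result_all gets no row in tommyduan_gridcell_group, and its rows are therefore dropped in the next step. If such groups can occur, the group list would have to be drawn from both tables, for example with a full outer join.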

Next, for each group, join its rows from the table recorded in datafrom.

select t10.gridid,t10.objectid,t10.longitude,t10.latitude,t10.gridx,t10.gridy,
    t10.floor,t10.avgrsrp,t10.total_num,t10.mr_weak_num,
    t10.avgrsrq,t10.avgsinrul,
    t10.sinrul_total_num,t10.sinrul_low_num,t10.buildingid
from tommyduan_fingerlib t10
inner join (select * from tommyduan_gridcell_group where datafrom='fingerlib') t11
on t10.gridid=t11.gridid and t10.buildingid=t11.buildingid and t10.floor=t11.floor
union all
select t10.gridid,t10.objectid,t10.longitude,t10.latitude,t10.gridx,t10.gridy,
    t10.floor,t10.avgrsrp,t10.total_num,t10.mr_weak_num,
    t10.avgrsrq,t10.avgsinrul,
    t10.sinrul_total_num,t10.sinrul_low_num,t10.buildingid
from tommyduan_mr_grid_cell_result_all t10
inner join (select * from tommyduan_gridcell_group where datafrom='mr_grid_cell') t11
on t10.gridid=t11.gridid and t10.buildingid=t11.buildingid and t10.floor=t11.floor;
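Under the same assumptions (SQLite via Python's sqlite3 as a stand-in for Hive, only the key columns plus objectid as the cell), this sketch runs both steps end to end on the example data, so the output can be compared with the merged result from the data example. UNION ALL gives no ordering guarantee, so the rows are sorted before printing; the helper name merge and the KEY_MATCH string are hypothetical.

```python
import sqlite3

FINGERLIB = [
    ("gridid1", "buildingid1", "floor1", "cell1"),
    ("gridid1", "buildingid1", "floor1", "cell2"),
    ("gridid1", "buildingid1", "floor1", "cell3"),
    ("gridid2", "buildingid1", "floor1", "cell31"),
    ("gridid2", "buildingid1", "floor1", "cell298"),
]
MR_RESULT = [
    ("gridid1", "buildingid1", "floor1", "cell2222"),
    ("gridid1", "buildingid1", "floor1", "cell3333"),
]
# Join condition on the (gridid, buildingid, floor) group key, shared by every join below.
KEY_MATCH = "t10.gridid = t11.gridid and t10.buildingid = t11.buildingid and t10.floor = t11.floor"


def merge(conn):
    """Load the example data, run step 1 and step 2, and return the merged rows."""
    for table, rows in (("tommyduan_fingerlib", FINGERLIB),
                        ("tommyduan_mr_grid_cell_result_all", MR_RESULT)):
        conn.execute(f"create table {table} (gridid text, buildingid text, floor text, objectid text)")
        conn.executemany(f"insert into {table} values (?, ?, ?, ?)", rows)
    # Step 1: record which table each fingerprint-library group takes its rows from.
    conn.execute(f"""
        create table tommyduan_gridcell_group as
        select t10.gridid, t10.buildingid, t10.floor,
               case when t11.buildingid is null then 'fingerlib' else 'mr_grid_cell' end as datafrom
        from (select gridid, buildingid, floor from tommyduan_fingerlib
              group by gridid, buildingid, floor) t10
        left outer join
             (select gridid, buildingid, floor from tommyduan_mr_grid_cell_result_all
              group by gridid, buildingid, floor) t11
          on {KEY_MATCH}
    """)
    # Step 2: pull each group's rows from the table named in datafrom, then stack both halves.
    return conn.execute(f"""
        select t10.gridid, t10.buildingid, t10.floor, t10.objectid
        from tommyduan_fingerlib t10
        inner join (select * from tommyduan_gridcell_group where datafrom = 'fingerlib') t11
          on {KEY_MATCH}
        union all
        select t10.gridid, t10.buildingid, t10.floor, t10.objectid
        from tommyduan_mr_grid_cell_result_all t10
        inner join (select * from tommyduan_gridcell_group where datafrom = 'mr_grid_cell') t11
          on {KEY_MATCH}
    """).fetchall()


rows = merge(sqlite3.connect(":memory:"))
# UNION ALL has no guaranteed order, so sort before printing.
for row in sorted(rows):
    print(",".join(row))
```

gridid1 comes only from the statistics table (cell2222, cell3333) and gridid2 only from the fingerprint library (cell31, cell298); gridid1's fingerprint rows cell1 to cell3 do not appear.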

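 Notes:

1) If buildingid, gridid or floor in the inner join condition contains null values, those rows cannot be matched, even when the value is null on both sides;

2) If rows whose buildingid is null on both sides still need to be matched by the join, see the solution in the article "Hive&SqlServer: when both sides of an inner join ON condition are null, the rows are filtered out of the join result".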
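The behaviour described in note 1) can be reproduced in a few lines. This sketch again uses SQLite through Python's sqlite3 module rather than Hive, and the tables a and b are hypothetical. A plain `=` on a null buildingid matches nothing, while SQLite's `IS` operator, a null-safe equality, matches the two null values. Hive also provides a null-safe equality operator, `<=>`; check your Hive version's documentation before using it in a join condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table a (gridid text, buildingid text, floor text)")
conn.execute("create table b (gridid text, buildingid text, floor text)")
# One row on each side whose buildingid is NULL; the other key columns are equal.
conn.execute("insert into a values ('gridid1', null, 'floor1')")
conn.execute("insert into b values ('gridid1', null, 'floor1')")

# Plain equality: NULL = NULL is not true, so the pair is filtered out.
plain = conn.execute("""
    select count(*) from a inner join b
      on a.gridid = b.gridid and a.buildingid = b.buildingid and a.floor = b.floor
""").fetchone()[0]

# Null-safe equality on buildingid: NULL IS NULL is true, so the pair is kept.
null_safe = conn.execute("""
    select count(*) from a inner join b
      on a.gridid = b.gridid and a.buildingid is b.buildingid and a.floor = b.floor
""").fetchone()[0]

print(plain, null_safe)  # 0 1
```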

 

