hive 數據清理--數據去重



hive> select * from (select *,row_number() over (partition by id) num from t_link) t where t.num=1;

  

保留crt_time最新的一個數據

select * from (select *,row_number() over (partition by id order by crt_time desc) num from t_link) t where t.num=1;

將查詢的去重數據保存到新表t_link2中,新表比源表t_link多一列

insert overwrite table t_link2 select * from (select *,row_number() over (partition by id order by crt_time desc) num from t_link) t where t.num=1;

  


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM