Hive: watch out when counting several metrics and one of them uses distinct!
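For example, take a SQL statement such as select organid, ppi, count(id1) as num1, count(distinct id2) as num2 from table group by organid, ppi (here table stands for the real table name). When this is run in Hive, the value of num1 may come out inaccurate!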
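In production, it is best not to count several metrics in one query when one of them uses distinct. Avoid writing SQL this way.
A better way to write it is with two group by passes. The inner query also groups by id2, so each (organid, ppi) group holds one row per distinct id2; the outer query then sums the partial counts to get num1 and counts the id2 rows to get num2: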
select t.organid, t.ppi, sum(t.num) as num1, count(t.id2) as num2
from (
    select organid, ppi, id2, count(id1) as num
    from table
    group by organid, id2, ppi
) t
group by t.organid, t.ppi
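
As a quick sanity check of the rewrite, the sketch below runs both forms against a small in-memory SQLite database from Python and compares the results. SQLite is only a stand-in here, not Hive, so it can show that the two queries are logically equivalent but says nothing about how Hive itself behaves. The table name src and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows (organid, ppi, id1, id2); not taken from the original post.
ROWS = [
    ("org1", "p1", 1, "a"),
    ("org1", "p1", 2, "a"),
    ("org1", "p1", 3, "b"),
    ("org1", "p1", None, "b"),  # NULL id1 is ignored by count(id1)
    ("org1", "p1", 4, None),    # NULL id2 is ignored by count(distinct id2)
    ("org1", "p2", 5, "a"),
    ("org2", "p1", 6, "c"),
    ("org2", "p1", 7, "c"),
]

# The single-query form: a plain count and a distinct count side by side.
SINGLE_PASS = """
    select organid, ppi, count(id1) as num1, count(distinct id2) as num2
    from src
    group by organid, ppi
    order by organid, ppi
"""

# The two-pass form: group by id2 first, then aggregate the partial results.
TWO_PASS = """
    select t.organid, t.ppi, sum(t.num) as num1, count(t.id2) as num2
    from (
        select organid, ppi, id2, count(id1) as num
        from src
        group by organid, id2, ppi
    ) t
    group by t.organid, t.ppi
    order by t.organid, t.ppi
"""


def run_both():
    """Load the sample rows and return the result rows of both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table src (organid text, ppi text, id1 integer, id2 text)"
    )
    conn.executemany("insert into src values (?, ?, ?, ?)", ROWS)
    single = conn.execute(SINGLE_PASS).fetchall()
    two = conn.execute(TWO_PASS).fetchall()
    conn.close()
    return single, two


single, two = run_both()
print(single)
print(two)
print(single == two)  # True on this sample
```

On this sample both queries return the same rows, including for the group that contains NULL values in id1 and id2. Running the same comparison on a slice of real data in Hive is a reasonable way to confirm the rewrite before switching over.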