
Differences and relationships between order by, distribute by, sort by, and cluster by in Hive

Reposted article, 2019-07-05 20:15, Hive

order by

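order by sorts the data globally, with the same effect as order by in databases such as Oracle and MySQL. Because the whole sort runs in a single reducer, it is very inefficient when the data volume is large.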

In addition, when set hive.mapred.mode=strict is in effect, a select that uses order by without a limit fails with an error like this:

LIMIT must also be specified.
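As a minimal sketch of the strict-mode rule above, the following assumes a hypothetical table sales(store_id INT, amount DOUBLE) that is not part of the original article. Newer Hive releases are understood to replace hive.mapred.mode with finer-grained hive.strict.checks.* properties, so the exact behavior may depend on the version in use.

```sql
-- Hypothetical table for illustration: sales(store_id INT, amount DOUBLE)
SET hive.mapred.mode=strict;

-- Rejected in strict mode, because order by has no limit:
-- SELECT store_id, amount FROM sales ORDER BY amount DESC;

-- Accepted: the limit bounds the size of the globally sorted result
SELECT store_id, amount
FROM sales
ORDER BY amount DESC
LIMIT 100;
```

The limit does not change the fact that the final ordering is produced by one reducer; it only caps how many rows that ordering has to return.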

sort by

sort by sorts rows separately inside each reducer, so it does not guarantee a global order. It is usually used together with distribute by, and distribute by must be written before sort by.

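If mapred.reduce.tasks=1, the result is the same as order by. If it is greater than 1, the output is split into several files, one per reducer; each file is sorted by the specified columns, but there is no global order across the files.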
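The per-reducer behavior described above can be sketched as follows. The table sales(store_id INT, amount DOUBLE) and the output path /tmp/sales_sorted are hypothetical names chosen for illustration.

```sql
-- Hypothetical table for illustration: sales(store_id INT, amount DOUBLE)
-- Ask for three reducers so that the output is split into several files
SET mapred.reduce.tasks=3;

-- Each reducer sorts only the rows it receives, so every output file is
-- ordered by amount, but the files taken together are not globally ordered
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sales_sorted'
SELECT store_id, amount
FROM sales
SORT BY amount DESC;
```

Writing the result to a directory makes the effect visible: with more than one reducer, each file in the directory should be sorted on its own, and the files are not sorted relative to one another.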

sort by is not affected by whether hive.mapred.mode is set to strict or nonstrict.

distribute by

distribute by controls how the map output is partitioned among the reducers. Using distribute by guarantees that records with the same key are sent to the same reducer.
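A short sketch of distribute by combined with sort by, again assuming the hypothetical table sales(store_id INT, amount DOUBLE):

```sql
-- Hypothetical table for illustration: sales(store_id INT, amount DOUBLE)
-- All rows with the same store_id are sent to the same reducer;
-- inside each reducer, rows are ordered by store_id, then by amount descending
SELECT store_id, amount
FROM sales
DISTRIBUTE BY store_id
SORT BY store_id, amount DESC;
```

Note that distribute by comes before sort by in the query. One reducer may receive several different store_id values, which is why store_id also appears in the sort by list: it keeps the rows of each store next to each other in the output.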

cluster by

Using distribute by and sort by together on the same columns is equivalent to cluster by. However, cluster by does not accept asc or desc; it always sorts in ascending order.
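The equivalence can be sketched with the same hypothetical table sales(store_id INT, amount DOUBLE):

```sql
-- Hypothetical table for illustration: sales(store_id INT, amount DOUBLE)
-- Shorthand form: distribute and sort on the same column, ascending
SELECT store_id, amount
FROM sales
CLUSTER BY store_id;

-- Equivalent explicit form
SELECT store_id, amount
FROM sales
DISTRIBUTE BY store_id
SORT BY store_id;

-- A descending order needs the explicit form, since cluster by takes no direction
SELECT store_id, amount
FROM sales
DISTRIBUTE BY store_id
SORT BY store_id DESC;
```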
