order by in Hive

Reposted article (see original). 2017-05-03 13:01 | Hive

Common advanced queries in Hive include group by, order by, join, distribute by, sort by, cluster by and union all. This post looks at order by, which sorts the query result by one or more columns. The syntax is:

select col1, col2, ... from table_name [where condition] order by col1 [asc|desc] [, col2 [asc|desc] ...];
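To show how several sort keys interact, the multi-column form can be mimicked in Python: the first column decides the order, and later columns only break ties. This is a toy analogue, not Hive code. The rows and the dept column are made up for illustration, and the hypothetical order_by helper relies on the stability of Python's list.sort.

```python
# Hypothetical rows; "dept" is an illustration column, not part of employees
rows = [
    {"name": "a", "dept": "ops", "salary": 15000.0},
    {"name": "b", "dept": "dev", "salary": 18000.0},
    {"name": "c", "dept": "dev", "salary": 19000.0},
    {"name": "d", "dept": "ops", "salary": 17000.0},
]


def order_by(rows, keys):
    """Mimic ORDER BY col1 [ASC|DESC], col2 [ASC|DESC], ...

    keys is a list of (column, descending) pairs; the first pair is the
    most significant sort key, later pairs only break ties.
    """
    result = list(rows)
    # Sort by the least significant key first. list.sort is stable (also with
    # reverse=True), so each later pass keeps the order of equal keys from
    # the previous pass.
    for column, descending in reversed(keys):
        result.sort(key=lambda r: r[column], reverse=descending)
    return result


# Analogue of: order by dept asc, salary desc
ordered = order_by(rows, [("dept", False), ("salary", True)])
print([r["name"] for r in ordered])  # ['c', 'b', 'd', 'a']
```

Rows are grouped by dept ("dev" before "ops"), and within each dept the higher salary comes first.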

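Notes:

(1): order by can sort on several columns. Each column is sorted in ascending order by default (lexicographic order for string columns), and desc reverses the order for that column.

(2): order by performs a global sort over the whole result.

(3): order by requires a reduce phase with exactly one reducer, and this cannot be changed through configuration (multiple reducers cannot produce a global sort).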
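Note (3) can be illustrated with a small simulation of the shuffle: rows are split across reducers, each reducer sorts only the rows it receives, and the reducer outputs are concatenated. This is a simplified model, not Hive's implementation. The modulo partitioner stands in for hash partitioning, and the salary values (in thousands) are made up.

```python
from itertools import chain


def run_job(values, num_reducers):
    """Split values across reducers, sort inside each reducer, concatenate."""
    partitions = [[] for _ in range(num_reducers)]
    for v in values:
        # Simple modulo partitioner standing in for hash partitioning of the key
        partitions[v % num_reducers].append(v)
    # Each reducer sees only its own partition, so it can only sort locally
    return list(chain.from_iterable(sorted(p) for p in partitions))


salaries_k = [15, 18, 19, 12, 21, 16]  # made-up values, in thousands
one = run_job(salaries_k, 1)
two = run_job(salaries_k, 2)
print(one)  # [12, 15, 16, 18, 19, 21]
print(two)  # [12, 16, 18, 15, 19, 21]
```

With one reducer the output is fully ordered. With two, each reducer's output is ordered but the concatenation is not, which is why a global order by is restricted to a single reducer.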

The behavior of order by is governed by the hive.mapred.mode property, whose value is either nonstrict or strict.

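Note: if order by is used in strict mode, the statement must also include a limit clause. Because order by can only run on a single reducer, sorting a very large result set would take a very long time.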
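One way to see why a limit makes the single reducer affordable: for order by ... desc limit k, each map task only needs to forward its own top k rows, because the global top k is always contained in the union of the local top-k lists. The sketch below models that idea with heapq. Whether and how a particular Hive version applies this optimization is an assumption here, not something this article states, and the mapper data is made up.

```python
import heapq


def order_by_desc_limit(mapper_outputs, k):
    """Toy model of ORDER BY ... DESC LIMIT k with a single reducer.

    Each mapper forwards only its local top-k values, so the reducer
    receives at most k values per mapper instead of every row.
    """
    forwarded = [heapq.nlargest(k, rows) for rows in mapper_outputs]
    received = sum(len(rows) for rows in forwarded)
    # The single reducer merges the small forwarded lists
    top = heapq.nlargest(k, (v for rows in forwarded for v in rows))
    return top, received


# Three hypothetical map tasks holding 2,100 values in total
mappers = [list(range(0, 1000)), list(range(500, 1500)), list(range(2000, 2100))]
top, received = order_by_desc_limit(mappers, 3)
print(top, received)  # [2099, 2098, 2097] 9
```

The reducer handles 9 values here instead of 2,100; without a limit, it would have to receive and sort everything.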

The following example shows order by in practice.

The database contains an employees table with the following data:

hive> select * from employees;
OK
lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24 love
liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24 love
zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love

Now sort by the second column (salary) in descending order:

hive> select * from employees order by salary desc;
// MapReduce execution progress
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 2.62 sec HDFS Read: 415 HDFS Write: 245 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 620 msec
OK
zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love
liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24 love
lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24 love
Time taken: 20.484 seconds
hive>
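For comparison, the same ordering can be reproduced in Python on the sample rows. This is a sketch that keeps only the name and salary columns, since the other columns do not affect this sort.

```python
# The three employees rows, reduced to (name, salary)
employees = [
    ("lavimer", 15000.0),
    ("liao", 18000.0),
    ("zhang", 19000.0),
]

# Analogue of: select * from employees order by salary desc;
result = sorted(employees, key=lambda row: row[1], reverse=True)
for name, salary in result:
    print(name, salary)
```

The printed order (zhang, liao, lavimer) matches the Hive output above.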

At this point, hive.mapred.mode is set to nonstrict.

Now change it to strict and run the order by query again:

hive> set hive.mapred.mode=strict;
hive> select * from employees order by salary desc;
FAILED: Error in semantic analysis: 1:33 In strict mode, if ORDER BY is specified, LIMIT must also be specified. Error encountered near token 'salary'
hive>

Note: in strict mode the query must include a limit clause. Adding one, however, runs into a second restriction:

hive> select * from employees order by salary desc limit 3;
FAILED: Error in semantic analysis: No partition predicate found for Alias "employees" Table "employees"

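Note: strict mode also restricts queries on partitioned tables. The query must filter on the partition columns; otherwise Hive rejects it with the error above.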
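The two strict-mode restrictions seen so far can be summarized as a toy rule checker. Everything here is hypothetical: Query, strict_mode_errors and PARTITION_COLUMNS are illustration names, not Hive APIs. The partition rule is simplified to "the where clause references at least one partition column", and the sketch assumes employees is partitioned by date_time and type, as the query further down suggests.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical catalog: assumes employees is partitioned by date_time and type
PARTITION_COLUMNS = {"employees": {"date_time", "type"}}


@dataclass
class Query:
    """Simplified, hypothetical description of a SELECT statement."""
    table: str
    order_by: List[str] = field(default_factory=list)
    limit: Optional[int] = None
    where_columns: Set[str] = field(default_factory=set)


def strict_mode_errors(q: Query) -> List[str]:
    """Return every strict-mode rule the query violates (toy version)."""
    errors = []
    # Rule 1: ORDER BY must be accompanied by LIMIT
    if q.order_by and q.limit is None:
        errors.append("ORDER BY without LIMIT")
    # Rule 2: a partitioned table needs a predicate on a partition column
    part_cols = PARTITION_COLUMNS.get(q.table, set())
    if part_cols and not (q.where_columns & part_cols):
        errors.append("no partition predicate")
    return errors


print(strict_mode_errors(Query("employees", order_by=["salary"])))
print(strict_mode_errors(Query("employees", order_by=["salary"], limit=3)))
print(strict_mode_errors(Query("employees", order_by=["salary"], limit=3,
                               where_columns={"date_time", "type"})))
```

The three calls mirror the three attempts in this walkthrough. The first query violates both rules, but Hive stops at the first error it finds, which is why the session only reported the missing limit. The sketch lists every violation instead.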

The employees table is partitioned by date_time and type, which appear as the last two columns in the query output above.

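Now run the order by query in strict mode. The first attempt below puts a partition(...) clause inside where and fails to parse. Partition columns are filtered in where like ordinary columns, as in the second query: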

hive> select * from employees where partition(date_time='2015-01-24',type='love') order by salary desc limit 3;
FAILED: Parse Error: line 1:30 cannot recognize input near 'partition' '(' 'date_time' in expression specification
hive> select * from employees where date_time='2015-01-24' and type='love' order by salary desc limit 3;
// MapReduce execution output
Total MapReduce CPU Time Spent: 3 seconds 510 msec
OK
zhang 19000.0 ["xiao","wen","tian"] {"k7":7.0,"k8":8.0} {"street":"dingnan","city":"ganzhou","num":103} 2015-01-24 love
liao 18000.0 ["liu","li","huang"] {"k4":2.0,"k5":3.0,"k6":6.0} {"street":"dingnan","city":"ganzhou","num":102} 2015-01-24 love
lavimer 15000.0 ["li","lu","wang"] {"k1":1.0,"k2":2.0,"k3":3.0} {"street":"dingnan","city":"ganzhou","num":101} 2015-01-24 love
Time taken: 19.861 seconds
hive>
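Why the partition filter matters: each Hive partition is stored in its own directory, so a where predicate on the partition columns lets the query read only the matching partitions before sorting. The sketch below models this with an in-memory dict. The second partition ('2015-01-25', 'work') and its row are made up to show pruning, and query is a hypothetical helper.

```python
# Hypothetical storage: partition spec (date_time, type) -> rows of (name, salary)
partitions = {
    ("2015-01-24", "love"): [("lavimer", 15000.0), ("liao", 18000.0), ("zhang", 19000.0)],
    ("2015-01-25", "work"): [("wang", 25000.0)],  # made-up partition
}


def query(partitions, date_time, type_, k):
    """Toy model of: where date_time=? and type=? order by salary desc limit k."""
    # Partition pruning: rows of non-matching partitions are never read
    scanned = [row
               for (d, t), rows in partitions.items()
               if d == date_time and t == type_
               for row in rows]
    # Global sort on salary, descending, then keep the first k rows
    return sorted(scanned, key=lambda row: row[1], reverse=True)[:k]


print(query(partitions, "2015-01-24", "love", 3))
```

The made-up wang row is never scanned because its partition does not match the predicate, and the remaining rows come back in the same order as the Hive output above.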

