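Reposted from: http://blog.csdn.net/wguangliang/article/details/50167283
Requirement: group the rows by course and find the two highest scores for each course.
The data is as follows:
The first column, no, is the student number; the second column, course, is the course name; the third column, score, is the score.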
mysql> select * from lesson;
+-------+---------+-------+
| no    | course  | score |
+-------+---------+-------+
| N0101 | Marth   |   100 |
| N0102 | English |    12 |
| N0102 | Chinese |    55 |
| N0102 | History |    58 |
| N0102 | Marth   |    25 |
| N0103 | English |   100 |
| N0103 | Chinese |    87 |
| N0103 | History |    88 |
| N0103 | Marth   |    72 |
| N0104 | English |    20 |
| N0104 | Chinese |    60 |
| N0104 | History |    88 |
| N0104 | Marth   |    56 |
| N0105 | English |    56 |
| N0105 | Chinese |    88 |
| N0105 | History |    88 |
| N0201 | English |    66 |
| N0201 | Chinese |    77 |
| N0201 | History |    80 |
| N0201 | Marth   |   100 |
| N0202 | English |    35 |
| N0202 | Chinese |    56 |
| N0202 | History |    86 |
| N0202 | Marth   |    99 |
| N0203 | English |   100 |
| N0203 | Chinese |    87 |
| N0203 | History |    88 |
| N0203 | Marth   |    57 |
| N0204 | English |    98 |
| N0204 | Chinese |   100 |
| N0204 | History |    66 |
| N0204 | Marth   |    71 |
| N0205 | English |    98 |
| N0205 | Chinese |   100 |
| N0205 | History |    66 |
| N0205 | Marth   |    71 |
| N0301 | English |    66 |
| N0301 | Chinese |    89 |
| N0301 | History |    68 |
| N0301 | Marth   |    83 |
| N0302 | English |    76 |
| N0302 | Chinese |    99 |
| N0302 | History |    80 |
| N0302 | Marth   |    74 |
| N0303 | English |   100 |
| N0303 | Chinese |   100 |
| N0303 | History |    88 |
| N0303 | Marth   |    57 |
| N0304 | English |    76 |
| N0304 | Chinese |   100 |
| N0304 | History |    66 |
| N0304 | Marth   |    86 |
| N0305 | English |    98 |
| N0305 | Chinese |   100 |
| N0305 | History |    40 |
| N0305 | Marth   |    59 |
| N0306 | English |    52 |
| N0306 | Chinese |    87 |
| N0306 | History |    72 |
| N0306 | Marth   |    71 |
| N0101 | Chinese |    55 |
| N0101 | History |    84 |
| N0101 | English |    82 |
| N0101 | English |    82 |
+-------+---------+-------+
64 rows in set
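In Hive, the row_number() window function handles this directly:
select a.course, a.score
from
(
    -- number the rows of each course, highest score first
    select course, score, row_number() over(partition by course order by score desc) as n
    from lesson
) a
where a.n <= 2;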
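Here,
row_number() over(partition by course order by score desc)
means: partition the rows by course, sort each partition by score in descending order, and tag every row in a partition with a row number starting from 1.
An outer query then only has to keep the rows whose row number is less than or equal to 2 :-D
The query result is shown in Figure 1:
Figure 1: Hive query result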
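To make the numbering concrete, here is a minimal sketch that runs the same query shape against a handful of rows from the table above. It assumes Python's built-in sqlite3 module as a stand-in for Hive (SQLite 3.25 or later is needed for window functions); the six-row subset is chosen only for brevity.

```python
import sqlite3

# A few rows copied from the lesson table above (subset chosen for brevity).
rows = [
    ("N0101", "Marth", 100),
    ("N0102", "Marth", 25),
    ("N0103", "Marth", 72),
    ("N0102", "History", 58),
    ("N0103", "History", 88),
    ("N0201", "History", 80),
]

# In-memory SQLite database standing in for Hive (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table lesson (no text, course text, score integer)")
conn.executemany("insert into lesson values (?, ?, ?)", rows)

# Inner query only: every row gets a per-course row number, highest score first.
numbered = conn.execute(
    """
    select course, score,
           row_number() over (partition by course order by score desc) as n
    from lesson
    order by course, n
    """
).fetchall()

# Same filter as the outer query: keep row numbers 1 and 2.
top2 = [(course, score) for course, score, n in numbered if n <= 2]

for course, score, n in numbered:
    print(course, score, n)
print(top2)
```

Printing `numbered` before filtering shows that the numbering restarts at 1 for each course, which is what lets the outer `n <= 2` filter work per group.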
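MySQL versions before 8.0 do not support window functions such as row_number() over(), so there the same result has to be obtained by other means.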
Method 1. Self-comparison with a correlated subquery
select course, score
from lesson a
where 2 >
(
    -- number of rows in the same course with a strictly higher score
    select count(1)
    from lesson b
    where a.score < b.score and a.course = b.course
)
order by a.course, a.score desc;
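Because the query looks for the two highest scores, the condition is 2 >; to get the N highest scores, change it to N >.
The general idea of this SQL statement is:
Take one row from table a and compare it with every row in table b that has the same course, counting how many of those rows in b have a score higher than this row's score;
if 0 rows in b are higher, this row holds the highest score for the course;
if the count is 1, this row holds the second-highest score for the course.
However, one problem remains:
when several rows tie for a score, for example when the highest score occurs more than once, the query can return more than 2 rows for a course, as the result in Figure 2 also shows:
Figure 2: MySQL query result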
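The "change 2 > to N >" remark can be sketched with N passed as a bound parameter. This is an illustrative example only: it assumes Python's built-in sqlite3 module instead of MySQL, and the helper name top_n_per_course is hypothetical, not part of the original post.

```python
import sqlite3


def top_n_per_course(conn, n):
    """Return (course, score) rows that have fewer than n strictly higher
    scores in the same course. Hypothetical helper for illustration."""
    return conn.execute(
        """
        select a.course, a.score
        from lesson a
        where ? > (
            select count(1)
            from lesson b
            where a.score < b.score and a.course = b.course
        )
        order by a.course, a.score desc
        """,
        (n,),
    ).fetchall()


# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table lesson (no text, course text, score integer)")
conn.executemany(
    "insert into lesson values (?, ?, ?)",
    [
        # Rows copied from the lesson table above; no tied scores in this subset.
        ("N0102", "English", 12),
        ("N0104", "English", 20),
        ("N0105", "English", 56),
        ("N0201", "English", 66),
        ("N0102", "Chinese", 55),
        ("N0104", "Chinese", 60),
        ("N0201", "Chinese", 77),
    ],
)

top1 = top_n_per_course(conn, 1)  # condition becomes 1 > (...)
top3 = top_n_per_course(conn, 3)  # condition becomes 3 > (...)
print(top1)
print(top3)
```

The subset deliberately contains no tied scores, so each course yields exactly N rows here.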
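The tie behaviour is easy to reproduce on the English rows, where three students scored 100. The sketch below again assumes Python's built-in sqlite3 module in place of MySQL. The second query is one possible workaround that is not from the original post: it breaks ties on the no column, which only yields exactly two rows if no is unique among tied rows of a course (the full table has a duplicated N0101 / English / 82 row, so that assumption does not hold everywhere).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table lesson (no text, course text, score integer)")
conn.executemany(
    "insert into lesson values (?, ?, ?)",
    [
        # English rows copied from the lesson table above: three tied 100s.
        ("N0103", "English", 100),
        ("N0203", "English", 100),
        ("N0303", "English", 100),
        ("N0204", "English", 98),
        ("N0205", "English", 98),
    ],
)

# Method 1 as written: every 100 has zero strictly higher scores, so all three pass.
with_ties = conn.execute(
    """
    select a.course, a.score
    from lesson a
    where 2 > (
        select count(1)
        from lesson b
        where a.score < b.score and a.course = b.course
    )
    order by a.course, a.score desc
    """
).fetchall()

# Assumed workaround (not in the original): also count tied rows with a smaller no,
# so tied rows are ranked in a fixed order and only two of them pass.
tie_broken = conn.execute(
    """
    select a.course, a.score
    from lesson a
    where 2 > (
        select count(1)
        from lesson b
        where a.course = b.course
          and (a.score < b.score or (a.score = b.score and b.no < a.no))
    )
    order by a.course, a.score desc
    """
).fetchall()

print(with_ties)   # three rows
print(tie_broken)  # two rows
```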
Method 2. Row numbers with user variables
SET @row=0;
SET @groupid='';
select a.course, a.score
from
(
    -- rownum restarts at 1 whenever course differs from the previous row's course
    select no, course, score,
        case when @groupid=course then @row:=@row+1 else @row:=1 end rownum,
        @groupid:=course
    from lesson
    order by course, score desc
) a
where a.rownum <= 2;
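Here:
@row holds the row number, and @groupid tracks the group by recording the name of the current group.
select no, course, score, case when @groupid=course then @row:=@row+1 else @row:=1 end rownum, @groupid:=course from lesson
order by course, score desc
means: sort by the group name course and then by score in descending order, so that rows of the same course sit together and are ordered by score within the course.
For each row, if its course equals @groupid, the row belongs to the same course as the previous one, so @row is incremented by 1.
Otherwise the row is the first row of another course, so @row is reset to 1.
This way the rows of every course are ordered by score and tagged with a row number.
The outer query then only needs to filter on rownum<=2 to get the two highest scores of each course.
The final result matches the Hive result, so no screenshot is included.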
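The row-numbering logic of the user-variable query can also be traced outside the database. The following is a plain-Python sketch of that logic, not MySQL code: the variables row and groupid play the roles of @row and @groupid, and the function name number_rows is hypothetical.

```python
def number_rows(rows):
    """Mimic the @row / @groupid logic on rows already sorted by
    course, then score descending. Hypothetical helper for illustration."""
    row = 0        # plays the role of @row
    groupid = ""   # plays the role of @groupid
    numbered = []
    for no, course, score in rows:
        if groupid == course:
            row += 1   # same course as the previous row: next row number
        else:
            row = 1    # first row of a new course: restart at 1
        groupid = course
        numbered.append((course, score, row))
    return numbered


# Rows copied from the lesson table above (subset chosen for brevity).
lesson = [
    ("N0103", "English", 100),
    ("N0204", "Chinese", 100),
    ("N0204", "English", 98),
    ("N0205", "Chinese", 100),
    ("N0301", "English", 66),
    ("N0302", "Chinese", 99),
]

# Equivalent of "order by course, score desc".
ordered = sorted(lesson, key=lambda r: (r[1], -r[2]))
numbered = number_rows(ordered)

# Equivalent of the outer "where a.rownum <= 2".
top2 = [(course, score) for course, score, n in numbered if n <= 2]
print(numbered)
print(top2)
```

Unlike Method 1, tied scores still receive distinct row numbers here, so each course contributes exactly two rows, which is the same behaviour as row_number() in Hive.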