LeetCode 184: Department Highest Salary, optimization notes


Problem

The Employee table holds all employees. Every employee has an Id, a salary, and there is also a column for the department Id.

+----+-------+--------+--------------+
| Id | Name  | Salary | DepartmentId |
+----+-------+--------+--------------+
| 1  | Joe   | 70000  | 1            |
| 2  | Henry | 80000  | 2            |
| 3  | Sam   | 60000  | 2            |
| 4  | Max   | 90000  | 1            |
+----+-------+--------+--------------+

The Department table holds all departments of the company.

+----+----------+
| Id | Name     |
+----+----------+
| 1  | IT       |
| 2  | Sales    |
+----+----------+

Write a SQL query to find employees who have the highest salary in each of the departments. For the above tables, Max has the highest salary in the IT department and Henry has the highest salary in the Sales department.

+------------+----------+--------+
| Department | Employee | Salary |
+------------+----------+--------+
| IT         | Max      | 90000  |
| Sales      | Henry    | 80000  |
+------------+----------+--------+
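
I wrote five or six versions in turn, each with different performance. Here are five representative ones, used to analyze how the SQL statement can be optimized.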

1. Runtime: 1539 ms

select Department.Name as Department, 
    Employee.Name as Employee,
    Employee.Salary as Salary
from Department join Employee
   on Department.Id = Employee.DepartmentId
where (Department.Id, Employee.Salary) in
   (select DepartmentId, max(Salary) from Employee group by DepartmentId);
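
Version 1 can be sanity-checked away from LeetCode by loading the sample rows into a throwaway database and running the query there. The sketch below rests on several assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database rather than MySQL (the row-value IN needs SQLite 3.15 or later), the names VERSION_1, make_db and run_version_1 are made up for this example, and it checks only the result set, not the runtime.

```python
import sqlite3

# Version 1 from the post, unchanged.
VERSION_1 = """
select Department.Name as Department,
    Employee.Name as Employee,
    Employee.Salary as Salary
from Department join Employee
   on Department.Id = Employee.DepartmentId
where (Department.Id, Employee.Salary) in
   (select DepartmentId, max(Salary) from Employee group by DepartmentId)
"""


def make_db():
    """Build an in-memory SQLite database holding the sample tables (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Department (Id integer primary key, Name text)")
    conn.execute(
        "create table Employee (Id integer primary key, Name text, "
        "Salary integer, DepartmentId integer)"
    )
    conn.executemany(
        "insert into Department values (?, ?)", [(1, "IT"), (2, "Sales")]
    )
    conn.executemany(
        "insert into Employee values (?, ?, ?, ?)",
        [(1, "Joe", 70000, 1), (2, "Henry", 80000, 2),
         (3, "Sam", 60000, 2), (4, "Max", 90000, 1)],
    )
    return conn


def run_version_1():
    """Run version 1 and return the rows sorted, since SQL leaves row order unspecified."""
    conn = make_db()
    try:
        return sorted(conn.execute(VERSION_1).fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_version_1():
        print(row)
```

On the sample data this should give back the two rows from the expected-output table above. SQLite and MySQL plan queries differently, so nothing here says anything about the 1539 ms figure.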

2. Runtime: 1204 ms

select Department.Name as Department, 
    Employee.Name as Employee,
    Employee.Salary as Salary
from Department join Employee
   on Department.Id = Employee.DepartmentId
where (Department.Id, Employee.Salary) in
   (select DepartmentId, Salary
    from (select * from Employee order by Salary desc) q
    group by DepartmentId);

3. Runtime: 1399 ms

select a.Name as Department, 
    b.Name as Employee,
    b.Salary as Salary
from Department a join Employee b
   on a.Id = b.DepartmentId
where exists(select 1 from (select * from Employee order by Salary desc) c
        group by DepartmentId
        having a.Id = c.DepartmentId and b.Salary = max(c.Salary));

4. Runtime: 980 ms

select a.Name as Department, 
    b.Name as Employee,
    b.Salary as Salary
from (Department a join Employee b on a.Id = b.DepartmentId) join
   (select c.DepartmentId,max(c.Salary) as Salary from (select * from Employee order by Salary desc) c group by DepartmentId) d
   on a.Id = d.DepartmentId and b.Salary = d.Salary;

5. Runtime: 957 ms

select a.Name as Department, 
    b.Name as Employee,
    b.Salary as Salary
from (Department a straight_join Employee b on a.Id = b.DepartmentId) straight_join
    (select c.DepartmentId,max(c.Salary) as Salary from (select * from Employee order by Salary desc) c group by c.DepartmentId) d
    on a.Id = d.DepartmentId and b.Salary = d.Salary;

 Summary

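  • Comparing 1 and 2: judging by these runtimes, the aggregate function max() is slower than the nested subquery. Version 2, however, depends on MySQL returning the first row of each group from the pre-sorted derived table; that behavior is not guaranteed, and the query is rejected when ONLY_FULL_GROUP_BY is enabled.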
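  • Comparing 2 and 3: in and exists perform about the same. What I found online at the time was:

 1. Use in and not in with caution, otherwise they can cause a full table scan

 2. Replacing in with exists is often a good choice

  The later optimizations do show, however, that in really is rather slow.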

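  • Comparing 3 and 4: version 4 replaces the where predicate with join on, and performance improves a lot. Later, an expert who had read the MySQL source code said:

 In a MySQL SELECT query, the core algorithm is the JOIN algorithm. Every other kind of query is mapped onto JOIN: a single-table query is treated as a special case of JOIN, and subqueries are converted to JOIN queries wherever possible.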

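  • Comparing 4 and 5: version 5 replaces join with straight_join. Again, from the source-code expert:

For multi-table queries, if you can be sure that processing the tables in one fixed order gives better performance, it is advisable to add the STRAIGHT_JOIN clause, which spares the optimizer the work of reordering the tables.

The clause can be used for SQL statements where the optimizer cannot find the optimal order. It also applies to statements where the optimizer can find it, because working out the optimal order is itself a fairly long process for MySQL.

In the latter case, you can choose the table order from the EXPLAIN output and add the STRAIGHT_JOIN clause to fix that order. The precondition is that the relative data volumes of the tables always stay in the same order; otherwise, once the table sizes shift against each other, the clause will backfire.

  The approach works well for frequently executed SQL statements, and the more tables a statement touches at once, the better it works.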
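
The quoted remark that subqueries are turned into JOINs where possible implies the subquery form and the join form should at least return the same rows. The sketch below checks only that, under stated assumptions: it runs on an in-memory SQLite database through Python's sqlite3 module instead of MySQL, JOIN_FORM is a simplified rewrite of version 4 (parentheses and the inner order by dropped), the extra employee Ann is invented to create a salary tie, and the helper names are hypothetical. It makes no claim about timing or about how MySQL plans either query.

```python
import sqlite3

# Subquery form: version 1 from the post.
IN_FORM = """
select Department.Name as Department,
    Employee.Name as Employee,
    Employee.Salary as Salary
from Department join Employee
   on Department.Id = Employee.DepartmentId
where (Department.Id, Employee.Salary) in
   (select DepartmentId, max(Salary) from Employee group by DepartmentId)
"""

# Join form: same shape as version 4, simplified for this example.
JOIN_FORM = """
select a.Name as Department,
    b.Name as Employee,
    b.Salary as Salary
from Department a
join Employee b on a.Id = b.DepartmentId
join (select DepartmentId, max(Salary) as Salary
      from Employee group by DepartmentId) d
   on a.Id = d.DepartmentId and b.Salary = d.Salary
"""


def make_db_with_tie():
    """Sample tables plus one invented row (Ann) that ties with Max in IT."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Department (Id integer primary key, Name text)")
    conn.execute(
        "create table Employee (Id integer primary key, Name text, "
        "Salary integer, DepartmentId integer)"
    )
    conn.executemany(
        "insert into Department values (?, ?)", [(1, "IT"), (2, "Sales")]
    )
    conn.executemany(
        "insert into Employee values (?, ?, ?, ?)",
        [(1, "Joe", 70000, 1), (2, "Henry", 80000, 2),
         (3, "Sam", 60000, 2), (4, "Max", 90000, 1),
         (5, "Ann", 90000, 1)],  # hypothetical row, not in the problem statement
    )
    return conn


def compare_forms():
    """Run both forms on the same data and return their sorted result sets."""
    conn = make_db_with_tie()
    try:
        in_rows = sorted(conn.execute(IN_FORM).fetchall())
        join_rows = sorted(conn.execute(JOIN_FORM).fetchall())
    finally:
        conn.close()
    return in_rows, join_rows


if __name__ == "__main__":
    in_rows, join_rows = compare_forms()
    print("same rows:", in_rows == join_rows)
    for row in join_rows:
        print(row)
```

Both forms keep every employee who ties for the top salary in a department, so Ann and Max both appear for IT. To compare join orders on MySQL itself, EXPLAIN on the real server remains the tool the quote points to.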

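Postscript

  The optimization is not finished at this point. The fastest submission for this problem on LeetCode takes 813 ms, but its code was not shared. To close, here are two solutions from other people:

  Join twice, 890 ms accepted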

SELECT Name, Employee, Salary
FROM Department JOIN (SELECT Employee.Name AS Employee, Employee.Salary, Employee.DepartmentId
    FROM Employee JOIN (SELECT `DepartmentId`, MAX(`Salary`) AS Salary
        FROM `Employee`
        GROUP BY `DepartmentId`
        ) t1 ON t1.DepartmentId = Employee.DepartmentId
    AND t1.Salary = Employee.Salary
    ) t2 ON Department.Id = t2.DepartmentId

  Easy Solution. No joins. GROUP BY is enough. 916 ms

select
d.Name, e.Name, e.Salary
from
Department d,
Employee e,
(select MAX(Salary) as Salary,  DepartmentId as DepartmentId from Employee GROUP BY DepartmentId) h
where
e.Salary = h.Salary and
e.DepartmentId = h.DepartmentId and
e.DepartmentId = d.Id;

