For personal study and review only.
178. Rank Scores
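I. Table information
II. Problem statement
Sort the scores in the table above from highest to lowest and list each rank. Tied scores share the same rank, and the rank numbers have no gaps.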
Write a SQL query to rank scores. If there is a tie between two scores, both should have the same ranking. Note that after a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no "holes" between ranks.
For example, given the above Scores table, your query should generate the following report (order by highest score):
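III. Reference SQL
(1) Method 1: inner self-join
select s1.Score, count(distinct s2.Score) as 'Rank'
from Scores s1
inner join Scores s2
on s1.Score <= s2.Score
group by s1.id
order by count(distinct s2.Score);
The GROUP BY column (id) differs from the selected column (Score); if that causes trouble, wrap the query in one more SELECT.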
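Approach:
1. A ranking calls for count, and a single copy of the table is not enough, so the table is joined to itself.
2. Join condition: for each score, collect the set of scores greater than or equal to it. For example, the set for 3.50 is {3.50, 3.65, 4.00, 3.85, 4.00, 3.65}.
3. Grouping: this gives six such sets, one per row.
4. Distinct count: because the ranking has no gaps, count distinct values. Otherwise the count is the number of rows greater than or equal to the value, not the rank of that value within the set.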
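The distinct-count idea in the steps above can be checked with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the six scores are the ones from the example set; the variable names are illustrative only.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Id INTEGER PRIMARY KEY, Score REAL)")
conn.executemany(
    "INSERT INTO Scores (Id, Score) VALUES (?, ?)",
    [(1, 3.50), (2, 3.65), (3, 4.00), (4, 3.85), (5, 4.00), (6, 3.65)],
)

# For each row s1, count the DISTINCT scores that are >= s1.Score.
dense_ranks = conn.execute(
    """
    SELECT s1.Score, COUNT(DISTINCT s2.Score) AS rnk
    FROM Scores s1
    INNER JOIN Scores s2 ON s1.Score <= s2.Score
    GROUP BY s1.Id, s1.Score
    ORDER BY rnk, s1.Id
    """
).fetchall()

# Same join without DISTINCT: it counts rows, so ties leave gaps in the numbering.
plain_counts = conn.execute(
    """
    SELECT s1.Score, COUNT(s2.Score) AS cnt
    FROM Scores s1
    INNER JOIN Scores s2 ON s1.Score <= s2.Score
    GROUP BY s1.Id, s1.Score
    ORDER BY cnt, s1.Id
    """
).fetchall()

print(dense_ranks)   # ranks without gaps
print(plain_counts)  # row counts, with gaps after ties
```

With DISTINCT the two 4.00 rows both get 1 and 3.85 gets 2; without it, 3.85 gets 3 because two rows sit above it.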
(2) Method 2: window function dense_rank() (MySQL 8.0)
SELECT Score,
DENSE_RANK() OVER(ORDER BY Score DESC) AS 'Rank'
FROM Scores;
Window function review: https://zhuanlan.zhihu.com/p/135119865
180. Consecutive Numbers
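I. Table information
II. Problem statement
Find the numbers that appear at least three times in a row. For the table above, the query should return 1.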
Write a SQL query to find all numbers that appear at least three times consecutively. For example, given the above Logs table, 1 is the only number that appears consecutively for at least three times.
III. Reference SQL
Method 1: multiple joins
select distinct a.num as ConsecutiveNums
from Logs a
inner join Logs b on a.id = b.id + 1
inner join Logs c on a.id = c.id + 2
where a.num = b.num and a.num = c.num;
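Summary of the approach:
1. Three consecutive occurrences mean consecutive ids and equal values.
2. The two joins put three adjacent records (ids one apart) side by side in a single row.
3. Keep the rows where the three values are equal. A number may appear more than three times in a row, so deduplicate to get that num once.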
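A minimal sketch of the three steps above, assuming Python's sqlite3 as a stand-in for MySQL and a made-up Logs table in which 1 appears four times in a row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Logs (Id INTEGER PRIMARY KEY, Num INTEGER)")
# Hypothetical sample rows: 1 occurs four times in a row, 2 only twice.
conn.executemany(
    "INSERT INTO Logs (Id, Num) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 1), (7, 2), (8, 2)],
)

join_sql = """
    FROM Logs a
    INNER JOIN Logs b ON a.Id = b.Id + 1
    INNER JOIN Logs c ON a.Id = c.Id + 2
    WHERE a.Num = b.Num AND a.Num = c.Num
"""

# Without DISTINCT, a run of four produces two matching rows.
matches = conn.execute("SELECT a.Id, a.Num " + join_sql + " ORDER BY a.Id").fetchall()

# DISTINCT collapses them to one result per number.
consecutive = conn.execute("SELECT DISTINCT a.Num " + join_sql).fetchall()

print(matches)
print(consecutive)
```

The run of four yields two joined rows (ending at ids 3 and 4), which is why the query needs DISTINCT.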
Method 2: window function lead(), which looks ahead by a row offset
select distinct Num as ConsecutiveNums
from (
select Num,
lead(num, 1) over(order by id) as next_num,
lead(num, 2) over(order by id) as next_next_num
from Logs
) t
where t.Num = t.next_num and t.Num = t.next_next_num;
The lead() window function: https://www.begtut.com/mysql/mysql-lead-function.html
181. Employees Earning More Than Their Managers[E]
I. Table information
The Employee table below holds all employees together with their managers.
The Employee table holds all employees including their managers. Every employee has an Id, and there is also a column for the manager Id.
II. Problem statement
Using the Employee table above, find the names of employees who earn more than their managers.
Given the Employee table, write a SQL query that finds out employees who earn more than their managers. For the above table, Joe is the only employee who earns more than his manager.
III. Reference SQL
Self-join:
select e1.Name as Employee
from Employee e1
inner join Employee e2
on e1.ManagerId = e2.Id
where e1.Salary > e2.Salary;
182. Duplicate Emails[E]
I. Table information
II. Problem statement
Find the duplicate emails.
Write a SQL query to find all duplicate emails in a table named Person. For example, your query should return the following for the above table:
III. Reference SQL
Method 1: my own solution
select Email from Person group by Email having count(*) > 1;
Method 2: official answer
SELECT Email FROM
(SELECT Email, COUNT(id) AS num
FROM Person
GROUP BY Email) AS tmp
WHERE num > 1;
183. Customers Who Never Order[E]
I. Table information
Suppose a website has the following two tables: a customers table and an orders table.
Suppose that a website contains two tables, the Customers table and the Orders table.
Table 1: Customers
Table 2: Orders
II. Problem statement
Find the names of customers who have never placed an order.
Write a SQL query to find all customers who never order anything. Using the above tables as an example, return the following:
III. Reference SQL
Method 1: left outer join
select Name as Customers
from Customers c
left join Orders o
on c.Id = o.CustomerId
where o.Id is null;
Method 2: subquery (official method)
select Name as Customers
from Customers
where Id not in (
select CustomerId from Orders
);
184. Department Highest Salary[M]
I. Table information
Table 1: Employee
Table 2: Department
II. Problem statement
Find the name and salary of the highest-paid employee in each department.
Write a SQL query to find employees who have the highest salary in each of the departments. For the above tables, your SQL query should return the following rows (order of rows does not matter).
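III. Reference SQL
Method 1: window function dense_rank()
select d.Name as Department, t.Name as Employee, t.Salary
from (
select *,
dense_rank() over(partition by DepartmentId order by Salary DESC) as ranking
from Employee
) t
inner join Department d
on t.DepartmentId = d.Id
where ranking = 1;
A column alias cannot be used in the WHERE clause of the same SELECT level. This follows from the execution order: from, ..., where, ..., select, ...
Approach:
1. Use dense_rank(), which leaves no gaps, to rank the salaries within each department.
2. Every row with rank 1 carries the department's highest salary.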
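Method 2: subquery with a multi-column IN
select
d.Name as Department,
t.Name as Employee,
t.Salary
from Department d
inner join
(select Name, DepartmentId, Salary
from Employee e
where (e.DepartmentId, Salary) in
(select DepartmentId, max(Salary)
from Employee
group by DepartmentId)
) t
on d.Id = t.DepartmentId;
IN can filter on several columns at once, written as (column1_name, column2_name, ...); the values are matched position by position against the columns returned by the subquery.
Approach:
1. Find the highest salary in each department.
2. Keep the rows of the original table whose (department, salary) pair equals a department maximum; this lists every employee who earns the highest salary.
3. Inner join to Department to fetch the remaining fields.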
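The multi-column IN filter can be tried on a few rows. This is only a sketch: it assumes Python's sqlite3 in place of MySQL (SQLite also accepts row values on the left of IN), and the employees and salaries are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    "CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER, DepartmentId INTEGER)"
)
conn.executemany("INSERT INTO Department VALUES (?, ?)", [(1, "IT"), (2, "Sales")])
# Hypothetical rows: Jim and Max tie for the highest salary in IT.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Joe", 70000, 1), (2, "Jim", 90000, 1), (3, "Henry", 80000, 2),
     (4, "Sam", 60000, 2), (5, "Max", 90000, 1)],
)

# (DepartmentId, Salary) is compared pairwise with the subquery's two columns.
top_earners = conn.execute(
    """
    SELECT d.Name, e.Name, e.Salary
    FROM Employee e
    INNER JOIN Department d ON d.Id = e.DepartmentId
    WHERE (e.DepartmentId, e.Salary) IN (
        SELECT DepartmentId, MAX(Salary) FROM Employee GROUP BY DepartmentId
    )
    ORDER BY d.Name, e.Name
    """
).fetchall()

print(top_earners)
```

Because the pair is matched as a whole, a 90000 salary in Sales would not match IT's maximum, and both tied IT employees are returned.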
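185. Department Top Three Salaries[H]
A classic top-N problem: return the N largest records of each group, which requires both grouping and ordering.
I. Table information
Table 1: Employee
Table 2: Department
II. Problem statement
Find the names and salaries of the employees who earn the top three salaries in each department.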
Write a SQL query to find employees who earn the top three salaries in each of the department. For the above tables, your SQL query should return the following rows (order of rows does not matter).
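III. Reference SQL
Method 1: window function dense_rank()
select
d.name as Department,
e.name as Employee,
Salary
from Department d
inner join
(
select name, Salary, DepartmentId,
dense_rank() over(partition by DepartmentId order by Salary desc) as ranking
from Employee
) e
on d.Id = e.DepartmentId
where ranking <= 3;
Approach:
1. Use dense_rank(), partitioned by department and ordered by salary descending, to assign gap-free ranks.
2. Keep the rows with rank less than or equal to 3; these are the records with the top three salaries.
(ps: with a window function the idea is much the same as in the previous problem; the only difference is how many ranks the final filter keeps.)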
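Method 2: self-join, group, and filter
select
d.name as Department,
e.name as Employee,
Salary
from Department d
inner join
(
select e1.Id, e1.DepartmentId, e1.name, e1.Salary
from Employee e1
inner join Employee e2
on e1.DepartmentId = e2.DepartmentId
and e1.Salary <= e2.Salary
group by e1.Id
having count(distinct e2.Salary) <= 3
) e
on d.Id = e.DepartmentId;
Approach:
1. The key is to find the rows holding the top three salaries of each department:
Self-join on equal department and on the other row's salary being greater than or equal to mine;
group by employee, so each group holds every employee record whose salary is greater than or equal to mine;
count the distinct salaries in the group; three or fewer means my salary is within the top three. Note that count(*) cannot be used here: besides myself, other employees may earn the same salary as me, and without deduplication the row count could exceed 3 (say my salary is exactly the third highest), so I would be filtered out, which is not the intended result;
2. Then run the remaining query as required.
(ps: the self-join condition here follows the same idea as problem 178.)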
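To see why count(*) fails where count(distinct ...) works, here is a small sketch. It assumes Python's sqlite3 standing in for MySQL; the rows and the helper `top_three` are hypothetical. In department 1, Joe and Randy tie on the second-highest salary, so Will holds the third-highest distinct salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, Salary INTEGER, DepartmentId INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Joe", 85000, 1), (2, "Henry", 80000, 2), (3, "Sam", 60000, 2),
     (4, "Max", 90000, 1), (5, "Janet", 69000, 1), (6, "Randy", 85000, 1),
     (7, "Will", 70000, 1)],
)

def top_three(count_expr):
    """Run the self-join filter with the given aggregate in HAVING (illustrative helper)."""
    sql = f"""
        SELECT e1.Name
        FROM Employee e1
        INNER JOIN Employee e2
            ON e1.DepartmentId = e2.DepartmentId AND e1.Salary <= e2.Salary
        GROUP BY e1.Id, e1.Name
        HAVING {count_expr} <= 3
        ORDER BY e1.Id
    """
    return [name for (name,) in conn.execute(sql)]

with_distinct = top_three("COUNT(DISTINCT e2.Salary)")
with_count_star = top_three("COUNT(*)")

print(with_distinct)
print(with_count_star)
```

Four rows in department 1 have a salary at or above Will's, so count(*) drops him even though only three distinct salaries do.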
196. Delete Duplicate Emails[E]
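I. Table information
II. Problem statement
Delete the rows with duplicate emails; when an email is duplicated, keep the row with the smallest Id.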
Write a SQL query to delete all duplicate email entries in a table named Person, keeping only unique emails based on its smallest Id. For example, after running your query, the above Person table should have the following rows:
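III. Reference SQL
Method 1: subquery
delete from Person
where Id not in (
select Id from
(
select min(Id) AS Id
from Person
group by Email
) t
);
Approach:
1. The subquery finds the set of Ids that must not be deleted (the smallest Id of each duplicated email plus the Ids of emails that are not duplicated): group by email and take the smallest Id.
2. When deleting, check that the Id is not in this set.
(ps: MySQL does not allow a statement to modify a table and select from the same table in a subquery, so the subquery is wrapped in an extra derived table; remember to give min(Id) an alias.)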
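The keep-the-smallest-Id delete can be exercised on three rows. The sketch below assumes Python's sqlite3 in place of MySQL and uses invented email addresses; SQLite would accept the subquery without the extra derived table, but the wrapper is kept so the statement matches the MySQL form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Email TEXT)")
# Hypothetical rows: john@example.com appears twice.
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [(1, "john@example.com"), (2, "bob@example.com"), (3, "john@example.com")],
)

# Keep the smallest Id per email; delete every other row.
conn.execute(
    """
    DELETE FROM Person
    WHERE Id NOT IN (
        SELECT Id FROM (
            SELECT MIN(Id) AS Id FROM Person GROUP BY Email
        ) t
    )
    """
)

remaining = conn.execute("SELECT Id, Email FROM Person ORDER BY Id").fetchall()
print(remaining)
```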
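Method 2: self-join
delete p1 from Person p1
inner join Person p2
on p1.Email = p2.Email
and p1.Id > p2.Id;
Approach:
1. Joining on equal emails alone yields three kinds of pairs: (1) equal email and equal Id, that is, a row matched with itself; (2) equal email and p1.Id > p2.Id; (3) equal email and p1.Id < p2.Id.
2. Pick out the pairs with equal email and p1.Id > p2.Id and delete the p1 side. This keeps the row with the smallest Id as well as the rows that have no duplicate.
(ps: multi-table deletes use this same syntax: delete table_name from .....)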
197. Rising Temperature[E]
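I. Table information
II. Problem statement
Using the table below as an example, find the ids of the dates whose temperature is higher than the previous day's.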
Write an SQL query to find all dates' id with higher temperature compared to its previous dates (yesterday).
Return the result table in any order.
The query result format is in the following example:
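III. Reference SQL
select w1.id as Id from Weather w1
inner join Weather w2
on datediff(w1.recordDate, w2.recordDate) = 1
where w1.temperature > w2.temperature;
Approach:
1. From the Cartesian product of the self-join, keep the pairs of records that are exactly one day apart, using the datediff() function.
2. Then keep the ids whose temperature is higher than the previous day's.
(ps: dates cannot simply be added or subtracted as numbers; use date functions instead. https://www.w3school.com.cn/sql/sql_dates.asp)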
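The one-day-apart join can be sketched as follows. Python's sqlite3 stands in for MySQL here, and because SQLite has no datediff(), the sketch substitutes a difference of julianday() values; the rows are invented and deliberately cross a month boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Weather (id INTEGER PRIMARY KEY, recordDate TEXT, temperature INTEGER)"
)
# Hypothetical readings that span the end of January.
conn.executemany(
    "INSERT INTO Weather VALUES (?, ?, ?)",
    [(1, "2015-01-30", 10), (2, "2015-01-31", 25),
     (3, "2015-02-01", 30), (4, "2015-02-02", 20)],
)

# julianday() differences play the role of MySQL's datediff() in this sketch.
warmer_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT w1.id
        FROM Weather w1
        INNER JOIN Weather w2
            ON julianday(w1.recordDate) - julianday(w2.recordDate) = 1
        WHERE w1.temperature > w2.temperature
        ORDER BY w1.id
        """
    )
]

print(warmer_ids)
```

A date function pairs 2015-02-01 with 2015-01-31 correctly, which plain arithmetic on the date values would not.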
262. Trips and Users[H]
I. Table information
Table 1: Trips
This table holds all taxi trip records. Each record has an Id; Client_Id and Driver_Id are both foreign keys to the Users table. Status is one of completed, cancelled_by_driver, and cancelled_by_client.
The Trips table holds all taxi trips. Each trip has a unique Id, while Client_Id and Driver_Id are both foreign keys to the Users_Id at the Users table. Status is an ENUM type of (‘completed’, ‘cancelled_by_driver’, ‘cancelled_by_client’).
Table 2: Users
This table holds all users. Each user has an Id, and Role is one of client, driver, and partner.
The Users table holds all users. Each user has an unique Users_Id, and Role is an ENUM type of (‘client’, ‘driver’, ‘partner’).
II. Problem statement
For each day from October 1, 2013 to October 3, 2013, find the cancellation rate of requests made by unbanned users.
Write a SQL query to find the cancellation rate of requests made by unbanned users (both client and driver must be unbanned) between Oct 1, 2013 and Oct 3, 2013. The cancellation rate is computed by dividing the number of canceled (by client or driver) requests made by unbanned users by the total number of requests made by unbanned users.
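The cancellation rate is computed as: (number of requests made by unbanned users that were cancelled by the driver or the client) / (total number of requests made by unbanned users)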
For the above tables, your SQL query should return the following rows with the cancellation rate being rounded to two decimal places.
III. Reference SQL
Method 1: subquery that filters the valid trip records
SELECT
Request_at AS 'Day',
round( count( CASE t_Status WHEN 'completed' THEN NULL ELSE 1 END ) / count( * ), 2 ) AS 'Cancellation Rate'
FROM
( SELECT Request_at, Status AS t_Status
FROM Trips
WHERE Client_Id NOT IN ( SELECT Users_Id FROM Users WHERE Banned = 'Yes' )
AND Driver_Id NOT IN ( SELECT Users_Id FROM Users WHERE Banned = 'Yes' ) ) t
WHERE Request_at BETWEEN '2013-10-01' AND '2013-10-03'
GROUP BY Request_at;
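Approach:
1. The key step is to select the set of valid trip records: both the client and the driver must be unbanned!
Subquery:
Client_Id NOT IN ( SELECT Users_Id FROM Users WHERE Banned = 'Yes' )
AND Driver_Id NOT IN ( SELECT Users_Id FROM Users WHERE Banned = 'Yes' )
2. On top of that, count the cancelled trips per day with a CASE expression: when a trip's status is completed, return NULL so that count does not count it.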
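A reduced sketch of the two steps, assuming Python's sqlite3 instead of MySQL and a handful of invented trips. One SQLite-specific adjustment is needed: dividing two counts is integer division there, so the numerator is multiplied by 1.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Users_Id INTEGER PRIMARY KEY, Banned TEXT, Role TEXT)")
conn.execute(
    "CREATE TABLE Trips (Id INTEGER PRIMARY KEY, Client_Id INTEGER, Driver_Id INTEGER,"
    " Status TEXT, Request_at TEXT)"
)
# Hypothetical data: client 2 is banned, so trip 2 must be ignored.
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [(1, "No", "client"), (2, "Yes", "client"), (3, "No", "client"),
     (10, "No", "driver"), (11, "No", "driver")],
)
conn.executemany(
    "INSERT INTO Trips VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 10, "completed", "2013-10-01"),
     (2, 2, 11, "cancelled_by_driver", "2013-10-01"),
     (3, 3, 10, "cancelled_by_client", "2013-10-01"),
     (4, 1, 11, "completed", "2013-10-02"),
     (5, 3, 10, "completed", "2013-10-02")],
)

rates = conn.execute(
    """
    SELECT Request_at,
           -- CASE yields NULL for completed trips, and COUNT skips NULLs
           ROUND(COUNT(CASE t_Status WHEN 'completed' THEN NULL ELSE 1 END) * 1.0
                 / COUNT(*), 2) AS rate
    FROM (
        SELECT Request_at, Status AS t_Status
        FROM Trips
        WHERE Client_Id NOT IN (SELECT Users_Id FROM Users WHERE Banned = 'Yes')
          AND Driver_Id NOT IN (SELECT Users_Id FROM Users WHERE Banned = 'Yes')
    ) t
    WHERE Request_at BETWEEN '2013-10-01' AND '2013-10-03'
    GROUP BY Request_at
    ORDER BY Request_at
    """
).fetchall()

print(rates)
```

On the first day only trips 1 and 3 are valid and one of them was cancelled, which gives 0.5.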
Method 2: join query that filters the set of valid trip records
The cancellation rate can also be computed with avg(Status!='completed'):
https://leetcode-cn.com/problems/trips-and-users/solution/ci-ti-bu-nan-wei-fu-za-er-by-luanz/
511. Game Play Analysis I[E]
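I. Table information
The Activity table records the behaviour of game players; its primary key is the combination (player_id, event_date). Each row records one player's login and the number of games played (possibly 0).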
(player_id, event_date) is the primary key of this table. This table shows the activity of players of some game. Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on some day using some device.
II. Problem statement
Find the first login date of each player.
Write an SQL query that reports the first login date for each player.
The query result format is in the following example:
III. Reference SQL
SELECT
player_id,
MIN( event_date ) AS first_login
FROM
Activity
GROUP BY
player_id
ORDER BY
player_id;
512. Game Play Analysis II[E]
I. Table information
Same as the previous problem.
II. Problem statement
Find the device each player used on their first login date.
Write a SQL query that reports the device that is first logged in for each player.
III. Reference SQL
Method 1: inner join + subquery
SELECT
a.player_id,
a.device_id
FROM
Activity AS a
INNER JOIN ( SELECT player_id, MIN( event_date ) AS first_login FROM Activity GROUP BY player_id ORDER BY player_id ) AS b
ON a.player_id = b.player_id AND a.event_date = b.first_login
ORDER BY
a.player_id;
Approach:
1. The subquery produces table b: the earliest login date of each player.
2. Then inner join against it (filtering with WHERE works as well).
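Method 2: window function rank()
SELECT
player_id,
device_id
FROM
( SELECT player_id, device_id, RANK ( ) OVER ( PARTITION BY player_id ORDER BY event_date ) AS rnk FROM Activity ) AS tmp
WHERE
rnk = 1;
Approach:
1. Partition by player_id and order by event_date ascending to rank the rows. The primary key (player_id, event_date) rules out ties, so rank() and dense_rank() give the same result here.
2. The row with rank 1 is the player's earliest login record.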
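534. Game Play Analysis III[M]: running total per group
I. Table information
Same as above.
II. Problem statement
For each player and date, find the total number of games the player has played so far.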
Write an SQL query that reports for each player and date, how many games played so far by the player. That is, the total number of games played by the player until that date. Check the example for clarity.
III. Reference SQL
Method 1: window function sum()
SELECT
player_id,
event_date,
SUM( games_played ) over ( PARTITION BY player_id ORDER BY event_date ) AS games_played_so_far
FROM
activity;
Approach:
1. Partition by player_id and order by event_date ascending.
2. Accumulate games_played within each partition as a running sum.
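Method 2: inner join, then group and aggregate
SELECT
a1.player_id,
a1.event_date,
SUM( a2.games_played ) AS games_played_so_far
FROM
activity a1
INNER JOIN activity a2 ON a1.event_date >= a2.event_date
AND a1.player_id = a2.player_id
GROUP BY
a1.player_id,
a1.event_date;
Approach:
1. For grouped running aggregates like this (sums and counts alike), the inner-join condition is usually a non-equi join, so that each value of a column in the main table matches several values of the same column in the joined table.
Grouping by that column of the main table then allows aggregating a column of the joined table.
2. When two grouping levels are involved, also add an equality condition to the join, so that values within one group do not join to values in other groups.
Without that equality condition, values from different groups match each other freely and the final grouped result is wrong.
3. Once the joined table is built, aggregate as required.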
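The effect of the extra equality condition in step 2 can be shown by running the join with and without it. This sketch assumes Python's sqlite3 instead of MySQL; the rows and the helper `running_totals` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity (player_id INTEGER, event_date TEXT, games_played INTEGER,"
    " PRIMARY KEY (player_id, event_date))"
)
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?, ?)",
    [(1, "2016-03-01", 5), (1, "2016-05-02", 6), (1, "2017-06-25", 1),
     (3, "2016-03-02", 0), (3, "2018-07-03", 5)],
)

def running_totals(join_condition):
    """Self-join with the given ON condition, then sum per player and date."""
    sql = f"""
        SELECT a1.player_id, a1.event_date, SUM(a2.games_played)
        FROM Activity a1
        INNER JOIN Activity a2 ON {join_condition}
        GROUP BY a1.player_id, a1.event_date
        ORDER BY a1.player_id, a1.event_date
    """
    return conn.execute(sql).fetchall()

# Non-equi join on the date plus an equality on the player: totals stay per player.
per_player = running_totals(
    "a1.event_date >= a2.event_date AND a1.player_id = a2.player_id"
)
# Date condition only: earlier rows of other players leak into the sums.
leaky = running_totals("a1.event_date >= a2.event_date")

print(per_player)
print(leaky)
```

Without the player equality, player 3's first date picks up player 1's five games from the day before.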
550. Game Play Analysis IV[M]
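I. Table information
Same as the previous problem.
II. Problem statement
Find the fraction of players who logged in again on the day after their first login.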
Write an SQL query that reports the fraction of players that logged in again on the day after the day they first logged in, rounded to 2 decimal places. In other words, you need to count the number of players that logged in for at least two consecutive days starting from their first login date, then divide that number by the total number of players.
The query result format is in the following example:
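III. Reference SQL
Method 1: inner join (counts any two consecutive login days)
SELECT
ROUND(
COUNT( CASE datediff( a1.event_date, a2.event_date ) WHEN 1 THEN 1 ELSE NULL END ) / COUNT(DISTINCT a1.player_id), 2) AS fraction
FROM
activity a1
INNER JOIN activity a2
ON a1.player_id = a2.player_id
AND a1.event_date >= a2.event_date;
Approach:
1. The join condition and the idea are the same as in the previous problem.
2. To count logins on two consecutive days, it is enough that the same player has two login dates one day apart. (Note: this does not count players who logged in on the day after their first login. It counts every pair of consecutive login days, because the earlier date of a pair may not be the player's earliest date, so it matches the problem statement only when every such pair starts at the first login.)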
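Method 2: subquery + outer join (counts logins on the day after the first login)
SELECT
ROUND( COUNT( DISTINCT t.player_id ) / COUNT( DISTINCT a1.player_id ), 2 ) AS fraction
FROM
activity a1
LEFT JOIN ( SELECT player_id, MIN( event_date ) AS first_login FROM activity GROUP BY player_id ) t
ON a1.player_id = t.player_id
AND DATEDIFF( a1.event_date, t.first_login ) = 1;
Approach:
1. Use a subquery to find each player's earliest login date, first_login.
2. Left outer join on equal player id and a one-day difference from the earliest login date. In the result, a record for the day after the first login is joined to first_login; for every other record first_login is null (the difference is not one day).
3. Count the players who logged in on the second day with COUNT( DISTINCT t.player_id ), which skips null values. COUNT( t.player_id is not null ) cannot be used, because the comparison evaluates to 0 or 1 and never to NULL, so every row would be counted.
(PS: in principle each t.player_id appears at most once, unless a player's second-day login produces several records rather than only the last login being recorded.)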
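The difference between the two methods shows up as soon as a player has consecutive login days that do not start at the first login. The sketch below assumes Python's sqlite3 in place of MySQL, replaces datediff() with a julianday() difference, multiplies by 1.0 to avoid SQLite's integer division, and uses a made-up Activity table reduced to two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activity (player_id INTEGER, event_date TEXT,"
    " PRIMARY KEY (player_id, event_date))"
)
# Hypothetical logins: player 1 returns the day after the first login;
# player 4 has consecutive days, but not right after the first login.
conn.executemany(
    "INSERT INTO Activity VALUES (?, ?)",
    [(1, "2016-03-01"), (1, "2016-03-02"), (2, "2017-06-25"),
     (3, "2016-03-02"), (3, "2018-07-03"),
     (4, "2016-01-01"), (4, "2016-01-05"), (4, "2016-01-06")],
)

# Method 2: compare each login with the player's first login only.
after_first_login = conn.execute(
    """
    SELECT ROUND(COUNT(DISTINCT t.player_id) * 1.0 / COUNT(DISTINCT a1.player_id), 2)
    FROM Activity a1
    LEFT JOIN (
        SELECT player_id, MIN(event_date) AS first_login
        FROM Activity GROUP BY player_id
    ) t
      ON a1.player_id = t.player_id
     AND julianday(a1.event_date) - julianday(t.first_login) = 1
    """
).fetchone()[0]

# Method 1: counts any pair of consecutive days for the same player.
any_consecutive = conn.execute(
    """
    SELECT ROUND(
        COUNT(CASE WHEN julianday(a1.event_date) - julianday(a2.event_date) = 1
                   THEN 1 END) * 1.0
        / COUNT(DISTINCT a1.player_id), 2)
    FROM Activity a1
    INNER JOIN Activity a2
      ON a1.player_id = a2.player_id AND a1.event_date >= a2.event_date
    """
).fetchone()[0]

print(after_first_login, any_consecutive)
```

Only player 1 qualifies under the problem statement (1 of 4 players), while the first method also counts player 4.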
Method 3: window function first_value()
SELECT
ROUND(COUNT(DISTINCT t.player_id)/COUNT(DISTINCT a1.player_id), 2) AS fraction
FROM activity a1
LEFT JOIN (SELECT player_id, first_value(event_date) over(partition by player_id ORDER BY event_date) AS first_login FROM activity) t
ON a1.player_id = t.player_id
AND DATEDIFF(a1.event_date, t.first_login) = 1;
569. Median Employee Salary[H]
I. Table information
The Employee table below holds every employee's Id, company name, and salary.
The Employee Table holds all employees. The employee table has three columns: Employee Id, Company Name, and Salary.
II. Problem statement
Find the median salary of each company, without using built-in SQL functions.
Write a SQL query to find the median salary of each company. Bonus points if you can solve it without using any built-in SQL functions.
III. Reference SQL
Method 1: from the basic definition of the median
SELECT
e1.Id,
e1.company,
e1.salary
FROM
(SELECT Id, company, salary, @rnk:=IF(@pre=company, @rnk:=@rnk+1, 1) AS rnk, @pre:=company
FROM employee, (SELECT @rnk:=0, @pre:=NULL) AS init
ORDER BY company, salary, Id
) e1
INNER JOIN
(SELECT company, COUNT(*) AS cnt FROM employee GROUP BY Company
) e2
ON e1.company = e2.company
WHERE e1.rnk IN (cnt/2+0.5, cnt/2, cnt/2+1);
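Approach:
Definition of the median: with an odd number of values, the median is the middle value; with an even number, it is the mean of the two middle values (here the two values are only listed, not averaged). That is, for a sequence of N values:
- N odd: the median sits at sorted position (N+1)/2 = N/2+0.5
- N even: the median sits at sorted positions N/2 and N/2+1
Since N is either odd or even (mutually exclusive), (N/2+0.5) and (N/2, N/2+1) are mutually exclusive as well: the two sets of positions cannot be integers at the same time. So whether N is odd or even, the test can be written directly as:
median position IN (N/2+0.5, N/2, N/2+1)
This gives the overall approach:
1. Number the salaries within each company in sorted order (consecutive, no gaps), using either user-defined variables or the MySQL 8.0 ROW_NUMBER() window function.
2. Get the total count cnt with count(*).
3. Filter on the median positions: e1.rnk IN (cnt/2+0.5, cnt/2, cnt/2+1)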
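The position test can be checked on its own with a few lines of Python. The function name `median_positions` is made up for this sketch; it relies on true division, as the `/` operator in MySQL does, so an integer-division dialect would need `cnt / 2.0` instead.

```python
def median_positions(n):
    """Return the 1-based sorted positions that hold the median of n values."""
    # Same three candidates as the SQL filter: N/2+0.5, N/2, N/2+1.
    candidates = (n / 2 + 0.5, n / 2, n / 2 + 1)
    # Only whole-number candidates can match an integer rank.
    return [rank for rank in range(1, n + 1) if rank in candidates]

positions = {n: median_positions(n) for n in range(1, 7)}
print(positions)
```

For odd N only N/2+0.5 is a whole number, and for even N only N/2 and N/2+1 are, so one IN list covers both cases.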
Other methods:
https://www.cnblogs.com/qcyye/p/13451067.html
https://zhuanlan.zhihu.com/p/257081415
570. Managers with at Least 5 Direct Reports[M]
I. Table information
The Employee table below holds the employees of each department and their managers.
The Employee table holds all employees including their managers. Every employee has an Id, and there is also a column for the manager Id.
II. Problem statement
Find the names of managers who manage at least 5 employees.
Given the Employee table, write a SQL query that finds out managers with at least 5 direct report. For the above table, your SQL query should return:
III. Reference SQL
Method 1: subquery
SELECT NAME
FROM
employee
WHERE
Id IN ( SELECT Manager_id FROM employee GROUP BY Manager_id HAVING COUNT( * ) >= 5 );
571. Find Median Given Frequency of Numbers[H]
I. Table information
The table below records each number and its frequency.
The Numbers table keeps the value of number and its frequency.
II. Problem statement
Find the median based on the frequency of each number.
Write a query to find the median of all numbers and name the result as median. In this table, the numbers are 0, 0, 0, 0, 0, 0, 0, 1, 2, 2, 2, 3, so the median is (0 + 0)/2 = 0.
III. Reference SQL
Reference answer:
https://www.e-learn.cn/topic/3843270
574. Winning Candidate[M]
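I. Table information
Table 1: Candidate
This table holds each candidate's id and name.
Table 2: Vote
In this table id is an auto-increment column, and CandidateId refers to id in the Candidate table.
II. Problem statement
Find the name of the winning candidate. Note: ties are not considered in this problem, so there is exactly one winner.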
Write a sql to find the name of the winning candidate, the above example will return the winner B.
Notes: You may assume there is no tie, in other words there will be only one winning candidate.
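III. Reference SQL
Method 1: subquery
SELECT NAME
FROM
candidate
WHERE
id = ( SELECT candidateid FROM vote GROUP BY candidateid ORDER BY COUNT( * ) DESC LIMIT 1 );
1. The problem guarantees a single winner, so only one candidate has the highest vote count.
2. To find the top group: group by candidateid, count each candidate's votes, sort in descending order, and take the first row.
3. Since there is only one, 'id =' is enough to select it.
(ps: with several candidates tied on votes, 'id in' is needed, but using 'id in' directly on a subquery with a LIMIT clause raises the error: This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'. One more level of subquery has to be wrapped around it. https://blog.csdn.net/sjzs5590/article/details/7337552)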
Method 2: inner join
SELECT
c.NAME
FROM
candidate c
INNER JOIN ( SELECT candidateid FROM vote GROUP BY candidateid ORDER BY COUNT( * ) DESC LIMIT 1 ) t
ON c.id = t.candidateid;
Extension: what if several candidates are tied on votes? Add one more vote (6,5), so that the vote table becomes:
The query result is then:
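Approach:
1. Group and count the votes, then rank the counts with the dense_rank() window function (consecutive, no gaps).
2. The rows with rank 1 have the most votes.
SELECT
NAME
FROM
candidate
WHERE
id IN
(
SELECT
candidateid
FROM
(
SELECT
candidateid,
dense_rank ( ) over ( ORDER BY poll DESC ) AS ranking
FROM
( SELECT candidateid, COUNT( * ) AS poll FROM vote GROUP BY candidateid ) t1
) t2
WHERE
ranking = 1
);
Summary: so far the dense_rank() window function has handled the following problems
- Rank a column directly and take the top N, with a single group. Problem 178
- Group first, then rank a column within each group, to take the top N per group. Problem 185
- Group first, aggregate each group, then rank the aggregated results, to take the top N by some group statistic (row count, sum or average of a column, and so on). The extension of this problem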
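577. Employee Bonus[E]
I. Table information
Table 1: Employee
In the Employee table, empId is the primary key.
Table 2: Bonus
In the Bonus table, empId is the primary key.
II. Problem statement
Select the names and bonuses of employees whose bonus is less than 1000.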
Select all employee's name and bonus whose bonus is < 1000.
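III. Reference SQL
Method 1: left join
SELECT
e.NAME,
b.bonus
FROM
Employee e
LEFT JOIN Bonus b ON e.empId = b.empId
WHERE
b.bonus < 1000
OR b.bonus IS NULL;
Approach:
1. A bonus below 1000 covers two cases: no bonus at all (NULL) and a bonus that exists but is < 1000.
2. The Bonus table only records employees who have a bonus, which naturally suggests a left outer join.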
Method 2: subquery
SELECT
a.NAME,
b.bonus
FROM
Employee AS a
LEFT JOIN Bonus AS b ON a.empId = b.empId
WHERE
a.empId NOT IN ( SELECT empId FROM Bonus WHERE bonus >= 1000 );
Approach:
A bonus below 1000 has two cases, but a bonus of at least 1000 has only one, so it is simpler to exclude that one case.
578. Get Highest Answer Rate Question[M]
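I. Table information
The survey_log table below contains id, action, question_id, answer_id, q_num, and timestamp. id is the user id; action is one of "show", "answer", and "skip"; answer_id is non-null when action is "answer" and null otherwise; q_num is the order in which the question appeared.
II. Problem statement
Find the id of the question with the highest answer rate, where answer rate = (number of times action is answer) / (number of times action is show).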
Write a sql query to identify the question which has the highest answer rate.
Note:The highest answer rate meaning is: answer number's ratio in show number in the same question.
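III. Reference SQL
My reading of the problem: it resembles question recommendations on Zhihu. The system or other users broadcast (show) a question to a group of users with the same profile; a user who receives it may just skip it or may answer it. Knowing which question has the highest answer rate tells us what kind of questions this group likes, so more questions of that kind can be pushed to them next time. The end goal is to maximise the number of answers and raise the answer rate. It could even support basic user classification, or refine it, if users did not choose tags themselves at the start.
Method 1: group + aggregate as needed + sort and take the first row
SELECT question_id AS survey_log
FROM
(SELECT
question_id,
SUM(IF(action='answer',1,0)) AS answer_num,
SUM(IF(action='show',1,0)) AS show_num
-- SUM(case when action="answer" THEN 1 ELSE 0 END) AS num_answer,
-- SUM(case when action="show" THEN 1 ELSE 0 END) AS num_show,
FROM survey_log
GROUP BY question_id) t
ORDER BY answer_num/show_num DESC
LIMIT 1;
Approach:
1. In the subquery, group by question and count how many times each question was shown and how many times it was answered. (Do not group by user: the table in the problem is only a sample, the same question may be shown to many users, not all of whom answer, and the object of study is the question.)
2. In the outer query, sort in descending order and take the maximum, as usual.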
Method 2: simplified version
SELECT question_id AS survey_log
FROM
survey_log
GROUP BY question_id
ORDER BY COUNT(answer_id)/COUNT(IF(action='show',1,NULL)) DESC
LIMIT 1;
Approach: answer_id is filled in only when the question was answered, and count does not count NULL values.
579. Find Cumulative Salary of an Employee[H]
I. Table information
The Employee table below holds employees' salaries over one year.
The Employee table holds the salary information in a year.
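II. Problem statement
For each employee, find the cumulative salary over three months, excluding the most recent month, ordered by employee id ascending and month descending.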
Write a SQL to get the cumulative sum of an employee's salary over a period of 3 months but exclude the most recent month.
The result should be displayed by 'Id' ascending, and then by 'Month' descending.
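III. Reference SQL
Method 1: window functions sum() and dense_rank()
SELECT Id, MONTH, cum_sum AS Salary
FROM
(
SELECT *, dense_rank ( ) over ( PARTITION BY id ORDER BY cum_sum DESC ) AS ranking
FROM
( SELECT id, MONTH, SUM( Salary ) over ( PARTITION BY Id ORDER BY month ) AS cum_sum FROM employee_579 ) t1
) t2
WHERE
ranking <> 1
ORDER BY Id, MONTH DESC;
Approach: (partition, aggregate, rank, filter)
1. Subquery t1: the window aggregate sum, partitioned by employee id and ordered by month ascending, accumulates the salary under the alias cum_sum. (The window has no frame clause, so cum_sum is a running total from the employee's first month rather than a strict three-month sum; the two agree only while no more than three months have accumulated.)
2. To drop the accumulated record of the last (most recent) month, subquery t2 ranks cum_sum in descending order with dense_rank() (consecutive, no gaps), so the last month's record gets rank 1.
3. One more query level filters out the last month's record (ranking<>1) and then selects and orders the output as required.
(PS: if the most recent month's salary is 0, dense_rank() gives two rows the rank 1, so row_number() is probably the better choice. I have not tried it, but the logic should be right!)
If this looks too ugly, it can be tidied up with a WITH ... AS statement:
WITH s AS
(SELECT Id, month, Salary,
Sum(Salary) OVER (PARTITION BY Id ORDER BY Month) as SumSal,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id ASC, month DESC) rn
FROM employee_579)

SELECT Id, Month, SumSal as Salary
FROM s
WHERE rn > 1
ORDER BY Id, Month DESC;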
Method 2: official answer
SELECT E1.id, E1.month,
(IFNULL(E1.salary, 0) + IFNULL(E2.salary, 0) + IFNULL(E3.salary, 0)) AS Salary
FROM
(
SELECT id, MAX(month) AS month FROM Employee
GROUP BY id
HAVING COUNT(*) > 1
) AS maxmonth
LEFT JOIN Employee AS E1
ON (maxmonth.id = E1.id AND maxmonth.month > E1.month)
LEFT JOIN Employee AS E2
ON (E2.id = E1.id AND E2.month = E1.month - 1)
LEFT JOIN Employee AS E3
ON (E3.id = E1.id AND E3.month = E1.month - 2)
ORDER BY id ASC, month DESC;
Approach:
1. Group to find each employee's largest month (the most recent one); HAVING drops employees with only one month, because the most recent month is not counted.
2. The condition of the first LEFT JOIN is there to filter out the largest month.
3. The remaining LEFT JOINs prepare the accumulation by laying the months out in a pyramid-shaped table:
E1.salary    E2.salary    E3.salary
month 1      NULL         NULL
month 2      month 1      NULL
month 3      month 2      month 1
.........
The sum can then be computed as (IFNULL(E1.salary, 0) + IFNULL(E2.salary, 0) + IFNULL(E3.salary, 0)) AS Salary. All I can say is: really clever! Even though I could not have come up with it myself.
580. Count Student Number in Departments[M]
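I. Table information
Table 1: Student
Table 2: Department
II. Problem statement
Find the number of students in each department, listing every department even if it has no students. Order the result by student count descending, then by department name ascending.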
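III. Reference SQL
SELECT a.dept_name, COUNT(b.student_id) AS student_number FROM department AS a
LEFT JOIN student AS b
ON a.dept_id = b.dept_id
GROUP BY a.dept_name
ORDER BY student_number DESC, a.dept_name;
Approach: with a left join, mind which table is the main (left) table and which is the joined one; when grouping, make sure the GROUP BY column matches the selected column.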
584. Find Customer Referee[E]
I. Table information
The customer table holds the customer id, the customer name, and the referee id.
Given a table customer holding customers information and the referee.
II. Problem statement
Find the names of customers who were not referred by customer 2.
Write a query to return the list of customers NOT referred by the person with id '2'.
For the sample data above, the result is:
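III. Reference SQL
Method 1: subquery
SELECT name FROM customer WHERE
id NOT IN
(SELECT id FROM customer WHERE referee_id = 2);
(ps: if the subquery has a clause such as LIMIT, remember to wrap it in one more query level.)
Method 2: OR IS NULL
SELECT name FROM customer
WHERE referee_id <> 2 OR referee_id IS NULL;
(PS: because of SQL's three-valued logic, a condition of just WHERE referee_id <> 2 does not return the customers whose referee_id is null. Writing the condition as referee_id = NULL is also wrong, because null checks must use IS NULL / IS NOT NULL.)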
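The three-valued logic described in the note is easy to observe on a few rows. This sketch assumes Python's sqlite3 in place of MySQL (NULL comparisons behave the same way in both), with invented customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, referee_id INTEGER)")
# Hypothetical rows: three customers have no referee (NULL).
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Will", None), (2, "Jane", None), (3, "Alex", 2),
     (4, "Bill", None), (5, "Zack", 1), (6, "Mark", 2)],
)

def names(where_clause):
    """Return customer names matching the given WHERE clause, ordered by id."""
    sql = f"SELECT name FROM customer WHERE {where_clause} ORDER BY id"
    return [name for (name,) in conn.execute(sql)]

# NULL <> 2 evaluates to NULL (unknown), so NULL rows are silently dropped.
only_not_equal = names("referee_id <> 2")
# Adding IS NULL brings them back.
with_is_null = names("referee_id <> 2 OR referee_id IS NULL")
# '= NULL' is never true, not even for NULL values.
equals_null = names("referee_id = NULL")

print(only_not_equal, with_is_null, equals_null)
```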
Method 3: the IFNULL() function
SELECT name FROM customer WHERE IFNULL(referee_id, 0) <> 2;
(PS: on the various null functions: https://blog.csdn.net/pan_junbiao/article/details/85928004)
Method 4: coalesce(), which checks its arguments in turn and returns the first non-NULL one
SELECT name FROM customer
WHERE COALESCE(referee_id, 0) <> 2;
(PS: on this function: https://blog.csdn.net/weixin_38750084/article/details/83034294)
585. Investments in 2016[M]
I. Table information
II. Problem statement
III. Reference SQL
586. Customer Placing the Largest Number of Orders[E]
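I. Table information
The orders table in the figure below contains the order number, customer number, order date, required date, shipped date, status, and comment.
II. Problem statement
Find the customer who placed the most orders and list the customer_number. Note: the result is a single value; by assumption only one customer has the maximum.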
III. Reference SQL
Method 1:
SELECT
customer_number
FROM
orders
GROUP BY
customer_number
ORDER BY
COUNT( * ) DESC
LIMIT 1;
Approach:
After grouping, an aggregate function can be used directly in ORDER BY for sorting, which saves a lot of SQL.
Method 2:
SELECT customer_number FROM orders
GROUP BY customer_number
HAVING COUNT(customer_number) >= ALL
(SELECT COUNT(customer_number) FROM orders GROUP BY customer_number);
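595. Big Countries[E]
I. Table information
The World table contains each country's name, continent, area, population, and GDP.
II. Problem statement
Find the countries with an area over 3 million square km or a population over 25 million, and show their population and area.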
A country is big if it has an area of bigger than 3 million square km or a population of more than 25 million.
Write a SQL solution to output big countries' name, population and area.
For example, according to the above table, we should output:
III. Reference SQL
SELECT name, population, area FROM World
WHERE area > 3000000
OR population > 25000000;
596. Classes More Than 5 Students[E]
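I. Table information
The courses table contains the student id and the class name.
II. Problem statement
List the classes that have at least 5 students. Note: each student is counted only once.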
Please list out all classes which have more than or equal to 5 students.
Note: The students should not be counted duplicate in each course.
For example, the table:
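III. Reference SQL
Method 1:
select class from courses
group by class
having count(distinct student) >= 5;
Approach:
Filter the qualifying groups (classes) directly in HAVING with an aggregate function.
Method 2: subquery
select class
from
(
select count(distinct student) as num, class
from courses
group by class
) t
where num >= 5;
Notes:
1. WHERE cannot filter on an aggregate function directly, so the count is given an alias, num.
2. The derived table produced by the subquery must be given an alias, t.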
597. Friend Requests I: Overall Acceptance Rate[E]
I. Table information
In social network like Facebook or Twitter, people send friend requests and accept others’ requests as well. Now given two tables as below:
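Table 1: friend_request (friend requests)
Table 2: request_accepted (accepted requests)
II. Problem statement
Find the acceptance rate of requests, rounded to two decimal places.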
Write a query to find the overall acceptance rate of requests rounded to 2 decimals, which is the number of acceptance divides the number of requests.
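Notes:
- Accepted requests do not come only from the friend_request table. So just count the two tables separately and compute the rate, that is, acceptance rate = total accepted requests / number of requests.
- A requester may send more than one request to the same person, and a request may also be accepted more than once. So duplicate records must be removed.
- If there are no requests at all, return 0.00.
For the two tables above, the result should be 0.80.
III. Reference SQL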
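SELECT
ROUND(
IFNULL(
(SELECT COUNT(DISTINCT requester_id, accepter_id) FROM request_accepted597)
/
(SELECT COUNT(DISTINCT sender_id, send_to_id) FROM friend_request597)
, 0)
, 2)
AS accept_rate;
Approach:
1. Use two subqueries to count the accepted requests and the sent requests in their respective tables;
2. Divide the two.
(PS: when counting distinct records, a row is a true duplicate only when both columns match; usually only one column is put inside count for deduplication.)
group by can also be used to see which column pairs are repeated, for example,
SELECT requester_id, accepter_id, COUNT(*) FROM request_accepted597 GROUP BY requester_id, accepter_id;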
601. Human Traffic of Stadium[H]
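I. Table information
A city built a new stadium that many people visit each day. The table below records the auto-increment id, the visit date, and the number of visitors.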
X city built a new stadium, each day many people visit it and the stats are saved as these columns: id, visit_date, people
II. Problem statement
Find the records where at least three consecutive rows each have at least 100 visitors.
Please write a query to display the records which have 3 or more consecutive rows and the amount of people more than 100(inclusive).
Note: each day has only one row, and the visit dates increase with the id.
Note: Each day only have one row record, and the dates are increasing with id increasing.
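III. Reference SQL
Method 1:
SELECT s1.* FROM
stadium601 s1
LEFT JOIN stadium601 s2
ON s1.id = s2.id - 1
LEFT JOIN stadium601 s3
ON s2.id = s3.id - 1
WHERE s1.people >= 100
AND (s2.people >= 100 OR s2.people IS NULL)
AND (s3.people >= 100 OR s3.people IS NULL)
ORDER BY s1.id;
Approach:
1. Two successive left joins with the condition id-1, which amounts to shifting the table up by one row each time, put three days of records into one row, giving the table below:
2. Filtering: in the joined table, s1.people can never be null. If s3.people is null, the row is the second-to-last row; if s2.people is null, the row is the last row. This is why left joins are used, and why the filter has to allow s2.people and s3.people to be null.
3. Select s1.* from the result. (Caveat: the WHERE clause only tests each row as the start of a run, so the second and third rows of a run are returned only when they start a qualifying run themselves or sit at the end of the table; Method 2 below returns all three rows.)
(PS: for consecutive-occurrence problems like this, consider a join condition that shifts the table up by one row; problem 180 is similar. A real business scenario would be finding a time span that meets some condition, such as a user's active period or a store's peak period.)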
Method 2:
WITH
tmp AS (
SELECT a.visit_date AS date1,
b.visit_date AS date2,
c.visit_date AS date3
FROM stadium601 AS a
LEFT JOIN stadium601 AS b
ON b.id = a.id + 1
LEFT JOIN stadium601 AS c
ON c.id = a.id + 2
WHERE a.people >= 100
AND b.people >= 100
AND c.people >= 100
),

tmp1 AS (
SELECT date1 AS total_date FROM tmp
UNION
SELECT date2 AS total_date FROM tmp
UNION
SELECT date3 AS total_date FROM tmp
)

SELECT * FROM stadium601
WHERE visit_date IN
(SELECT * FROM tmp1);
Approach:
1. Mind the effect of joining on id-1 versus id+1: one shifts the table up and the other shifts it down. Do not simply assume that id+1 means a downward shift!
2. Mind the usage of UNION:
SELECT column_list
UNION [DISTINCT | ALL]
SELECT column_list
UNION [DISTINCT | ALL]
SELECT column_list
The default is DISTINCT, which deduplicates the result set, whereas ALL keeps duplicates!
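The difference between the default UNION and UNION ALL can be seen with three literal rows. The sketch assumes Python's sqlite3 as a stand-in for MySQL; both treat a bare UNION as UNION DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Plain UNION removes duplicate rows from the combined result.
union_rows = conn.execute(
    "SELECT 1 AS n UNION SELECT 1 UNION SELECT 2 ORDER BY n"
).fetchall()

# UNION ALL keeps every row, including repeats.
union_all_rows = conn.execute(
    "SELECT 1 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 ORDER BY n"
).fetchall()

print(union_rows)
print(union_all_rows)
```

In the stadium query the same date can come out of date1, date2, and date3 for overlapping runs, and the plain UNION collapses those repeats.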
602. Friend Requests II: Who Has the Most Friends[M]
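I. Table information
(The cnblogs editor is lagging badly at this point, probably because the post contains too many images.)
II. Problem statement
III. Reference SQL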