MySQL: Querying the top three goods by sales amount in each shop


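1. Preface

I recently interviewed for data analyst positions, and two companies asked this question: one wanted SQL to find the top three goods by sales amount in each shop, the other wanted the same statistics in Python. In the LeetCode database problem set, "Department Top Three Salaries" is the same type of problem, and it ranks first by frequency among all the problems there. Today I will first review and summarize the SQL solutions.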

 

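2. Problem

The sales table contains all order information. Each order has an order id orderid, shop id shopid, goods id goodsid, quantity sold salenum, unit price price, and order date orderdate.

The shop table contains shop information: shop id shopid and shop name shopname.

The goods table contains goods information: goods id goodsid and goods name goodsname.

1) Basic version: write a SQL query that finds the top three goods by sales amount (unit price * quantity sold) in each shop for Q1 2020. The result should list, for each shop, the goods name, the sales amount, and the rank.

2) Advanced version: write a SQL query that finds the top three goods by sales amount (unit price * quantity sold) in each shop over the last three months, shown in separate columns, so that the result has one row per shop.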

 

 

The table creation and data insertion statements are attached:

# Create the sales table and insert data
CREATE TABLE `sales` 
( 
`orderid` int NOT NULL AUTO_INCREMENT, 
`shopid` int NOT NULL, 
`goodsid` int NOT NULL, 
`salenum` int NOT NULL, 
`price` int NOT NULL, 
`orderdate` date NOT NULL, 
PRIMARY KEY(`orderid`) 
);
INSERT INTO `sales` 
(`shopid`, `goodsid`, `salenum`, `price`, `orderdate`)
VALUES
(1, 10001, 1, 90, '2020-01-15'),
(1, 10002, 1, 50, '2020-02-23'),
(2, 10004, 2, 120, '2020-01-18'),
(1, 10003, 3, 60, '2020-01-19'),
(2, 10002, 1, 50, '2020-02-23'),
(1, 10002, 1, 40, '2020-03-01'),
(1, 10004, 3, 20, '2020-02-14'),
(1, 10003, 1, 10, '2020-03-01'),
(2, 10002, 1, 50, '2020-02-02'),
(2, 10001, 1, 40, '2020-02-09');

# Create the shop table and insert data
CREATE TABLE `shop`
(
`shopid` int NOT NULL,
`shopname` varchar(10) NOT NULL
);
INSERT INTO `shop` VALUES
(1, 'SexyBaby'),
(2, 'AngelCity');

# Create the goods table and insert data
CREATE TABLE `goods`
(
`goodsid` int NOT NULL,
`goodsname` varchar(10) NOT NULL
);
INSERT INTO `goods` VALUES
(10001, 'dress'),
(10002, 'shirt'),
(10003, 'coat'),
(10004, 'blouse');

 

 

 

 

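3. Solving with window functions

Note: MySQL supports window functions starting from version 8.0.

Since we need grouped statistics per shop and per goods item, first recall the difference between GROUP BY and PARTITION BY, both of which compute statistics over groups. GROUP BY summarizes: it keeps only the grouping columns and the aggregate results, one row per group. PARTITION BY keeps every row and computes the statistic within each partition, and it is often used together with ranking functions. Note that when an aggregate function is applied over a partition that also has an ORDER BY, the default window frame runs from the start of the partition through the current row, so the aggregate accumulates row by row; without ORDER BY it is computed over the whole partition (for details see the blog post at https://www.cnblogs.com/hello-yz/p/9962356.html).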
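To make the difference concrete, here is a minimal sketch that runs the three variants side by side. It uses Python's built-in sqlite3 module as a stand-in for MySQL 8.0 (an assumption made only so the example is self-contained; it needs SQLite 3.25 or later), and the demo table with its values is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL 8.0 here; the window syntax used is the same in both.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (shopid INTEGER, amount INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO demo VALUES (?, ?)",
                 [(1, 90), (1, 50), (2, 240), (2, 100)])

# GROUP BY collapses each group into a single row.
grouped = conn.execute(
    "SELECT shopid, SUM(amount) FROM demo GROUP BY shopid ORDER BY shopid"
).fetchall()

# PARTITION BY without ORDER BY keeps every row and repeats the partition total.
partition_total = conn.execute(
    "SELECT shopid, amount, SUM(amount) OVER (PARTITION BY shopid) "
    "FROM demo ORDER BY shopid, amount DESC"
).fetchall()

# Adding ORDER BY inside OVER turns the aggregate into a running total.
running_total = conn.execute(
    "SELECT shopid, amount, "
    "SUM(amount) OVER (PARTITION BY shopid ORDER BY amount DESC) "
    "FROM demo ORDER BY shopid, amount DESC"
).fetchall()

print(grouped)          # [(1, 140), (2, 340)]
print(partition_total)  # [(1, 90, 140), (1, 50, 140), (2, 240, 340), (2, 100, 340)]
print(running_total)    # [(1, 90, 90), (1, 50, 140), (2, 240, 240), (2, 100, 340)]
```

The second and third queries differ only in the ORDER BY inside OVER, which is what switches the sum from a per-partition total to a cumulative value.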

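Approach for the basic version:

1. Use WHERE to select the orders from Q1 2020;

2. A shop may have several order records for the same goods item, so use GROUP BY to aggregate the sales amount sumprice of each goods item in each shop;

3. Use ROW_NUMBER() OVER (PARTITION BY ...) to sort the sales amounts within each shop in descending order, which gives each item's rank within its shop, sumprice_rank;

4. Join the result with the shop and goods tables to get shopname and goodsname, then filter in the outer query with WHERE sumprice_rank <= 3 to get the top three goods by sales amount in each shop.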

 

SELECT shop.shopname, goods.goodsname, a.sumprice, a.sumprice_rank FROM
(SELECT shopid, goodsid, SUM(salenum * price) AS sumprice,
        ROW_NUMBER() OVER (PARTITION BY shopid ORDER BY SUM(salenum * price) DESC) AS sumprice_rank
 FROM sales
 # Q1 2020 as a half-open range, so that January 1 and March 31 are both included
 WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'
 GROUP BY shopid, goodsid) a
LEFT JOIN shop ON a.shopid = shop.shopid LEFT JOIN goods ON a.goodsid = goods.goodsid
WHERE a.sumprice_rank <= 3 ORDER BY shopname, sumprice_rank;
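As a check on the sample data, the sketch below loads the three tables into an in-memory SQLite database (a stand-in for MySQL 8.0, assumed here only to keep the example runnable; it needs SQLite 3.25 or later) and runs the same logic. Two things differ from the query above and are my own assumptions: the aggregation is moved into a CTE named agg, and goodsid is added as a tie-breaker in the window ORDER BY, because in shop 1 dress and shirt both total 90 and ROW_NUMBER() would otherwise order them arbitrarily.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (orderid INTEGER PRIMARY KEY AUTOINCREMENT, shopid INT,
                    goodsid INT, salenum INT, price INT, orderdate TEXT);
CREATE TABLE shop (shopid INT, shopname TEXT);
CREATE TABLE goods (goodsid INT, goodsname TEXT);
INSERT INTO sales (shopid, goodsid, salenum, price, orderdate) VALUES
  (1, 10001, 1, 90, '2020-01-15'), (1, 10002, 1, 50, '2020-02-23'),
  (2, 10004, 2, 120, '2020-01-18'), (1, 10003, 3, 60, '2020-01-19'),
  (2, 10002, 1, 50, '2020-02-23'), (1, 10002, 1, 40, '2020-03-01'),
  (1, 10004, 3, 20, '2020-02-14'), (1, 10003, 1, 10, '2020-03-01'),
  (2, 10002, 1, 50, '2020-02-02'), (2, 10001, 1, 40, '2020-02-09');
INSERT INTO shop VALUES (1, 'SexyBaby'), (2, 'AngelCity');
INSERT INTO goods VALUES (10001, 'dress'), (10002, 'shirt'),
                         (10003, 'coat'), (10004, 'blouse');
""")

query = """
WITH agg AS (            -- steps 1 and 2: filter Q1 2020, then aggregate
    SELECT shopid, goodsid, SUM(salenum * price) AS sumprice
    FROM sales
    WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'
    GROUP BY shopid, goodsid
),
ranked AS (              -- step 3: rank within each shop (goodsid breaks ties)
    SELECT shopid, goodsid, sumprice,
           ROW_NUMBER() OVER (PARTITION BY shopid
                              ORDER BY sumprice DESC, goodsid) AS sumprice_rank
    FROM agg
)
SELECT shop.shopname, goods.goodsname, r.sumprice, r.sumprice_rank
FROM ranked r            -- step 4: join names and keep the top three
LEFT JOIN shop ON r.shopid = shop.shopid
LEFT JOIN goods ON r.goodsid = goods.goodsid
WHERE r.sumprice_rank <= 3
ORDER BY shop.shopname, r.sumprice_rank
"""
top3 = conn.execute(query).fetchall()
for row in top3:
    print(row)
# ('AngelCity', 'blouse', 240, 1)
# ('AngelCity', 'shirt', 100, 2)
# ('AngelCity', 'dress', 40, 3)
# ('SexyBaby', 'coat', 190, 1)
# ('SexyBaby', 'dress', 90, 2)
# ('SexyBaby', 'shirt', 90, 3)
```

If tied goods should share a rank instead, RANK() or DENSE_RANK() can replace ROW_NUMBER(), at the cost of possibly returning more than three rows per shop.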

 

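Approach for the advanced version:

1. Starting from the basic version, change the date filter to the last three months;

2. Pivot rows to columns; note that the values being pivoted here are strings, so MAX() over a CASE expression is used to pick the non-empty name for each rank.

(For filtering dates within the last N days/months/years, see this blog post: https://blog.csdn.net/weixin_33739523/article/details/85820328

For row-to-column pivoting, see this blog post: https://www.cnblogs.com/hiwuchong/p/10080215.html)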

SELECT shopname,
       MAX(CASE WHEN sumprice_rank = 1 THEN t.goodsname ELSE '' END) AS goodsname1,
       MAX(CASE WHEN sumprice_rank = 2 THEN t.goodsname ELSE '' END) AS goodsname2,
       MAX(CASE WHEN sumprice_rank = 3 THEN t.goodsname ELSE '' END) AS goodsname3
FROM
(SELECT shop.shopname, goods.goodsname, a.sumprice, a.sumprice_rank
 FROM (SELECT shopid, goodsid, SUM(salenum * price) AS sumprice,
              ROW_NUMBER() OVER (PARTITION BY shopid ORDER BY SUM(salenum * price) DESC) AS sumprice_rank
       FROM sales
       # orderdate is already a DATE column, so it can be compared directly
       WHERE orderdate >= DATE_SUB(CURDATE(), INTERVAL 3 MONTH)
       GROUP BY shopid, goodsid) a
 LEFT JOIN shop ON a.shopid = shop.shopid LEFT JOIN goods ON a.goodsid = goods.goodsid
 WHERE a.sumprice_rank <= 3) t
GROUP BY shopname;
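Because the sample orders are all from early 2020, a filter relative to the current date returns nothing when run today. The sketch below therefore passes a fixed reference date as a parameter; the name today and the value 2020-03-31 are assumptions for illustration. It again uses SQLite (3.25 or later) through Python's sqlite3 module as a stand-in for MySQL, so date(:today, '-3 months') plays the role of DATE_SUB(CURDATE(), INTERVAL 3 MONTH), and goodsid is added as a tie-breaker so the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (orderid INTEGER PRIMARY KEY AUTOINCREMENT, shopid INT,
                    goodsid INT, salenum INT, price INT, orderdate TEXT);
CREATE TABLE shop (shopid INT, shopname TEXT);
CREATE TABLE goods (goodsid INT, goodsname TEXT);
INSERT INTO sales (shopid, goodsid, salenum, price, orderdate) VALUES
  (1, 10001, 1, 90, '2020-01-15'), (1, 10002, 1, 50, '2020-02-23'),
  (2, 10004, 2, 120, '2020-01-18'), (1, 10003, 3, 60, '2020-01-19'),
  (2, 10002, 1, 50, '2020-02-23'), (1, 10002, 1, 40, '2020-03-01'),
  (1, 10004, 3, 20, '2020-02-14'), (1, 10003, 1, 10, '2020-03-01'),
  (2, 10002, 1, 50, '2020-02-02'), (2, 10001, 1, 40, '2020-02-09');
INSERT INTO shop VALUES (1, 'SexyBaby'), (2, 'AngelCity');
INSERT INTO goods VALUES (10001, 'dress'), (10002, 'shirt'),
                         (10003, 'coat'), (10004, 'blouse');
""")

query = """
WITH agg AS (
    SELECT shopid, goodsid, SUM(salenum * price) AS sumprice
    FROM sales
    -- SQLite counterpart of DATE_SUB(CURDATE(), INTERVAL 3 MONTH),
    -- with the reference date supplied as a parameter
    WHERE orderdate >= date(:today, '-3 months')
    GROUP BY shopid, goodsid
),
ranked AS (
    SELECT shopid, goodsid, sumprice,
           ROW_NUMBER() OVER (PARTITION BY shopid
                              ORDER BY sumprice DESC, goodsid) AS sumprice_rank
    FROM agg
)
SELECT shop.shopname,
       -- each CASE yields the name on one row and '' on the others;
       -- MAX() keeps the non-empty string
       MAX(CASE WHEN r.sumprice_rank = 1 THEN goods.goodsname ELSE '' END) AS goodsname1,
       MAX(CASE WHEN r.sumprice_rank = 2 THEN goods.goodsname ELSE '' END) AS goodsname2,
       MAX(CASE WHEN r.sumprice_rank = 3 THEN goods.goodsname ELSE '' END) AS goodsname3
FROM ranked r
LEFT JOIN shop ON r.shopid = shop.shopid
LEFT JOIN goods ON r.goodsid = goods.goodsid
WHERE r.sumprice_rank <= 3
GROUP BY shop.shopname
ORDER BY shop.shopname
"""
pivoted = conn.execute(query, {"today": "2020-03-31"}).fetchall()
for row in pivoted:
    print(row)
# ('AngelCity', 'blouse', 'shirt', 'dress')
# ('SexyBaby', 'coat', 'dress', 'shirt')
```

A shop with fewer than three goods sold in the period would show an empty string in the unused columns, which is what the ELSE '' branch is for.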

 

 

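4. Solving with basic syntax

The expected result of the basic version has sumprice_rank as its last column. Producing it without window functions requires user-defined variables, which I will not go into here; the focus is on how to find the top values within each group using basic syntax.

Approach for the basic version:

1. As in steps 1 and 2 of the window-function approach, first filter and aggregate to get the sales amount sumprice of each goods item in each shop for Q1 2020, then continue from this table;

2. To find the top three goods by sales amount in each shop, compare the table from the previous step with itself in a correlated subquery, using the condition

t1.sumprice < t2.sumprice AND t1.shopid = t2.shopid

and then count the goods that satisfy it:

COUNT(t2.goodsid) < 3

If fewer than three goods in the same shop have a higher sales amount, the item is among the shop's top three. Goods with equal sales amounts pass or fail this test together, so ties can return more than three rows for a shop;

3. Join the inner query result with the shop and goods tables to get shopname and goodsname, then select the required columns in the outer query.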

SELECT shop.shopname, goods.goodsname, t1.sumprice
FROM
(SELECT shopid, goodsid, SUM(salenum * price) AS sumprice FROM sales
 WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01' GROUP BY shopid, goodsid) t1
LEFT JOIN shop ON t1.shopid = shop.shopid LEFT JOIN goods ON t1.goodsid = goods.goodsid
WHERE
# count the goods in the same shop with a strictly higher sales amount
(SELECT COUNT(t2.goodsid) FROM
 (SELECT shopid, goodsid, SUM(salenum * price) AS sumprice FROM sales
  WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01' GROUP BY shopid, goodsid) t2
 WHERE t1.sumprice < t2.sumprice AND t1.shopid = t2.shopid) < 3
ORDER BY t1.shopid, t1.sumprice DESC;
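The counting approach can be tried on the sample data with the sketch below, again using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption for runnability). The aggregated table is written once as a CTE named agg instead of being repeated, which MySQL 8.0 also allows; that restructuring and the extra goodsid sort key are my own choices. The second run adds one hypothetical order so that three goods in shop 1 tie at 90, showing that this method then returns four rows for that shop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (orderid INTEGER PRIMARY KEY AUTOINCREMENT, shopid INT,
                    goodsid INT, salenum INT, price INT, orderdate TEXT);
CREATE TABLE shop (shopid INT, shopname TEXT);
CREATE TABLE goods (goodsid INT, goodsname TEXT);
INSERT INTO sales (shopid, goodsid, salenum, price, orderdate) VALUES
  (1, 10001, 1, 90, '2020-01-15'), (1, 10002, 1, 50, '2020-02-23'),
  (2, 10004, 2, 120, '2020-01-18'), (1, 10003, 3, 60, '2020-01-19'),
  (2, 10002, 1, 50, '2020-02-23'), (1, 10002, 1, 40, '2020-03-01'),
  (1, 10004, 3, 20, '2020-02-14'), (1, 10003, 1, 10, '2020-03-01'),
  (2, 10002, 1, 50, '2020-02-02'), (2, 10001, 1, 40, '2020-02-09');
INSERT INTO shop VALUES (1, 'SexyBaby'), (2, 'AngelCity');
INSERT INTO goods VALUES (10001, 'dress'), (10002, 'shirt'),
                         (10003, 'coat'), (10004, 'blouse');
""")

query = """
WITH agg AS (
    SELECT shopid, goodsid, SUM(salenum * price) AS sumprice
    FROM sales
    WHERE orderdate >= '2020-01-01' AND orderdate < '2020-04-01'
    GROUP BY shopid, goodsid
)
SELECT shop.shopname, goods.goodsname, t1.sumprice
FROM agg t1
LEFT JOIN shop ON t1.shopid = shop.shopid
LEFT JOIN goods ON t1.goodsid = goods.goodsid
-- keep an item when fewer than three goods in its shop sold strictly more
WHERE (SELECT COUNT(*) FROM agg t2
       WHERE t2.shopid = t1.shopid AND t2.sumprice > t1.sumprice) < 3
ORDER BY t1.shopid, t1.sumprice DESC, t1.goodsid
"""
original = conn.execute(query).fetchall()
print(original)

# Hypothetical extra order: blouse in shop 1 now totals 60 + 30 = 90,
# tying with dress and shirt.
conn.execute("INSERT INTO sales (shopid, goodsid, salenum, price, orderdate) "
             "VALUES (1, 10004, 1, 30, '2020-03-05')")
with_tie = conn.execute(query).fetchall()
shop1_rows = [row for row in with_tie if row[0] == "SexyBaby"]
print(shop1_rows)  # coat plus all three goods tied at 90
```

On the unmodified data both approaches select the same six goods; they differ only when a tie straddles the third place, where ROW_NUMBER() cuts off at exactly three rows.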

 

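I am a beginner in data analysis and machine learning. If you have any questions, feel free to discuss them in the comments. I look forward to improving together with everyone.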

 

