PostgreSQL: Selecting the First (Highest) Row in Each Group (6 Methods)

Reposted article (see the original post). 2020-09-29 09:37. Category: PostgreSQL

Select first row in each GROUP BY group?

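This is a question from Stack Overflow. Given the table below, the goal is to return, for each customer, the row with the highest total. With window functions this is fairly simple, but what about databases that do not support window functions?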

id | customer | total
---+----------+------
 1 | Joe      | 5
 2 | Sally    | 3
 3 | Joe      | 2
 4 | Sally    | 1

WITH summary AS (
    SELECT p.id,
           p.customer,
           p.total,
           ROW_NUMBER() OVER (PARTITION BY p.customer
                              ORDER BY p.total DESC) AS ranks
    FROM PURCHASES p
)
SELECT s.*
FROM summary s
WHERE s.ranks = 1;
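
To try the window-function query on the sample rows, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. It assumes the bundled SQLite is version 3.25 or newer (needed for window functions). The helper name `top_purchase_per_customer` and the extra `id` tiebreaker in the window's ORDER BY are illustrative additions, not part of the original query.

```python
import sqlite3


def top_purchase_per_customer(rows):
    """Return one (id, customer, total) row per customer: the largest total."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (id INTEGER, customer TEXT, total INTEGER)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    query = """
        WITH summary AS (
            SELECT id, customer, total,
                   ROW_NUMBER() OVER (PARTITION BY customer
                                      ORDER BY total DESC, id) AS rn
            FROM purchases
        )
        SELECT id, customer, total
        FROM summary
        WHERE rn = 1
        ORDER BY customer
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# The four sample rows from the table above.
SAMPLE = [(1, "Joe", 5), (2, "Sally", 3), (3, "Joe", 2), (4, "Sally", 1)]
print(top_purchase_per_customer(SAMPLE))
```

Adding `id` to the window ordering makes the choice deterministic when two purchases of one customer share the same total.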

So here is a generic approach that works on any database:

SELECT MIN(x.id),  -- change to MAX if you want the highest
       x.customer,
       x.total
FROM PURCHASES x
JOIN (SELECT p.customer,
             MAX(total) AS max_total
      FROM PURCHASES p
      GROUP BY p.customer) y ON y.customer = x.customer
                            AND y.max_total = x.total
GROUP BY x.customer, x.total;
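
The join-based query uses no window functions, so it should run on most SQL engines. The sketch below runs it on SQLite through Python's sqlite3 module, with one extra row (id 5) added as an assumption to show how a tie on `total` is resolved by `MIN(id)`. The names `GENERIC_QUERY` and `greatest_per_group` are hypothetical.

```python
import sqlite3

# The generic query from the text, plus an ORDER BY for stable output.
GENERIC_QUERY = """
    SELECT MIN(x.id) AS id, x.customer, x.total
    FROM purchases x
    JOIN (SELECT customer, MAX(total) AS max_total
          FROM purchases
          GROUP BY customer) y
      ON y.customer = x.customer
     AND y.max_total = x.total
    GROUP BY x.customer, x.total
    ORDER BY x.customer
"""


def greatest_per_group(rows):
    """Run the join + GROUP BY query over (id, customer, total) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (id INTEGER, customer TEXT, total INTEGER)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    result = conn.execute(GENERIC_QUERY).fetchall()
    conn.close()
    return result


# Joe has two purchases tied at total = 5 (ids 1 and 5).
ROWS_WITH_TIE = [
    (1, "Joe", 5), (2, "Sally", 3), (3, "Joe", 2), (4, "Sally", 1), (5, "Joe", 5),
]
print(greatest_per_group(ROWS_WITH_TIE))
```

The outer `GROUP BY` collapses tied rows into one, and `MIN(id)` decides which id is reported.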

PS: The original post also mentions a PostgreSQL-specific solution: DISTINCT ON ()

SELECT DISTINCT ON (customer)
       id, customer, total
FROM   purchases
ORDER  BY customer, total DESC, id;

Or shorter (if not as clear) with ordinal numbers of output columns:

SELECT DISTINCT ON (2)
       id, customer, total
FROM   purchases
ORDER  BY 2, 3 DESC, 1;

If total can be NULL (won't hurt either way, but you'll want to match existing indexes):

...
ORDER BY customer, total DESC NULLS LAST, id;

If total can be NULL, you most probably want the row with the greatest non-null value. Add NULLS LAST like demonstrated.

For details, see: PostgreSQL sort by datetime asc, null first?
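
The effect of NULLS LAST can be sketched in plain Python. This is only an emulation of the ordering rules, under the assumption that `None` stands for SQL NULL; in PostgreSQL, a DESC sort places NULLs first by default. The function names and the sample rows are hypothetical.

```python
def order_rows(rows, nulls_last=False):
    """Emulate ORDER BY customer, total DESC [NULLS LAST], id.

    rows are (id, customer, total) tuples; None plays the role of NULL.
    With nulls_last=False, NULL totals sort first, as PostgreSQL does
    by default for a DESC sort.
    """
    def sort_key(row):
        row_id, customer, total = row
        is_null = total is None
        null_rank = int(is_null) if nulls_last else int(not is_null)
        return (customer, null_rank, 0 if is_null else -total, row_id)

    return sorted(rows, key=sort_key)


def first_per_customer(ordered_rows):
    """Keep the first row seen for each customer, like DISTINCT ON (customer)."""
    firsts = {}
    for row in ordered_rows:
        firsts.setdefault(row[1], row)
    return list(firsts.values())


ROWS = [(1, "Joe", 5), (2, "Joe", None), (3, "Sally", 3)]

# Default DESC ordering: the NULL total wins for Joe.
print(first_per_customer(order_rows(ROWS)))
# With NULLS LAST: the greatest non-null total wins.
print(first_per_customer(order_rows(ROWS, nulls_last=True)))
```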

I did not fully understand DISTINCT ON at first, so I read another blogger's post on it, which also uses an IN subquery.

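DISTINCT ON ( expression [, …] ) groups rows by the values of the listed expressions and returns only the first row of each group. Note that without an ORDER BY clause, which row comes first in each group is unpredictable. If you do use ORDER BY, the DISTINCT ON expressions must match the leftmost ORDER BY expressions.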
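
As a rough mental model, DISTINCT ON behaves like "sort, then keep the first row of each group". The pure-Python sketch below emulates that for a single DISTINCT ON column. The function `distinct_on` and the sample rows are hypothetical, and the real planner may execute the query quite differently.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, on, order_by):
    """Emulate SELECT DISTINCT ON (on) ... ORDER BY order_by.

    rows     : list of dicts
    on       : column name used for grouping
    order_by : list of (column, descending) pairs
    """
    # Mirror PostgreSQL's rule: the DISTINCT ON column must lead the ORDER BY.
    if not order_by or order_by[0][0] != on:
        raise ValueError(
            "DISTINCT ON expressions must match initial ORDER BY expressions"
        )
    ordered = list(rows)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last yields a multi-column ordering with mixed directions.
    for column, descending in reversed(order_by):
        ordered.sort(key=itemgetter(column), reverse=descending)
    # Keep only the first row of each run of equal `on` values.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(on))]


STUDENTS = [
    {"id": 1, "name": "A", "course": "math", "score": 99},
    {"id": 2, "name": "B", "course": "math", "score": 80},
    {"id": 3, "name": "C", "course": "physics", "score": 70},
    {"id": 4, "name": "A", "course": "physics", "score": 90},
]

best = distinct_on(STUDENTS, "course", [("course", False), ("score", True)])
print([row["id"] for row in best])
```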

1. Without an ORDER BY clause, which row is returned for each group is unpredictable.

postgres=# select distinct on(course)id,name,course,score from student;
id | name | course | score
----+--------+--------+-------
10 | 周星馳 | 化學 | 83
8 | 周星馳 | 外語 | 88
2 | 周潤發 | 數學 | 99
14 | 黎明 | 物理 | 90
6 | 周星馳 | 語文 | 91
(5 rows)

2. Get the highest score for each course

postgres=# select distinct on(course)id,name,course,score from student order by course,score desc;
id | name | course | score
----+--------+--------+-------
5 | 周潤發 | 化學 | 87
13 | 黎明 | 外語 | 95
2 | 周潤發 | 數學 | 99
14 | 黎明 | 物理 | 90
6 | 周星馳 | 語文 | 91
(5 rows)

3. If ORDER BY is specified, the grouping column must come first (leftmost)

postgres=# select distinct on(course)id,name,course,score from student order by score desc;
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: select distinct on(course)id,name,course,score from student ...

4. The highest score for each course can also be obtained with an IN clause

postgres=# select * from student where(course,score) in(select course,max(score) from student group by course);
id | name | course | score
----+--------+--------+-------
2 | 周潤發 | 數學 | 99
5 | 周潤發 | 化學 | 87
6 | 周星馳 | 語文 | 91
13 | 黎明 | 外語 | 95
14 | 黎明 | 物理 | 90
(5 rows)

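The original post also points out a small difference between row_number() OVER (), DISTINCT ON, and the IN clause. The first two return exactly one row per group, because row numbers are unique, whereas the IN clause returns every row tied for the top score. To get the tied rows with a window function as well, use rank(), which assigns the same rank to rows with equal scores.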
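
The difference between row_number() and rank() on ties can be seen in a small experiment. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL, assuming SQLite 3.25 or newer. The `top_scores` helper and the three sample rows are hypothetical.

```python
import sqlite3


def top_scores(rows, window_function):
    """Return the rank-1 rows per course using row_number() or rank()."""
    if window_function not in ("row_number", "rank"):
        raise ValueError("window_function must be 'row_number' or 'rank'")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER, course TEXT, score INTEGER)")
    conn.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)
    # The function name is checked against a whitelist above, so it is
    # safe to interpolate it into the SQL text.
    query = f"""
        SELECT id, course, score
        FROM (
            SELECT id, course, score,
                   {window_function}() OVER (PARTITION BY course
                                             ORDER BY score DESC) AS rn
            FROM student
        ) AS sub
        WHERE rn = 1
        ORDER BY course, id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Two students are tied for the top score in math.
SCORES = [(1, "math", 99), (2, "math", 99), (3, "physics", 90)]

print(top_scores(SCORES, "rank"))        # both tied math rows are returned
print(top_scores(SCORES, "row_number"))  # only one math row is returned
```

With row_number(), which of the tied rows survives is arbitrary unless the window's ORDER BY includes a tiebreaker.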

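Below are six methods provided by an expert; some of them only work in PostgreSQL. I also learned a lot from the table-creation statements in that answer.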

Queries

1. `row_number()` in CTE (see other answer): common table expression

WITH cte AS (
   SELECT id, customer_id, total
        , row_number() OVER (PARTITION BY customer_id ORDER BY total DESC) AS rn
   FROM   purchases
   )
SELECT id, customer_id, total
FROM   cte
WHERE  rn = 1;

2. `row_number()` in subquery (my optimization)

SELECT id, customer_id, total
FROM  (
   SELECT id, customer_id, total
        , row_number() OVER (PARTITION BY customer_id ORDER BY total DESC) AS rn
   FROM   purchases
   ) sub
WHERE  rn = 1;

3. `DISTINCT ON` (see other answer)

SELECT DISTINCT ON (customer_id)
       id, customer_id, total
FROM   purchases
ORDER  BY customer_id, total DESC, id;

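4. rCTE with `LATERAL` subquery (see here): recursion plus LATERAL

The first (non-recursive) query picks the row with the smallest customer_id and, within that customer, the largest total.

A subquery in the FROM or JOIN clause that is marked LATERAL can reference columns of the tables or subqueries that precede it in the same FROM clause.

For more on LATERAL, see the article listed under References at the end of this post.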

WITH RECURSIVE cte AS (
   (  -- parentheses required
   SELECT id, customer_id, total
   FROM   purchases
   ORDER  BY customer_id, total DESC
   LIMIT  1
   )
   UNION ALL
   SELECT u.*
   FROM   cte c
   ,      LATERAL (
      SELECT id, customer_id, total
      FROM   purchases
      WHERE  customer_id > c.customer_id  -- lateral reference
      ORDER  BY customer_id, total DESC
      LIMIT  1
      ) u
   )
SELECT id, customer_id, total
FROM   cte
ORDER  BY customer_id;
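
The idea behind the recursive query is to jump from one customer to the next rather than read every row. The pure-Python sketch below emulates this with a sorted list standing in for a B-tree index on (customer_id, total DESC, id). That index layout, the function name, and the sample data are assumptions for illustration only; this is not how PostgreSQL executes the query internally.

```python
from bisect import bisect_right


def top_row_per_customer(index):
    """Walk a sorted 'index' and return (id, customer_id, total) per customer.

    index: list of (customer_id, -total, id) tuples in ascending order,
    so the first entry of each customer carries the highest total.
    """
    result = []
    pos = 0  # like the non-recursive term: the very first index entry
    while pos < len(index):
        customer_id, neg_total, row_id = index[pos]
        result.append((row_id, customer_id, -neg_total))
        # Like "WHERE customer_id > c.customer_id ... LIMIT 1": binary-search
        # for the first entry that belongs to a larger customer_id.
        pos = bisect_right(index, (customer_id, float("inf")))
    return result


# Hypothetical (id, customer_id, total) rows.
PURCHASES = [(1, 10, 5), (2, 20, 3), (3, 10, 2), (4, 20, 1)]
INDEX = sorted((cust, -total, pid) for pid, cust, total in PURCHASES)

print(top_row_per_customer(INDEX))
```

Each iteration costs one binary search, so the work grows with the number of distinct customers rather than the number of rows. This is consistent with the approach doing well when there are many rows per customer and poorly when there are few.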

5. `customer` table with `LATERAL` (see here)

SELECT l.*
FROM   customer c
,      LATERAL (
   SELECT id, customer_id, total
   FROM   purchases
   WHERE  customer_id = c.customer_id  -- lateral reference
   ORDER  BY total DESC
   LIMIT  1
   ) l;

6. `array_agg()` with `ORDER BY` (see other answer)

SELECT (array_agg(id ORDER BY total DESC))[1] AS id
     , customer_id
     , max(total) AS total
FROM   purchases
GROUP  BY customer_id;

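This one is interesting: within each group, the ids are aggregated into an array ordered by total descending, and the first element is taken, which is the id of the row with the MAX total. It is the first time I have seen a query where the grouping column is not listed first in the SELECT list. max() then returns the largest total. In my test the result came back sorted by customer_id ascending, so I read a few posts about it; that ordering is a side effect of how the grouping is executed and is not guaranteed without an explicit ORDER BY.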

Results (performance)

Execution time for above queries with EXPLAIN ANALYZE (and all options off), best of 5 runs.

All queries used an Index Only Scan on purchases2_3c_idx (among other steps). Some of them just for the smaller size of the index, others more effectively.

A. Postgres 9.4 with 200k rows and ~ 20 per `customer_id`

1. 273.274 ms
2. 194.572 ms
3. 111.067 ms
4. 92.922 ms
5. 37.679 ms -- winner
6. 189.495 ms

B. The same with Postgres 9.5

1. 288.006 ms
2. 223.032 ms
3. 107.074 ms
4. 78.032 ms
5. 33.944 ms -- winner
6. 211.540 ms

C. Same as B., but with ~ 2.3 rows per `customer_id`

1. 381.573 ms
2. 311.976 ms
3. 124.074 ms -- winner
4. 710.631 ms
5. 311.976 ms
6. 421.679 ms

References: https://www.oschina.net/translate/postgresqls-powerful-new-join-type-lateral
