References:
https://stackoverflow.com/questions/27415706/postgresql-select-top-three-in-each-group
http://charlesnagy.info/it/postgresql/group-by-limit-per-group-in-postgresql
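But I really could not follow how LATERAL is used in those two, and the syntax also seems to differ from what PostgreSQL 11 accepts.
The LATERAL usage in the following article I did understand; it treats LATERAL like a foreach loop: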
https://www.oschina.net/translate/postgresqls-powerful-new-join-type-lateral?cmp
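In short, use LATERAL to JOIN two subqueries:
1 GROUP BY to get the aggregated list of groups
2 Take that reduced set and use WHERE inner.grp = outer.grp to filter the full, unreduced data. Sort it to get the result for each group
Finally, wrap one more SELECT around the whole thing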
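However, if inner is a view, it effectively gets scanned N times, where N is the number of groups produced in step 1.
I have quite a lot of groups, several hundred, and the measured result was 5 seconds. That is unbearable; the performance is far too poor.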
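The LATERAL pattern described above can be sketched roughly as follows. This is only an illustration under assumptions: it borrows table1, grp and value from the window-function query in this post, the aliases outer_grp, inner_rows and top_rows are made up (OUTER and INNER are reserved words in PostgreSQL, so they cannot be used as bare aliases), and LIMIT 1 is my own choice for "top 1 per group".

```sql
-- Sketch only: table1(grp, value) is assumed; all aliases are hypothetical.
SELECT top_rows.*
FROM (
    -- Step 1: one row per group.
    SELECT grp
    FROM table1
    GROUP BY grp
) AS outer_grp
CROSS JOIN LATERAL (
    -- Step 2: runs once per group, like the body of a foreach loop.
    SELECT *
    FROM table1 AS inner_rows
    WHERE inner_rows.grp = outer_grp.grp
    ORDER BY inner_rows.value DESC
    LIMIT 1
) AS top_rows;
```

Because the lateral subquery is evaluated once per row of outer_grp, a view in place of table1 is likely to be re-evaluated for every group, which is consistent with the slowdown measured above.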
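In the end I went back to https://stackoverflow.com/questions/27415706/postgresql-select-top-three-in-each-group
and used the most direct approach there, the window-function method. A single FROM does the job in about 400 ms, which is tolerable.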
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY value DESC) AS order_in_grp
    FROM table1
) AS A
WHERE order_in_grp < 2;
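The subquery uses a window function to compute order_in_grp, each row's sequence number within its group: rows are partitioned by the grp column and numbered by value in descending order inside each partition.
The outer query only uses WHERE to keep the top row of each group.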
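To see the query in action, here is a small self-contained run. The temporary table and its sample rows are invented purely for illustration; only the column names grp and value come from the query above.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TEMP TABLE table1 (grp text, value integer);

INSERT INTO table1 (grp, value) VALUES
    ('a', 10), ('a', 30), ('a', 20),
    ('b', 5),  ('b', 7);

-- Number the rows inside each group, then keep only the first one.
SELECT grp, value, order_in_grp
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY value DESC) AS order_in_grp
    FROM table1
) AS a
WHERE order_in_grp < 2
ORDER BY grp;
-- Returns two rows: ('a', 30, 1) and ('b', 7, 1)
```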
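For a beginner like me, this simple, direct, standard-syntax approach is a better fit. How to use LATERAL efficiently is something I have no time to look into for now.
At least it follows the first few lines of The Zen of Python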
Beautiful is better than ugly. Explicit is better than implicit. Simple is better than complex. Complex is better than complicated. Flat is better than nested. Sparse is better than dense. Readability counts. Special cases aren't special enough to break the rules. Although practicality beats purity.
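Takeaway
Seeing groups does not mean you must reach for GROUP BY. A window function is actually more flexible here: instead of aggregating, compute the in-group rank order_in_grp in a subquery, then apply a single WHERE in the outer query to filter out the top N rows of each group.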
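As a sketch of the top-N case, the same query only needs a different bound in the outer WHERE. N = 3 is an arbitrary value picked for illustration, and table1(grp, value) is assumed to exist as above.

```sql
-- Sketch only: top 3 rows per group; 3 is an arbitrary example value for N.
SELECT *
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY value DESC) AS order_in_grp
    FROM table1
) AS a
WHERE order_in_grp <= 3;
```

ROW_NUMBER() breaks ties arbitrarily, so exactly N rows per group come back at most. If tied values should share a position, RANK() or DENSE_RANK() can be swapped in, at the cost of possibly returning more than N rows for a group.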