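There are two SQL constructs for grouping:
GROUP BY, and PARTITION BY (a clause found in Oracle and PostgreSQL), whose behavior can be emulated with GROUP BY plus aggregate functions
GROUP BY + HAVING filters the grouped data
Note: a column used in HAVING must be a GROUP BY column, an aggregate, or a column (or alias) in the SELECT list; otherwise MySQL rejects the query with an "Unknown column ... in 'having clause'" error
Incorrect: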
SELECT MAX(ID),U_ID FROM mlzm_comments GROUP BY U_ID HAVING Data_Status >0
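Correct (Data_Status is not a GROUP BY column, so this only runs with ONLY_FULL_GROUP_BY disabled, and its value comes from an arbitrary row of each group; to filter rows before grouping, put the condition in WHERE instead):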
SELECT MAX(ID),U_ID,Data_Status FROM mlzm_comments GROUP BY U_ID HAVING Data_Status >0
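GROUP BY emphasizes the whole, that is, the group: it returns a single row per group holding the aggregate result. PARTITION BY keeps the focus on the individuals inside the whole: every row of the group is returned, each carrying the value computed over its partition.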
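The difference can be seen on a tiny table. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (window functions need SQLite 3.25 or later); the `comments` table and its rows are made up for illustration.

```python
import sqlite3

# Hypothetical miniature of a comments table; the values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, u_id INTEGER)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [(1, 10), (2, 10), (3, 20), (4, 10), (5, 20)],
)

# GROUP BY collapses each u_id into a single row.
grouped = conn.execute(
    "SELECT u_id, MAX(id) FROM comments GROUP BY u_id ORDER BY u_id"
).fetchall()

# PARTITION BY keeps every row and attaches the group's maximum to each one.
windowed = conn.execute(
    "SELECT id, u_id, MAX(id) OVER (PARTITION BY u_id) "
    "FROM comments ORDER BY id"
).fetchall()

print(grouped)
print(windowed)
```

`grouped` has one row per user, while `windowed` has as many rows as the table itself.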
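#Real-world requirement: get the first or the last record that satisfies a condition
Step-by-step breakdown:
#Step 1: find all rows that meet the condition. Without ORDER BY the row order is not guaranteed (in practice it usually follows the primary key); here the rows are sorted by U_ID to make the result easier to review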
SELECT a.ID,a.U_ID FROM mlzm_content a WHERE a.Data_Status = 2 ORDER BY a.U_ID,a.ID ASC;

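#Step 2: use GROUP BY with MIN() and MAX() to group the rows that meet the condition and get the smallest and largest ID in each group. Note that the plain ID column in this result set is not necessarily the smallest one: it is a non-aggregated column, so MySQL is free to take it from any row of the group (and rejects the query outright when ONLY_FULL_GROUP_BY is enabled). The workaround tried further below is to sort table a first
#Table a is not sorted
SELECT b.ID,b.U_ID,MIN(b.ID),MAX(b.ID) FROM ( SELECT a.ID,a.U_ID FROM mlzm_content a WHERE a.Data_Status = 2 ) AS b GROUP BY b.U_ID;
The statement above is equivalent to
#Simplified, but it still gives no guarantee about which row a.ID comes from
SELECT a.ID,a.U_ID,MIN(a.ID),MAX(a.ID) FROM mlzm_content a WHERE a.Data_Status = 2 GROUP BY a.U_ID;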

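The result set shows clearly that the returned ID is neither the smallest nor the largest (in the first row, for example, the ID is 31550 while the smallest is 30768 and the largest is 38849), so in this case the ID column cannot be used as a filter condition later on.
Joining on that unsorted ID column therefore does not return the rows we want either
#Joining on the unsorted id column: the result is not what we want
SELECT c.ID,c.U_ID,b.ID as b_id,b.U_ID as b_uid ,b.min_id,b.max_id FROM mlzm_content AS c
INNER JOIN ( SELECT a.ID, a.U_ID, MIN(a.ID) as min_id, MAX(a.ID) as max_id FROM mlzm_content as a WHERE a.Data_Status=2 GROUP BY a.U_ID ORDER BY a.U_ID ) AS b
ON c.ID =b.ID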

In this situation, to get the smallest or largest record, join on min_id or max_id instead
#Get the smallest record from the unsorted result set
SELECT c.ID,c.U_ID,b.ID as b_id,b.U_ID as b_uid ,b.min_id,b.max_id FROM mlzm_content AS c
INNER JOIN ( SELECT a.ID, a.U_ID, MIN(a.ID) as min_id, MAX(a.ID) as max_id FROM mlzm_content as a WHERE a.Data_Status=2 GROUP BY a.U_ID ORDER BY a.U_ID ) AS b
ON c.ID =b.min_id
#Get the largest record from the unsorted result set
SELECT c.ID,c.U_ID,b.ID as b_id,b.U_ID as b_uid ,b.min_id,b.max_id FROM mlzm_content AS c
INNER JOIN ( SELECT a.ID, a.U_ID, MIN(a.ID) as min_id, MAX(a.ID) as max_id FROM mlzm_content as a WHERE a.Data_Status=2 GROUP BY a.U_ID ORDER BY a.U_ID ) AS b
ON c.ID =b.max_id
Result for the smallest record

Elapsed: 0.0310 s
Result for the largest record

Sort first, then fetch the data
#Sort table a first
SELECT b.ID,b.U_ID,MIN(b.ID),MAX(b.ID) FROM ( SELECT a.ID,a.U_ID FROM mlzm_content a WHERE a.Data_Status = 2 ORDER BY a.ID ASC ) AS b GROUP BY b.U_ID;

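After sorting first, the ID matched MIN(b.ID) in this test, so the ID column could be used as a filter condition. Be aware that MySQL does not guarantee this: the optimizer may ignore an ORDER BY inside a derived table, so the outcome can differ between versions and settings.
Next, use the ID obtained after sorting, and min_id, to fetch the smallest record of each group
#Join on the id obtained after sorting
SELECT d.ID, d.U_ID, c.ID AS c_id, c.U_ID AS c_uid, c.min_id, c.max_id FROM mlzm_content AS d
INNER JOIN ( SELECT b.ID, b.U_ID, MIN(b.ID) AS min_id, MAX(b.ID) AS max_id FROM ( SELECT a.ID, a.U_ID FROM mlzm_content a WHERE a.Data_Status = 2 ORDER BY a.ID ASC ) AS b GROUP BY b.U_ID ) AS c
ON d.ID = c.ID
#Join on min_id
SELECT d.ID, d.U_ID, c.ID AS c_id, c.U_ID AS c_uid, c.min_id, c.max_id FROM mlzm_content AS d
INNER JOIN ( SELECT b.ID, b.U_ID, MIN(b.ID) AS min_id, MAX(b.ID) AS max_id FROM ( SELECT a.ID, a.U_ID FROM mlzm_content a WHERE a.Data_Status = 2 ORDER BY a.ID ASC ) AS b GROUP BY b.U_ID ) AS c
ON d.ID = c.min_id
Both queries returned the same result set, so after sorting first the ID was usable as the join key here; without sorting, only min_id can be used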

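Elapsed: 0.0410 s
Conclusion: with the data sorted first, the key column itself worked as the join condition in these tests; otherwise the value returned by max(column_name) or min(column_name) has to be used.
Sorting first takes longer than not sorting, so skipping the sort and joining directly on max(column_name) or min(column_name) is the more efficient choice, and it does not depend on row order.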
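Wrong way to filter:
#This approach is wrong: derived table b is built without the condition, so the row that min_id points to may not satisfy Data_Status=2, and the final WHERE then drops that group entirely
SELECT c.ID,c.U_ID,b.ID as b_id,b.U_ID as b_uid ,b.min_id,b.max_id FROM mlzm_content AS c
INNER JOIN ( SELECT a.ID, a.U_ID, MIN(a.ID) as min_id, MAX(a.ID) as max_id FROM mlzm_content as a GROUP BY a.U_ID ) AS b
ON c.ID =b.min_id WHERE c.Data_Status=2
This is the wrong way to filter; keep it in mind.
The result set obtained this way is smaller than the real data set.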
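To see why the misplaced condition loses rows, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The `content` table and its rows are invented for illustration; only the placement of the `data_status = 2` condition differs between the two queries.

```python
import sqlite3

# Hypothetical data: (id, u_id, data_status). User 10's smallest id has status 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (id INTEGER PRIMARY KEY, u_id INTEGER, data_status INTEGER)"
)
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 10, 2), (3, 20, 2), (4, 20, 2), (5, 30, 1)],
)

# Condition inside the derived table: MIN() only sees rows with status 2.
correct_sql = """
SELECT c.id FROM content AS c
INNER JOIN (SELECT MIN(a.id) AS min_id FROM content AS a
            WHERE a.data_status = 2 GROUP BY a.u_id) AS b
        ON c.id = b.min_id
ORDER BY c.id
"""

# Condition outside: MIN() picks id 1 for user 10, which the WHERE then discards.
wrong_sql = """
SELECT c.id FROM content AS c
INNER JOIN (SELECT MIN(a.id) AS min_id FROM content AS a
            GROUP BY a.u_id) AS b
        ON c.id = b.min_id
WHERE c.data_status = 2
ORDER BY c.id
"""

correct = [row[0] for row in conn.execute(correct_sql)]
wrong = [row[0] for row in conn.execute(wrong_sql)]
print(correct, wrong)
```

User 10 has a qualifying row (id 2), but the second query never considers it, so that user disappears from the result.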
Besides (INNER) JOIN, the same rows can also be fetched with IN and EXISTS
#Using IN
SELECT c.ID,c.U_ID FROM mlzm_content AS c WHERE c.ID in ( SELECT MIN(a.ID) as min_id FROM mlzm_content as a WHERE a.Data_Status=2 GROUP BY a.U_ID ) ORDER BY U_ID
Elapsed: 0.0410 s
#Using EXISTS
SELECT c.ID,c.U_ID FROM mlzm_content AS c WHERE exists ( SELECT * from ( SELECT MIN(a.ID) as min_id FROM mlzm_content as a WHERE a.Data_Status=2 GROUP BY a.U_ID ) as b where b.min_id = c.ID ) ORDER BY U_ID
Elapsed: 0.0520 s
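Summary:
The relative efficiency of IN and EXISTS is affected by the size of the subquery's table: IN is more efficient when it is small, and EXISTS is more efficient when it is large.
In the author's tests IN and EXISTS were slower than JOIN, which the author attributes to the comparison on a single column, and the gap widened as the data grew: with 100,000+ rows JOIN and IN differed by at least 5 minutes. The best option here is therefore an INNER JOIN.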
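The three formulations are interchangeable in terms of results, which a quick check can confirm. This sketch runs them in SQLite through Python's sqlite3 module on an invented `content` table; it says nothing about MySQL timings, which depend on the optimizer, indexes, and data volume.

```python
import sqlite3

# Hypothetical data: (id, u_id, data_status).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (id INTEGER PRIMARY KEY, u_id INTEGER, data_status INTEGER)"
)
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 10, 2), (3, 20, 2), (4, 20, 2), (5, 30, 1), (6, 10, 2)],
)

# Smallest qualifying id per user, shared by all three variants.
min_ids = (
    "SELECT MIN(a.id) AS min_id FROM content AS a "
    "WHERE a.data_status = 2 GROUP BY a.u_id"
)

join_sql = (
    f"SELECT c.id FROM content AS c INNER JOIN ({min_ids}) AS b "
    "ON c.id = b.min_id ORDER BY c.id"
)
in_sql = f"SELECT c.id FROM content AS c WHERE c.id IN ({min_ids}) ORDER BY c.id"
exists_sql = (
    "SELECT c.id FROM content AS c WHERE EXISTS "
    f"(SELECT 1 FROM ({min_ids}) AS b WHERE b.min_id = c.id) ORDER BY c.id"
)

via_join = [row[0] for row in conn.execute(join_sql)]
via_in = [row[0] for row in conn.execute(in_sql)]
via_exists = [row[0] for row in conn.execute(exists_sql)]
print(via_join, via_in, via_exists)
```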
//=====================================================================================================//
Optimized SQL for getting the smallest or the largest record that satisfies a condition:
Smallest:
SELECT c.* FROM mlzm_content AS c INNER JOIN ( SELECT MIN(a.ID) as min_id FROM mlzm_content AS a WHERE a.Data_Status=2 (filter condition) GROUP BY a.U_ID (grouping key) ) AS b ON c.ID =b.min_id where (other conditions)
Largest:
SELECT c.* FROM mlzm_content AS c INNER JOIN ( SELECT MAX(a.ID) as max_id FROM mlzm_content AS a WHERE a.Data_Status=2 (filter condition) GROUP BY a.U_ID (grouping key) ) AS b ON c.ID =b.max_id where (other conditions)
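The two templates differ only in the aggregate, so they can be wrapped in one helper. The sketch below is an illustration under assumptions: `extreme_rows` is a hypothetical helper name, the `content` table and rows are invented, and SQLite (via Python's sqlite3) stands in for MySQL.

```python
import sqlite3


def extreme_rows(conn, agg):
    """Return (id, u_id) of the smallest or largest status-2 row of each user."""
    # The aggregate name is interpolated into the SQL, so whitelist it first.
    if agg not in ("MIN", "MAX"):
        raise ValueError("agg must be 'MIN' or 'MAX'")
    sql = f"""
        SELECT c.id, c.u_id FROM content AS c
        INNER JOIN (SELECT {agg}(a.id) AS picked_id FROM content AS a
                    WHERE a.data_status = 2 GROUP BY a.u_id) AS b
                ON c.id = b.picked_id
        ORDER BY c.u_id
    """
    return conn.execute(sql).fetchall()


# Hypothetical data: (id, u_id, data_status).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (id INTEGER PRIMARY KEY, u_id INTEGER, data_status INTEGER)"
)
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?)",
    [(1, 10, 1), (2, 10, 2), (3, 20, 2), (4, 20, 2), (5, 30, 1), (6, 10, 2)],
)

first_rows = extreme_rows(conn, "MIN")
last_rows = extreme_rows(conn, "MAX")
print(first_rows, last_rows)
```

User 30 has no row with status 2, so it appears in neither result.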
//=====================================================================================================//
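The GROUP BY approach above can only fetch the largest or the smallest record. What if you need the record at a specific position, for example the 5th record that satisfies the condition?
MySQL versions before 8.0 have no window functions, so PARTITION BY is not available there (it is a clause of the OVER() window specification, not an aggregate function). To get similar behavior, combine group_concat with string functions such as substring_index. MySQL 8.0 and later support OVER (PARTITION BY ...) directly.
PARTITION BY syntax
SELECT ..., window_function(...) OVER (PARTITION BY column1 ORDER BY column2) FROM table_name ...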
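Where window functions are available, the "record at position N in each group" requirement maps onto ROW_NUMBER(). The following is a minimal sketch run in SQLite (3.25 or later) through Python's sqlite3 module; the `content` table and its rows are invented, and the same query shape is assumed to carry over to other engines with window-function support.

```python
import sqlite3

# Hypothetical data: (id, u_id, data_status).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (id INTEGER PRIMARY KEY, u_id INTEGER, data_status INTEGER)"
)
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?)",
    [(1, 10, 2), (2, 10, 2), (3, 20, 2), (4, 10, 2), (5, 20, 2), (6, 10, 1)],
)

# Number the qualifying rows of each user by ascending id, then keep position N.
nth_sql = """
SELECT id, u_id FROM (
    SELECT id, u_id,
           ROW_NUMBER() OVER (PARTITION BY u_id ORDER BY id) AS rn
    FROM content
    WHERE data_status = 2
) AS ranked
WHERE rn = ?
ORDER BY u_id
"""

second_rows = conn.execute(nth_sql, (2,)).fetchall()
third_rows = conn.execute(nth_sql, (3,)).fetchall()
print(second_rows, third_rows)
```

A group with fewer than N qualifying rows simply contributes nothing, as user 20 does for the third position.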
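Functions:
concat(str1, str2,...)
Purpose: joins several strings into one
Result: returns the concatenation of the arguments. Note that if any argument is null, the result is null as well
concat('str1','、','str2','、','str3') returns str1、str2、str3, but having to write the separator every time is clumsy. Is there a simpler way?
There is: concat_ws(separator, str1, str2, ...) (concat_ws stands for concat with separator). Like concat it joins several strings into one, but the separator is specified only once.
The code above can be simplified to concat_ws('、','str1', 'str2', 'str3')
Note: the two functions treat null differently. concat() returns null if any argument is null. concat_ws() returns null only when the separator is null; null values among the remaining arguments are skipped.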
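The NULL rules documented for MySQL's CONCAT() and CONCAT_WS() can be written down as a small pure-Python model, with `None` playing the role of NULL. The names `mysql_concat` and `mysql_concat_ws` are hypothetical; this is a model of the documented behavior, not MySQL code.

```python
from typing import Optional


def mysql_concat(*args: Optional[str]) -> Optional[str]:
    """Model of CONCAT(): the result is NULL (None) if any argument is NULL."""
    if any(arg is None for arg in args):
        return None
    return "".join(args)


def mysql_concat_ws(sep: Optional[str], *args: Optional[str]) -> Optional[str]:
    """Model of CONCAT_WS(): NULL only for a NULL separator; NULL values are skipped."""
    if sep is None:
        return None
    return sep.join(arg for arg in args if arg is not None)


with_null = mysql_concat("str1", None, "str3")
ws_with_null = mysql_concat_ws("、", "str1", None, "str3")
ws_null_sep = mysql_concat_ws(None, "str1", "str3")
print(with_null, ws_with_null, ws_null_sep)
```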
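group_concat(expr) aggregate function
Background: in a query with GROUP BY, every column in the SELECT list must either appear in the GROUP BY clause as a grouping key or be wrapped in an aggregate function. (For more on GROUP BY, see the article 淺析SQL中Group By的使用.)
group_concat() function
1. Purpose: joins the values of one group produced by GROUP BY and returns them as a single string.
2. Syntax: group_concat( [distinct] column_to_join [order by sort_column asc/desc ] [separator 'separator'] )
Notes: distinct removes duplicate values; order by sorts the values inside the result; separator is a string and defaults to a comma. The result is truncated at group_concat_max_len (1024 by default), so large groups need a higher setting.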
SELECT GROUP_CONCAT(a.ID)as ids,a.U_ID FROM mlzm_content AS a WHERE a.Data_Status = 2 GROUP BY a.U_ID ORDER BY a.U_ID

Join the IDs in descending order
#Use '# #' as the separator
SELECT GROUP_CONCAT(DISTINCT a.ID ORDER BY a.id DESC separator '# #')as ids,a.U_ID FROM mlzm_content AS a WHERE a.Data_Status = 2 GROUP BY a.U_ID ORDER BY a.U_ID

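Next, use substring_index(string, delimiter, N) to cut out the part we need: it returns everything before the N-th occurrence of the delimiter (counting from the right when N is negative)
For example, get the first 5 records that satisfy the condition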
SELECT substring_index(GROUP_CONCAT(DISTINCT a.ID ORDER BY a.id ASC separator ','),',',5) as sub_id, a.U_ID FROM mlzm_content AS a WHERE a.Data_Status = 2 GROUP BY a.U_ID ORDER BY a.U_ID
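The cutting rule is easy to get wrong, so here is a pure-Python model of SUBSTRING_INDEX() applied to a string shaped like a GROUP_CONCAT result. The function is a sketch of the documented behavior and assumes a non-empty delimiter; the ID list is invented (only 30768, 31550 and 38849 appear in the results quoted earlier).

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Model of MySQL SUBSTRING_INDEX(); assumes delim is not empty."""
    parts = s.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    if count < 0:
        # Everything to the right of the count-th delimiter from the end.
        return delim.join(parts[count:])
    return ""


ids = "30768,31550,32001,33510,35002,38849"  # made-up GROUP_CONCAT output

first_five = substring_index(ids, ",", 5)
fifth_only = substring_index(first_five, ",", -1)
short_group = substring_index("7,9", ",", 5)
print(first_five, fifth_only, short_group)
```

When a group holds fewer than 5 IDs the whole string comes back unchanged, so a nested call with -1 then yields the last available ID rather than a true 5th one.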

Wrong way to fetch them (IN):
SELECT c.id,c.U_ID FROM mlzm_content as c JOIN (SELECT substring_index(GROUP_CONCAT(DISTINCT a.ID ORDER BY a.id ASC separator ','),',',5) as sub_id, a.u_id FROM mlzm_content AS a WHERE a.Data_Status = 2 GROUP BY a.U_ID ) as t ON c.u_id = t.u_id WHERE c.id in (t.sub_id) ORDER BY c.u_id

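The result shows that IN matched only the first element: t.sub_id is a single string rather than a list, so comparing it with the integer c.id converts the string to a number, which keeps only its leading digits (the first ID). To use IN, you would have to pull the subquery result out and pass it back in as parameters, for example in two steps in PHP: first select the matching IDs with substring_index(), then loop over that result set to query the matching rows. That is inefficient and uses more resources, so this approach is dropped.
Correct way to fetch them (find_in_set):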
SELECT c.id,c.u_id FROM mlzm_content as c JOIN (SELECT substring_index(GROUP_CONCAT(DISTINCT a.ID ORDER BY a.id ASC separator ','),',',5) as sub_id, a.u_id FROM mlzm_content AS a WHERE a.Data_Status = 2 GROUP BY a.U_ID ) as t ON c.u_id = t.u_id WHERE FIND_IN_SET(c.id,t.sub_id) ORDER BY c.u_id
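The contrast between the two predicates can be modelled in a few lines of Python. This is a rough sketch under assumptions: `numeric_prefix` approximates MySQL's implicit string-to-number conversion by keeping the leading numeric part, `find_in_set` mirrors FIND_IN_SET()'s 1-based position (0 when absent), and the ID string is invented.

```python
import re


def numeric_prefix(s: str) -> float:
    """Approximate MySQL's implicit cast: leading numeric part, or 0 if none."""
    match = re.match(r"\s*[+-]?\d+(\.\d+)?", s)
    return float(match.group()) if match else 0.0


def in_single_string(row_id: int, csv: str) -> bool:
    """Model of `id IN (csv_string)`: one numeric comparison against the prefix."""
    return float(row_id) == numeric_prefix(csv)


def find_in_set(needle: str, csv: str) -> int:
    """Model of FIND_IN_SET(): 1-based position in the comma list, 0 if absent."""
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0


sub_id = "30768,31550,38849"  # made-up GROUP_CONCAT output for one user

matched_by_in = [i for i in (30768, 31550, 38849) if in_single_string(i, sub_id)]
matched_by_fis = [i for i in (30768, 31550, 38849) if find_in_set(str(i), sub_id)]
print(matched_by_in, matched_by_fis)
```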

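To get only the last of those first 5 records (that is, the 5th record, or the last one when a group has fewer than 5)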
SELECT c.id,c.U_ID FROM mlzm_content as c JOIN (SELECT substring_index(substring_index(GROUP_CONCAT(DISTINCT a.ID ORDER BY a.id ASC separator ','),',',5),',',-1) as sub_id, a.U_ID FROM mlzm_content AS a WHERE a.Data_Status = 2 GROUP BY a.U_ID ) as t ON c.id = t.sub_id

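The same approach covers similar business requirements, such as:
1. Find the last 10 login IPs of each user
2. Find the two students with the highest total score in each class
greatest(value1,value2,...) returns the largest of its arguments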
SELECT greatest(1,2,3,4)

SELECT greatest('a','b','c','bb','ae','d')

SELECT greatest('a','b','c','bb','ae','d'),ASCII('a'),ASCII('b'),ASCII('c'),ASCII('bb'),ASCII('ae'),ASCII('d')

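As the results above show, greatest() compares strings character by character according to the collation, so 'd' is the largest value here and 'bb' and 'ae' rank by their full contents. It is ASCII() that looks only at the first character: ASCII('bb') returns the code of 'b' and ASCII('ae') returns the code of 'a'.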
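The same comparison can be reproduced outside the database. In this sketch Python's `max()` stands in for greatest() and `ord()` of the first character stands in for ASCII(); it assumes plain lowercase ASCII input, where Python's code-point ordering and a typical MySQL collation agree.

```python
values = ["a", "b", "c", "bb", "ae", "d"]

# Lexicographic comparison: the whole string takes part, not just one character.
largest = max(values)

# ASCII()-style view: only the first character of each value is inspected.
first_char_codes = {value: ord(value[0]) for value in values}

# Two values that share a first character are still ordered by the rest.
largest_with_shared_prefix = max(["ab", "ae"])

print(largest, first_char_codes, largest_with_shared_prefix)
```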
References: https://blog.csdn.net/mary19920410/article/details/76545053/
Performance of IN versus EXISTS: https://www.cnblogs.com/beijingstruggle/p/5885137.html
MySQL function reference: https://www.cnblogs.com/zwesy/p/9428509.html
