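How to use GROUP BY for deduplication
MySQL's DISTINCT applies to the whole select list: two rows count as duplicates only when every selected column matches.
So when we need to deduplicate on one particular column only, we can use a GROUP BY clause to group on that column.
select _auto_id from account_login group by _auto_id;
This statement deduplicates the _auto_id column.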
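To make the difference concrete, here is a minimal sketch contrasting the two forms. It reuses the account_login table and the sid column that appear elsewhere in this post; the MAX(sid) aggregate and the max_sid alias are only illustrative choices, not something from the original tests.

```sql
-- DISTINCT compares the whole select list: a row is dropped only when
-- BOTH _auto_id and sid match a row that was already returned.
SELECT DISTINCT _auto_id, sid
FROM account_login;

-- GROUP BY collapses rows on _auto_id alone, so each _auto_id appears once.
-- Any other column has to go through an aggregate (MAX here, as an example).
SELECT _auto_id, MAX(sid) AS max_sid
FROM account_login
GROUP BY _auto_id;
```

The first query can still return the same _auto_id several times, once per distinct sid; the second returns each _auto_id exactly once.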
Efficiency analysis of deduplicating with GROUP BY
Without an index
Time: 0.23s
mysql> explain select _auto_id from account_login group by _auto_id;
+----+-------------+---------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+---------------+------+---------+------+--------+---------------------------------+
| 1 | SIMPLE | account_login | ALL | NULL | NULL | NULL | NULL | 133257 | Using temporary; Using filesort |
+----+-------------+---------------+------+---------------+------+---------+------+--------+---------------------------------+
mysql> show profile;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000154 |
| checking permissions | 0.000012 |
| Opening tables | 0.000029 |
| init | 0.000029 |
| System lock | 0.000014 |
| optimizing | 0.000010 |
| statistics | 0.000021 |
| preparing | 0.000020 |
| Creating tmp table | 0.000036 |
| Sorting result | 0.000007 |
| executing | 0.000005 |
| Sending data | 0.207841 |
| Creating sort index | 0.021024 |
| end | 0.000010 |
| removing tmp table | 0.000130 |
| end | 0.000010 |
| query end | 0.000016 |
| closing tables | 0.000019 |
| freeing items | 0.000035 |
| cleaning up | 0.000039 |
+----------------------+----------+
20 rows in set, 1 warning (0.00 sec)
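A sort index is created here for sorting, which shows that MySQL used an in-memory temporary table and that the sort following GROUP BY is carried out through the sort index. The maximum size of that in-memory temporary table is the smaller of tmp_table_size and max_heap_table_size.
The time shown for Sending data = Time(Sending data) + Time(Sorting result). The Sending data state covers reading and processing rows, not only returning them to the client, so much of this time is plausibly grouping and sorting work.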
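For readers who want to reproduce per-stage timings like the ones above, the session below is a rough sketch of how such a profile can be collected. The query ID 1 is a placeholder; use whatever ID SHOW PROFILES reports. Note that SHOW PROFILE is deprecated in newer MySQL releases in favor of the Performance Schema, so this assumes an older 5.x server like the one used in this post.

```sql
-- Turn on statement profiling for the current session only.
SET profiling = 1;

-- Run the statement to be measured.
SELECT _auto_id FROM account_login GROUP BY _auto_id;

-- List the profiled statements together with their Query_ID and total time.
SHOW PROFILES;

-- Show the per-stage breakdown (Creating tmp table, Sending data, ...).
-- 1 is a placeholder Query_ID taken from the SHOW PROFILES output.
SHOW PROFILE FOR QUERY 1;

-- Switch profiling off again when done.
SET profiling = 0;
```

Running SHOW PROFILE without FOR QUERY reports the most recently executed statement.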
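Because GROUP BY sorts its result implicitly (this holds for MySQL 5.x; MySQL 8.0 removed the implicit sort), if we only want to deduplicate and do not need sorted output, we can use: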
mysql> explain select _auto_id from account_login group by _auto_id order by null;
+----+-------------+---------------+------+---------------+------+---------+------+--------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+---------------+------+---------+------+--------+-----------------+
| 1 | SIMPLE | account_login | ALL | NULL | NULL | NULL | NULL | 133257 | Using temporary |
+----+-------------+---------------+------+---------------+------+---------+------+--------+-----------------+
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000155 |
| checking permissions | 0.000012 |
| Opening tables | 0.000029 |
| init | 0.000029 |
| System lock | 0.000014 |
| optimizing | 0.000009 |
| statistics | 0.000022 |
| preparing | 0.000020 |
| Creating tmp table | 0.000042 |
| executing | 0.000006 |
| Sending data | 0.219640 |
| end | 0.000021 |
| removing tmp table | 0.000014 |
| end | 0.000008 |
| query end | 0.000014 |
| closing tables | 0.000020 |
| freeing items | 0.000033 |
| cleaning up | 0.000020 |
+----------------------+----------+
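As shown, after adding the ORDER BY NULL clause MySQL no longer creates a sort index for sorting (in-memory sorting is very fast, so the gain is not obvious, and this stage only sorts each data block). However, when several columns follow GROUP BY and cannot be …
With an index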
mysql> explain select _auto_id from account_login group by _auto_id;
Time: 0.11s
Execution plan
+----+-------------+---------------+-------+---------------+---------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+------+--------+-------------+
| 1 | SIMPLE | account_login | index | idx_acc | idx_acc | 4 | NULL | 133257 | Using index |
+----+-------------+---------------+-------+---------------+---------+---------+------+--------+-------------+
profile
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000140 |
| checking permissions | 0.000011 |
| Opening tables | 0.000027 |
| init | 0.000028 |
| System lock | 0.000014 |
| optimizing | 0.000009 |
| statistics | 0.000035 |
| preparing | 0.000028 |
| Sorting result | 0.000006 |
| executing | 0.000005 |
| Sending data | 0.105595 |
| end | 0.000012 |
| query end | 0.000013 |
| closing tables | 0.000015 |
| freeing items | 0.000026 |
| cleaning up | 0.000034 |
+----------------------+----------+
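explain select _auto_id from account_login group by _auto_id                  time: 0.11s
explain select _auto_id from account_login group by _auto_id order by null    time: 0.11s
With the index in place, MySQL relies on the index's own ordering, so it needs neither a temporary table nor a filesort (Creating sort index) and can output the rows already in order. Both statements take the same time.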
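The post does not show how idx_acc was defined. The sketch below assumes it is a single-column index on _auto_id, which is consistent with the key_len of 4 in the plan but is still an assumption. The IGNORE INDEX hint is added here only as a convenient way to compare against the no-index plan without dropping the index.

```sql
-- Assumed definition of the index used in the plans above.
CREATE INDEX idx_acc ON account_login (_auto_id);

-- With the index: expect type=index and "Using index" in Extra,
-- with no "Using temporary" or "Using filesort".
EXPLAIN SELECT _auto_id
FROM account_login
GROUP BY _auto_id;

-- Same query with the index hidden from the optimizer, to see the
-- temporary-table plan again for comparison.
EXPLAIN SELECT _auto_id
FROM account_login IGNORE INDEX (idx_acc)
GROUP BY _auto_id;
```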
Efficiency analysis of a typical use case
mysql> explain select _auto_id,max(date) from account_login group by _auto_id;
Without an index
Time: 3.16s
+----+-------------+---------------+------+---------------+------+---------+------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+---------------+------+---------+------+--------+---------------------------------+
| 1 | SIMPLE | account_login | ALL | NULL | NULL | NULL | NULL | 133257 | Using temporary; Using filesort |
+----+-------------+---------------+------+---------------+------+---------+------+--------+---------------------------------+
mysql> show profile;
+---------------------------+----------+
| Status | Duration |
+---------------------------+----------+
| starting | 0.000111 |
| checking permissions | 0.000010 |
| Opening tables | 0.000018 |
| init | 0.000030 |
| System lock | 0.000011 |
| optimizing | 0.000007 |
| statistics | 0.000014 |
| preparing | 0.000013 |
| Creating tmp table | 0.000037 |
| Sorting result | 0.000007 |
| executing | 0.000005 |
| Sending data | 0.545211 |
| converting HEAP to MyISAM | 1.307225 |
| Sending data | 0.738511 |
| Creating sort index | 0.573640 |
| end | 0.000020 |
| removing tmp table | 0.001682 |
| end | 0.000009 |
| query end | 0.000012 |
| closing tables | 0.000016 |
| freeing items | 0.000030 |
| logging slow query | 0.000051 |
| cleaning up | 0.000018 |
+---------------------------+----------+
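Judging by the profile, this GROUP BY creates an in-memory temporary table, fills it while scanning the rows, converts it to an on-disk temporary table, and then runs a filesort (Creating sort index) over the grouped rows that carry max(date).
If too many columns follow GROUP BY, converting HEAP to MyISAM also appears, even without sorting.
converting HEAP to MyISAM means that during execution the in-memory temporary table was converted to an on-disk temporary table. tmp_table_size and max_heap_table_size can be used to change the maximum size of an in-memory temporary table (the smaller of the two applies). For this SQL, though, the in-memory table was still converted to an on-disk one however large the limit was set, apparently because a filesort is required.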
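A rough way to experiment with these limits is sketched below. The 64 MB value (67108864 bytes) is an arbitrary example, not a recommendation and not a setting from the original test. The Created_tmp% counters are standard status variables that show whether a statement spilled to an on-disk temporary table.

```sql
-- Inspect the current limits; the effective in-memory limit is the smaller one.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Raise both for this session only (64 MB, an arbitrary example value).
SET SESSION tmp_table_size = 67108864;
SET SESSION max_heap_table_size = 67108864;

-- Note the counters before running the query.
SHOW SESSION STATUS LIKE 'Created_tmp%';

SELECT _auto_id, MAX(date)
FROM account_login
GROUP BY _auto_id;

-- If Created_tmp_disk_tables grew, the temporary table still went to disk.
SHOW SESSION STATUS LIKE 'Created_tmp%';
```

Comparing the two status snapshots gives a quick check that does not depend on reading the profile stages.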
With an index
Time: 0.31s
mysql> explain select _auto_id,max(date) from account_login group by _auto_id;
+----+-------------+---------------+-------+---------------+---------+---------+------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+-------+---------------+---------+---------+------+--------+-------+
| 1 | SIMPLE | account_login | index | idx_acc | idx_acc | 4 | NULL | 133257 | NULL |
+----+-------------+---------------+-------+---------------+---------+---------+------+--------+-------+
profile
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000109 |
| checking permissions | 0.000010 |
| Opening tables | 0.000022 |
| init | 0.000031 |
| System lock | 0.000012 |
| optimizing | 0.000007 |
| statistics | 0.000021 |
| preparing | 0.000022 |
| Sorting result | 0.000006 |
| executing | 0.000005 |
| Sending data | 0.314817 |
| end | 0.000024 |
| query end | 0.000015 |
| closing tables | 0.000032 |
| freeing items | 0.000042 |
| cleaning up | 0.000023 |
+----------------------+----------+
With an index, the index order alone takes care of the grouping: no temporary table and no filesort are needed.
Analysis of DISTINCT
explain select distinct(_auto_id) from account_login;
+----+-------------+---------------+------+---------------+------+---------+------+--------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+---------------+------+---------+------+--------+-----------------+
| 1 | SIMPLE | account_login | ALL | NULL | NULL | NULL | NULL | 133257 | Using temporary |
+----+-------------+---------------+------+---------------+------+---------+------+--------+-----------------+
mysql> show profile;
+----------------------+----------+
| Status | Duration |
+----------------------+----------+
| starting | 0.000087 |
| checking permissions | 0.000009 |
| Opening tables | 0.000016 |
| init | 0.000016 |
| System lock | 0.000011 |
| optimizing | 0.000007 |
| statistics | 0.000013 |
| preparing | 0.000014 |
| Creating tmp table | 0.000026 |
| executing | 0.000006 |
| Sending data | 0.221214 |
| end | 0.000024 |
| removing tmp table | 0.000190 |
| end | 0.000011 |
| query end | 0.000014 |
| closing tables | 0.000019 |
| freeing items | 0.000036 |
| cleaning up | 0.000024 |
+----------------------+----------+
select distinct _auto_id,sid,uid from account_login;
+---------------------------+----------+
| Status | Duration |
+---------------------------+----------+
| starting | 0.000095 |
| checking permissions | 0.000010 |
| Opening tables | 0.000019 |
| init | 0.000019 |
| System lock | 0.000010 |
| optimizing | 0.000006 |
| statistics | 0.000015 |
| preparing | 0.000016 |
| Creating tmp table | 0.000030 |
| executing | 0.000006 |
| Sending data | 0.529466 |
| converting HEAP to MyISAM | 1.928813 |
| Sending data | 0.157253 |
| end | 0.000020 |
| removing tmp table | 0.002778 |
| end | 0.000009 |
| query end | 0.000012 |
| closing tables | 0.000016 |
| freeing items | 0.000031 |
| logging slow query | 0.000062 |
| cleaning up | 0.000033 |
+---------------------------+----------+
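DISTINCT turns out to behave almost the same as GROUP BY without sorting, and deduplicating on several columns also goes through converting HEAP to MyISAM to aggregate the rows.
Summary:
Creating sort index: sorts in chunks using the in-memory temporary table, then moves to disk for the merged sort.
converting HEAP to MyISAM: moves to disk for aggregation and sorting; if GROUP BY covers too many columns, an on-disk temporary table is needed to aggregate the data even without sorting.
The main cost of GROUP BY lies in the temporary-table and sorting stage, not in the grouping itself.
So what limits GROUP BY performance is the temporary table plus the sort; reducing on-disk sorts and the creation of on-disk temporary tables is the most useful remedy.
The best approach is to add an index or composite index on the GROUP BY columns, so that MySQL can use the index to do the sorting and grouping.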
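As one possible illustration of the composite-index advice, the sketch below covers the max(date) query from this post. The index name idx_acc_date is hypothetical and this variant was not part of the original measurements. With such an index MySQL may be able to answer the query from the index alone, in which case EXPLAIN typically reports "Using index for group-by".

```sql
-- Hypothetical composite index: grouping column first, aggregated column second.
CREATE INDEX idx_acc_date ON account_login (_auto_id, `date`);

-- Check whether the plan now avoids "Using temporary" and "Using filesort".
EXPLAIN SELECT _auto_id, MAX(`date`)
FROM account_login
GROUP BY _auto_id;
```

The column order matters: the GROUP BY column has to lead the index so that rows for the same _auto_id sit next to each other.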
Original: https://blog.csdn.net/u013983450/article/details/52190699