MySQL LIMIT Optimization

Reposted article, 2019-04-03 08:56. Tags: MySQL, limit, query optimization

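Optimizing limit queries for data export

Why it is slow

When MySQL paginates a large data set with limit, query performance degrades as the page number grows.

This becomes a real problem once a table holds several million rows.

For example, select * from table limit 0,10 is fine, but with limit 200000,10 reading the data becomes very slow.

Root cause:
1) The query time of a limit statement is proportional to the position of the starting record (the offset).
2) The limit statement is convenient, but it is not suitable for direct use on tables with millions or tens of millions of rows.

For example, limit 10000,20 means: scan the 10,020 rows that satisfy the condition, throw away the first 10,000, and return the last 20. That is where the problem lies. LIMIT 2000000, 30 scans 2,000,000 + 30 rows, so it is no wonder that it is slow enough to stall everything and can even drive disk IO to 100%. By contrast, a statement such as limit 30 scans only 30 rows.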
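The arithmetic above can be sketched in a few lines. This is only a cost model taken from the text (each `limit offset,size` query walks offset + size rows); the function names are hypothetical and nothing here talks to a database.

```python
def rows_scanned_with_offset(offset: int, size: int) -> int:
    """Rows walked by `LIMIT offset, size` under the model described in the text."""
    return offset + size


def total_rows_scanned(total_rows: int, page_size: int) -> int:
    """Total rows walked when a table is exported page by page using OFFSET."""
    return sum(
        rows_scanned_with_offset(offset, min(page_size, total_rows - offset))
        for offset in range(0, total_rows, page_size)
    )


if __name__ == "__main__":
    print(rows_scanned_with_offset(10_000, 20))      # the limit 10000,20 example
    print(rows_scanned_with_offset(2_000_000, 30))   # the LIMIT 2000000, 30 example
    print(total_rows_scanned(3_000_000, 200_000))    # whole-table export with OFFSET
```

Under this model, exporting 3,000,000 rows in pages of 200,000 walks 24,000,000 rows in total, compared with 3,000,000 if every page could start directly at a known id.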

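Optimization approaches

Eliminate, or make better use of, the offset in limit offset,size

Instead of using limit offset,size directly, first obtain the id at the offset position, then fetch the data with limit size alone.

Performance optimizations for limit pagination

Use a covering index on the table to speed up paginated queries

Covering index:

A covering index means the selected columns can be read from the index alone, without reading the data rows. MySQL can return the fields in the select list from the index and does not need to go back to the data file through the index. In other words, the queried columns must be covered by the index that was created.

Index lookups are optimized, and the requested data sits in the index itself, so there is no need to look up the address of the row data, which saves a lot of time. MySQL also caches index data, so the benefit is even greater under high concurrency. In our example the id column is the primary key, so it is covered by the primary key index by default.

This time we directly query the last page of data (using a covering index, selecting only the id column), as follows: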

# Selecting only the id column (covered by the index) is clearly much faster than select *
select * from order_table where company_id = 1 and mark =0 order by id desc limit 200000 ,20;
select id from order_table where company_id = 1 and mark =0 order by id desc limit 200000 ,20;

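If we also need to query all columns, there are two options: one uses the id >= form, the other uses a join. In practice:

# Both rely on the same principle, so their performance is similar
# An explicit order by id is added so that the offset refers to a well-defined row
SELECT * FROM xxx WHERE ID >= (select id from xxx order by id limit 1000000, 1) order by ID limit 20;
SELECT * FROM xxx a JOIN (select id from xxx order by id limit 1000000, 20) b ON a.ID = b.id;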
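The two rewrites above can be checked for equivalence on a small data set. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL, so it shows only that the three query shapes return the same rows, not MySQL's timings. The table `orders`, the column `payload` and the helper names are hypothetical.

```python
import sqlite3


def make_demo_db(n_rows: int = 1000) -> sqlite3.Connection:
    """Build an in-memory table whose ids have gaps (3, 6, 9, ...)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO orders (id, payload) VALUES (?, ?)",
        ((i * 3, f"row-{i}") for i in range(1, n_rows + 1)),
    )
    return conn


def page_plain(conn, offset, size):
    """Baseline: plain offset pagination over full rows."""
    sql = "SELECT * FROM orders ORDER BY id LIMIT ? OFFSET ?"
    return conn.execute(sql, (size, offset)).fetchall()


def page_id_ge(conn, offset, size):
    """Find the id at the offset from the index, then read `size` rows from it."""
    sql = (
        "SELECT * FROM orders WHERE id >= "
        "(SELECT id FROM orders ORDER BY id LIMIT 1 OFFSET ?) "
        "ORDER BY id LIMIT ?"
    )
    return conn.execute(sql, (offset, size)).fetchall()


def page_join(conn, offset, size):
    """Page over ids only, then join back to fetch the full rows."""
    sql = (
        "SELECT a.* FROM orders a JOIN "
        "(SELECT id FROM orders ORDER BY id LIMIT ? OFFSET ?) b "
        "ON a.id = b.id ORDER BY a.id"
    )
    return conn.execute(sql, (size, offset)).fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    same = page_plain(conn, 500, 20) == page_id_ge(conn, 500, 20) == page_join(conn, 500, 20)
    print("all three forms agree:", same)
```

In both rewrites the offset is still walked, but only over the id index; the full rows are read just for the page that is actually returned.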

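Environment setup

test_dev.order_table: 3 million rows
test_begin.order_table: 50 million rows

Environment difference: the two tables differ in structure (their indexes are not the same), so for the same query over the first 200,000 rows test_begin can be somewhat faster than test_dev.

Case study 1: millions of rows

Using offset directly, or working around it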

# Use show profiles to analyze performance
# Enable profiling for the current session
SET profiling =1;
# Bypass the query cache when measuring: select SQL_NO_CACHE ......
# Upper-bound id for each row range: 200k-400k: 12559073, 600k-800k: 12159073, 1.6M-1.8M: 11159073, 2.6M-2.8M: 10158757
# Query with offset -> average time: about 9.958s
select SQL_NO_CACHE * from order_table where company_id = 1 and mark =0 order by id desc limit 200000 ,200000;
# Split into two queries: first fetch the max id, then run id<=max; the total cost is nearly the same as the combined query further down
# Average time: about 7.505s
select id from order_table where company_id = 1 and mark =0 order by id desc limit 200000 ,1;
# Average time: about 9.092s
select * from order_table where company_id = 1 and mark =0 and id <=12559073 order by id desc limit 200000;
# Covering index to get max, then id<=max -> average time: about 17.576s
select SQL_NO_CACHE * from order_table where company_id = 1 and mark =0 and id <= (select id from order_table where company_id = 1 and mark =0 order by id desc limit 200000 ,1) order by id desc limit 200000;
# Covering index + join -> average time: about 11.325s
select SQL_NO_CACHE p.* from order_table p join (select id from order_table where company_id = 1 and mark =0 order by id desc limit 200000 ,200000) a on a.id = p.id;

Profiling notes

show profile CPU,SWAPS,BLOCK IO,MEMORY,SOURCE for query 520;

Method 1. limit offset,size (including the subquery)

Profiled row ranges: 200k-400k, 600k-800k, 1.6M-1.8M, 2.6M-2.8M (profile output not preserved).

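Method 2. id < max and limit size

P.S. In practice the smallest id of the previous page can be passed straight into the next page's query and used as max, which saves the cost of the subquery. (The subquery is already omitted in the tens-of-millions environment later on.)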
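Passing the previous page's smallest id into the next query can be sketched as a small generator. This is a minimal illustration using Python's built-in sqlite3 module in place of MySQL; the table `orders`, the column `payload` and the helper names are hypothetical. It uses a strict `id < ?` so that the boundary row is not returned twice.

```python
import sqlite3
from typing import Iterator, List, Optional, Tuple


def make_demo_db(n_rows: int = 1000) -> sqlite3.Connection:
    """Build an in-memory table whose ids have gaps (3, 6, 9, ...)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO orders (id, payload) VALUES (?, ?)",
        ((i * 3, f"row-{i}") for i in range(1, n_rows + 1)),
    )
    return conn


def iter_pages_desc(conn: sqlite3.Connection, page_size: int) -> Iterator[List[Tuple]]:
    """Yield pages in descending id order without ever using OFFSET."""
    last_min_id: Optional[int] = None  # smallest id seen on the previous page
    while True:
        if last_min_id is None:
            # First page: no upper bound yet.
            rows = conn.execute(
                "SELECT id, payload FROM orders ORDER BY id DESC LIMIT ?",
                (page_size,),
            ).fetchall()
        else:
            # Later pages: start strictly below the previous page's smallest id.
            rows = conn.execute(
                "SELECT id, payload FROM orders WHERE id < ? "
                "ORDER BY id DESC LIMIT ?",
                (last_min_id, page_size),
            ).fetchall()
        if not rows:
            return
        yield rows
        last_min_id = rows[-1][0]


if __name__ == "__main__":
    conn = make_demo_db(95)
    for page in iter_pages_desc(conn, 10):
        print(page[0][0], "...", page[-1][0], f"({len(page)} rows)")
```

Each query reads only `page_size` rows from the index position of the boundary id, so the cost per page does not grow with the page number.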

Profiled row ranges: 200k-400k, 600k-800k, 1.6M-1.8M, 2.6M-2.8M (profile output not preserved).

Method 3. Covering index + join

Profiled row ranges: 200k-400k, 600k-800k, 1.6M-1.8M, 2.6M-2.8M (profile output not preserved).

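Conclusions:

1. Exporting data within the first million rows

Moving from Method 1 to Method 2 or Method 3 has little effect: CPU and IO consumption are basically the same, steady at around 30. The performance gain after optimization is not significant.

All three methods can be used and the difference is small. Data within the first million rows does not need optimizing.

2. Exporting data beyond the first million rows

Method 1: both CPU and IO consumption rise significantly (28+ -> 60+; the larger the offset, the higher the CPU and IO consumption).

Methods 2 and 3: CPU and IO consumption barely change and stay at around 30. The two methods differ little.

Method 1 can be optimized into Method 2 or Method 3.

Note that obtaining the starting id through the covering index (select id from order_table where xxx limit 2600000 ,1;) takes longer as the offset grows. This approach finishes in roughly 10s when querying the first 2 million rows or so, but when querying the 5 million to 6 million range the covering-index lookup itself becomes significantly slower.

P.S. The earlier optimization of the pss accounts-payable query used exactly this covering index + join approach.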

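Case study 2: tens of millions of rows

1. Exporting data within the first million rows (same analysis as above, not repeated)

2. Exporting data beyond the first million, or even ten million, rows (turning the offset into a starting id)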

# Query the id only
# limit 1000000,1 takes 0.671s
select id from order_table where company_id = 1 and mark =0 order by id desc limit 1000000 ,1; 
# limit 2000000,1 takes 600.948s
select id from order_table where company_id = 1 and mark =0 order by id desc limit 2000000 ,1;
# limit 3000000 and beyond: not considered, already over 650s, strongly discouraged
select id from order_table where company_id = 1 and mark =0 order by id desc limit 3000000 ,1;

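Profile for limit 2000000,1 (profile output not preserved)

Conclusion:

On a table with tens of millions of rows, none of the three methods above is usable for exporting data beyond the first million rows. Even the covering-index query that fetches only the id is slow and consumes CPU and IO very heavily. The two methods below are needed instead.

3. Exporting data beyond ten million rows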

Stop using offset altogether (an upgraded version of Method 2: the subquery is omitted)

# Method 4: use only id<max and limit size
# Before each query, take the previous page's smallest id and use it as the next page's upper bound (strict <, so the boundary row is not returned twice)
## Ids per row range: 200k-400k: 82959503-82620566, 600k-800k: 82334851, 2.6M-2.8M: 80106996-79887685, 6.6M-6.8M: 76010656-75810657, 16.6M-16.8M: 53482458-53240959, 36.6M-36.8M: 32532145-32332146
# First page
select * from order_table where company_id = 1 and mark =0 order by id desc limit 200000;
# Subsequent pages
# Average time: 1.539s
select * from order_table where company_id = 1 and mark =0 and id < 82543981 order by id desc limit 200000;

# Method 5: use min<=id<=max
# Before each query, take the previous page's smallest id
# First page
select * from order_table where company_id = 1 and mark =0 order by id desc limit 200000;
# Subsequent pages
# Average time: 1.66s
select * from order_table where company_id = 1 and mark =0 and id>=82543981 and id <=82878478 order by id desc;
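One possible way to obtain the min and max ids for Method 5 is to walk the id column once and record the boundaries of each page, then fetch each page with a closed id range and no limit. The source does not say how its boundaries were produced, so the sketch below is an assumption. It uses Python's built-in sqlite3 module in place of MySQL, and the table `orders`, the column `payload` and the helper names are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def make_demo_db(n_rows: int = 1000) -> sqlite3.Connection:
    """Build an in-memory table whose ids have gaps (3, 6, 9, ...)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO orders (id, payload) VALUES (?, ?)",
        ((i * 3, f"row-{i}") for i in range(1, n_rows + 1)),
    )
    return conn


def page_bounds(conn: sqlite3.Connection, page_size: int) -> List[Tuple[int, int]]:
    """Walk ids only, in descending order, and record (max_id, min_id) per page."""
    bounds: List[Tuple[int, int]] = []
    upper = None  # smallest id of the previous page
    while True:
        if upper is None:
            cur = conn.execute(
                "SELECT id FROM orders ORDER BY id DESC LIMIT ?", (page_size,)
            )
        else:
            cur = conn.execute(
                "SELECT id FROM orders WHERE id < ? ORDER BY id DESC LIMIT ?",
                (upper, page_size),
            )
        ids = [row[0] for row in cur]
        if not ids:
            return bounds
        bounds.append((ids[0], ids[-1]))
        upper = ids[-1]


def fetch_range(conn: sqlite3.Connection, max_id: int, min_id: int) -> List[Tuple]:
    """Fetch one page by closed id range, as in min<=id<=max."""
    return conn.execute(
        "SELECT id, payload FROM orders WHERE id >= ? AND id <= ? ORDER BY id DESC",
        (min_id, max_id),
    ).fetchall()


if __name__ == "__main__":
    conn = make_demo_db(95)
    for max_id, min_id in page_bounds(conn, 10):
        print(max_id, min_id, len(fetch_range(conn, max_id, min_id)))
```

Because every range is fixed in advance, the range queries are independent of each other and could in principle be run in any order.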