Original article: http://mysql.rjweb.org/doc.php/deletebig
Table of Contents
The Problem
Why it is a Problem
InnoDB and undo
Solutions
PARTITION
Deleting in Chunks
InnoDB Chunking Recommendation
Iterating through a compound key
Reclaiming the disk space
Deleting more than half a table
Non-deterministic Replication
Replication and KILL
SBR vs RBR; Galera
Postlog
Brought to you by Rick James
The Problem
How to DELETE lots of rows from a large table? Here is an example of purging items older than 30 days:
DELETE FROM tbl WHERE ts < CURRENT_DATE() - INTERVAL 30 DAY
If there are millions of rows in the table, this statement may take minutes, maybe hours.
Any suggestions on how to speed this up?
Why it is a Problem
⚈ MyISAM will lock the table during the entire operation, so nothing else can be done with the table.
⚈ InnoDB won't lock the table, but it will chew up a lot of resources, leading to sluggishness.
⚈ InnoDB has to write the undo information to its transaction logs; this significantly increases the I/O required.
⚈ Replication, being asynchronous, will effectively be delayed (on Slaves) while the DELETE is running.
InnoDB and undo
To be ready for a crash, a transactional engine such as InnoDB records what it is doing to a log file. To make that somewhat less costly, the log file is written sequentially. If the log files you have (there are usually 2) fill up because the delete is really big, then the undo information spills into the actual data blocks, leading to even more I/O.
Deleting in chunks avoids some of this excess overhead.
Limited benchmarking of total delete elapsed time shows two observations:
⚈ Total delete time approximately doubles above some 'chunk' size (as opposed to below that threshold). I do not have a formula relating the log file size with the threshold cutoff.
⚈ Chunk size below several hundred rows is slower. This is probably because the overhead of starting/ending each chunk dominates the timing.
Solutions
⚈ PARTITION -- Requires 5.1 and some careful setup, but is excellent for purging a time-based series.
⚈ DELETE in chunks -- Carefully walk through the table N rows at a time.
PARTITION
The idea here is to have a sliding window of partitions. Let's say you need to purge news articles after 30 days. The "partition key" would be the datetime (or timestamp) that is to be used for purging, and the PARTITIONs would be BY RANGE. Every night, a cron job would come along and build a new partition for the next day, and drop the oldest partition.
Dropping a partition is essentially instantaneous, much faster than deleting that many rows. However, you must design the table so that the entire partition can be dropped. That is, you cannot have some items in a partition living longer than others.
PARTITION tables have a lot of restrictions, some of them rather weird. You can either have no UNIQUE (or PRIMARY) key on the table, or every UNIQUE key must include the partition key. In this use case, the partition key is the datetime. It should not be the first part of the PRIMARY KEY (if you have a PRIMARY KEY).
You can PARTITION InnoDB tables. (Before Version 8.0, you could also partition MyISAM tables.)
Since two news articles could have the same timestamp, you cannot assume the partition key is sufficient for uniqueness of the PRIMARY KEY, so you need to find something else to help with that.
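A minimal sketch of the SQL the nightly cron job might issue, written as Python that only builds statement strings. It assumes a hypothetical table `news` with a DATETIME column `dt`, partitioned BY RANGE (TO_DAYS(dt)) with one partition per day named pYYYYMMDD plus a catch-all partition `future`. Splitting `future` with REORGANIZE PARTITION is one common way to "build a new partition for the next day"; the names and that approach are assumptions for illustration, not necessarily what the reference implementation does. Note PRIMARY KEY (id, dt): it includes the partition key, but not as the first column, and id keeps two articles with the same timestamp distinct.

```python
from datetime import date, timedelta

# Hypothetical schema for illustration: table `news`, DATETIME column `dt`,
# one partition per day named pYYYYMMDD, plus a catch-all partition `future`.
TABLE = "news"


def part_name(day: date) -> str:
    """Name of the partition that holds rows whose dt falls on `day`."""
    return "p" + day.strftime("%Y%m%d")


def less_than(day: date) -> str:
    """Upper bound for `day`'s partition: everything before the next day."""
    return f"VALUES LESS THAN (TO_DAYS('{day + timedelta(days=1)}'))"


def create_table_sql(first_day: date, days: int) -> str:
    """Initial CREATE TABLE with `days` daily partitions plus `future`."""
    parts = [
        f"  PARTITION {part_name(first_day + timedelta(days=i))} "
        f"{less_than(first_day + timedelta(days=i))},"
        for i in range(days)
    ]
    parts.append("  PARTITION future VALUES LESS THAN MAXVALUE")
    return (
        f"CREATE TABLE {TABLE} (\n"
        "  id INT UNSIGNED NOT NULL AUTO_INCREMENT,\n"
        "  dt DATETIME NOT NULL,\n"
        "  -- ... other columns ...\n"
        "  PRIMARY KEY (id, dt)  -- includes dt, but not as the first column\n"
        ")\n"
        "PARTITION BY RANGE (TO_DAYS(dt)) (\n" + "\n".join(parts) + "\n)"
    )


def nightly_sql(today: date, keep_days: int = 30) -> list:
    """One night's statements: add tomorrow's partition, drop the oldest one.

    Assumes daily partitions through `today` already exist.
    """
    tomorrow = today + timedelta(days=1)
    oldest = today - timedelta(days=keep_days)
    return [
        # Split `future` so tomorrow's rows land in their own partition.
        f"ALTER TABLE {TABLE} REORGANIZE PARTITION future INTO ("
        f"PARTITION {part_name(tomorrow)} {less_than(tomorrow)}, "
        "PARTITION future VALUES LESS THAN MAXVALUE)",
        # Dropping a whole partition is nearly instantaneous.
        f"ALTER TABLE {TABLE} DROP PARTITION {part_name(oldest)}",
    ]
```

For example, nightly_sql(date(2024, 2, 1)) yields a REORGANIZE that adds p20240202 and a DROP of p20240102.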
Reference implementation for Partition maintenance
PARTITIONing requires MySQL 5.1. MySQL docs on PARTITION
Deleting in Chunks
Although the discussion in this section talks about DELETE, the same technique can be used for any other "chunking", such as an UPDATE, or a SELECT plus some complex processing.
(This discussion applies to both MyISAM and InnoDB.)
When deleting in chunks, be sure to avoid doing a table scan. Also be sure to avoid walking through the table with OFFSET and LIMIT, since each OFFSET re-reads all the rows it skips. The code below is good at that; it scans no more than 1001 rows in any one query. (The 1000 is tunable.)
Assuming you have news articles that need to be purged, and you have a schema something like this:
CREATE TABLE tbl (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    ts TIMESTAMP,
    ...
    PRIMARY KEY(id)
);
Then, this pseudo-code is a good way to delete the rows older than 30 days:
@a = 0
LOOP
    DELETE FROM tbl
        WHERE id BETWEEN @a AND @a+999
          AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    SET @a = @a + 1000
    sleep 1   -- be a nice guy
UNTIL end of table
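A runnable sketch of the loop above, using Python's built-in sqlite3 module as a stand-in for a MySQL connection so it can run anywhere. With a real server you would use a MySQL driver instead, and the cutoff would be DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY); here the cutoff is passed in as a parameter. The function name purge_by_id_ranges and its arguments are hypothetical.

```python
import sqlite3  # stand-in for a MySQL driver connection in this sketch
import time


def purge_by_id_ranges(conn, cutoff, chunk=1000, pause=1.0):
    """Delete rows with ts < cutoff, walking the PRIMARY KEY in fixed id ranges.

    Each DELETE covers `chunk` consecutive id values, so each one stays short.
    Returns the total number of rows deleted.
    """
    (max_id,) = conn.execute("SELECT MAX(id) FROM tbl").fetchone()
    if max_id is None:  # empty table: nothing to do
        return 0
    deleted = 0
    a = 0
    while a <= max_id:  # "UNTIL end of table"
        cur = conn.execute(
            "DELETE FROM tbl WHERE id BETWEEN ? AND ? AND ts < ?",
            (a, a + chunk - 1, cutoff),
        )
        conn.commit()  # commit each chunk, like autocommit=1
        deleted += cur.rowcount
        a += chunk
        time.sleep(pause)  # "be a nice guy"
    return deleted


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, ts TEXT)")
    conn.executemany(
        "INSERT INTO tbl VALUES (?, ?)",
        [(i, "2020-01-01" if i % 2 else "2030-01-01") for i in range(1, 2501)],
    )
    conn.commit()
    print(purge_by_id_ranges(conn, "2025-01-01", pause=0))  # 1250
```

Because the loop visits every id range up to MAX(id), it also walks through recent ranges where nothing qualifies, which is the cost mentioned in the notes.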
Notes (Most of these caveats will be covered later):
⚈ It uses the PK instead of the secondary key. This gives much better locality of disk hits, especially for InnoDB.
⚈ You could (should?) do something to avoid walking through the most recent days, where the DELETE finds nothing to do. Caution -- the code for this could be costly.
⚈ The 1000 should be tweaked so that the DELETE usually takes under, say, one second.
⚈ No INDEX on ts is needed. (This helps INSERTs a little.)
⚈ If your PRIMARY KEY is compound, the code gets messier. (a fix is below)
⚈ This code will not work without a numeric PRIMARY or UNIQUE key. (a fix is below)
⚈ Read on, we'll develop messier code to deal with most of these caveats.
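The "tweak the 1000" advice can be automated. Below is a hypothetical helper, not part of the original technique: after each chunk it scales the row count toward a target duration (about one second, per the note above), within fixed bounds. The lower bound of a couple hundred rows reflects the earlier benchmarking observation that very small chunks are slower; the specific numbers are assumptions.

```python
def next_chunk_size(current, elapsed, target=1.0, lo=200, hi=10_000):
    """Row count for the next chunk, aiming for DELETEs of about `target` seconds.

    current -- rows per chunk used for the DELETE that just ran
    elapsed -- seconds that DELETE took
    Each step changes the size by at most 2x, and the result stays in [lo, hi].
    """
    # Too fast to measure: grow as much as one step allows.
    ratio = 2.0 if elapsed <= 0 else target / elapsed
    ratio = max(0.5, min(2.0, ratio))  # damp big swings between chunks
    return int(max(lo, min(hi, current * ratio)))
```

For example, a 1000-row chunk that took 4 seconds gives next_chunk_size(1000, 4.0) == 500. Capping each step at 2x keeps one unusually slow or fast chunk from swinging the size wildly.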
If there are big gaps in id values (and there will be, after the first purge), then:
SELECT @a := MIN(id) FROM tbl
LOOP
    SET @z = NULL   -- a SELECT that finds no row leaves @z unchanged
    SELECT @z := id FROM tbl WHERE id >= @a ORDER BY id LIMIT 1000,1
    If @z is null
        exit LOOP   -- last chunk
    DELETE FROM tbl
        WHERE id >= @a
          AND id <  @z
          AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    SET @a = @z
    sleep 1   -- be a nice guy, especially in replication
ENDLOOP
# Last chunk:
DELETE FROM tbl
    WHERE id >= @a
      AND ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
That code works whether id is numeric or character, and it mostly works even if id is not UNIQUE. With a non-unique key, the risk is that you could be caught in a loop whenever @z==@a. That can be detected and fixed thus:
...
SELECT @z := id FROM tbl WHERE id >= @a ORDER BY id LIMIT 1000,1
If @z == @a
    SELECT @z := id FROM tbl WHERE id > @a ORDER BY id LIMIT 1
...
The drawback is that there could be more than 1000 items with a single id. In most practical cases, that is unlikely.
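A runnable sketch of the gap-tolerant loop, including the @z == @a guard, again with Python's sqlite3 standing in for MySQL. SQLite spells MySQL's LIMIT 1000,1 as LIMIT 1 OFFSET 1000, and fetchone() returning None plays the role of "@z is null". The function name purge_with_gaps is hypothetical.

```python
import sqlite3  # stand-in for a MySQL connection in this sketch


def purge_with_gaps(conn, cutoff, chunk=1000):
    """Delete rows with ts < cutoff, chunking on real id values so gaps don't matter.

    Returns the total number of rows deleted.
    """
    (a,) = conn.execute("SELECT MIN(id) FROM tbl").fetchone()
    if a is None:
        return 0  # empty table
    deleted = 0
    while True:
        # MySQL: SELECT @z := id ... WHERE id >= @a ORDER BY id LIMIT 1000,1
        row = conn.execute(
            "SELECT id FROM tbl WHERE id >= ? ORDER BY id LIMIT 1 OFFSET ?",
            (a, chunk),
        ).fetchone()
        if row is None:
            break  # "@z is null": what remains is the last chunk
        z = row[0]
        if z == a:
            # Non-unique id: more than `chunk` rows share the value a.
            row = conn.execute(
                "SELECT id FROM tbl WHERE id > ? ORDER BY id LIMIT 1", (a,)
            ).fetchone()
            if row is None:
                break  # only copies of a are left; the last chunk covers them
            z = row[0]
        cur = conn.execute(
            "DELETE FROM tbl WHERE id >= ? AND id < ? AND ts < ?", (a, z, cutoff)
        )
        conn.commit()
        deleted += cur.rowcount
        a = z
        # (A real job would sleep here between chunks.)
    # Last chunk: everything from a to the end of the table.
    cur = conn.execute("DELETE FROM tbl WHERE id >= ? AND ts < ?", (a, cutoff))
    conn.commit()
    return deleted + cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER, ts TEXT)")  # id deliberately not UNIQUE
    rows = [(i, "2020-01-01" if i % 3 == 0 else "2030-01-01") for i in range(0, 30000, 7)]
    rows += [(700, "2020-01-01")] * 1500  # more than `chunk` rows with one id
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    conn.commit()
    print(purge_with_gaps(conn, "2025-01-01"))
```

As the text warns, the chunk that contains a heavily duplicated id deletes all of its copies at once, so it can be larger than 1000 rows.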
If you do not have a primary (or unique) key defined on the table, and you have an INDEX on ts, then consider
LOOP
    DELETE FROM tbl
        WHERE ts < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
        ORDER BY ts   -- to use the index, and to make it deterministic
        LIMIT 1000
UNTIL no rows deleted
This technique is NOT recommended because the LIMIT leads to a warning on replication about it being non-deterministic (discussed below).
InnoDB Chunking Recommendation
⚈ Have a 'reasonable' size for innodb_log_file_size.
⚈ Use AUTOCOMMIT=1 for the session doing the deletions.
⚈ Pick about 1000 rows for the chunk size.
⚈ Adjust the row count down if asynchronous replication (Statement Based) causes too much delay on the Slaves or hogs the table too much.
Iterating through a compound key
To perform the chunked deletes recommended above, you need a way to walk through the PRIMARY KEY. This can be difficult if the PK has more than one column in it.
To efficiently do a compound 'greater than':
Assume that you left off at ($g, $s) (and have handled that row):
INDEX(Genus, species)
SELECT/DELETE ...
    WHERE Genus >= '$g'
      AND ( species > '$s' OR Genus > '$g' )
    ORDER BY Genus, species
    LIMIT ...
Addenda: The above AND/OR works well in older versions of MySQL; this works better in newer versions:
WHERE ( Genus = '$g' AND species > '$s' ) OR Genus > '$g'
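A runnable sketch of walking a two-column key in chunks with the newer OR form, using sqlite3 as a stand-in for MySQL and bound parameters instead of pasting '$g' and '$s' into the SQL text. The table taxa(genus, species) and the generator name walk_compound_key are hypothetical; each batch starts just after the (genus, species) pair where the previous batch left off.

```python
import sqlite3  # stand-in for a MySQL connection in this sketch


def walk_compound_key(conn, chunk=1000):
    """Yield lists of (genus, species) keys in PRIMARY KEY order, `chunk` at a time."""
    batch = conn.execute(
        "SELECT genus, species FROM taxa ORDER BY genus, species LIMIT ?", (chunk,)
    ).fetchall()
    while batch:
        yield batch
        g, s = batch[-1]  # where we left off (that row has been handled)
        batch = conn.execute(
            "SELECT genus, species FROM taxa"
            " WHERE (genus = ? AND species > ?) OR genus > ?"
            " ORDER BY genus, species LIMIT ?",
            (g, s, g, chunk),
        ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxa (genus TEXT, species TEXT, PRIMARY KEY (genus, species))"
    )
    conn.executemany(
        "INSERT INTO taxa VALUES (?, ?)",
        [(f"G{g:02d}", f"s{s:02d}") for g in range(20) for s in range(37)],
    )
    for batch in walk_compound_key(conn, chunk=100):
        # A chunked DELETE could remove these keys before the next batch is fetched.
        print(len(batch), batch[0], batch[-1])
```

Because each query restarts from the last key seen rather than from an OFFSET, deleting a batch before fetching the next one does not disturb the walk.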
A caution about using @variables for strings. If, instead of '$g', you use @g, you need to be careful to make sure that @g has the same CHARACTER SET and COLLATION as Genus, else there could be a charset/collation conversion on the fly that prevents the use of the INDEX. Using the INDEX is vital for performance. It may require a COLLATE clause on SET NAMES and/or the @g in the SELECT.
Reclaiming the disk space
Note: Reclaiming disk space may not be necessary. After all, tomorrow's INSERTs will simply reuse the free space in the table.
This is costly. (Switch to the PARTITION solution if practical.)
MyISAM leaves gaps in the table (.MYD file); OPTIMIZE TABLE will reclaim the freed space after a big delete. But it may take a long time and lock the table.
InnoDB is block-structured, organized in a BTree on the PRIMARY KEY. An isolated deleted row leaves a block less full. A lot of deleted rows can lead to coalescing of adjacent blocks. (Blocks are normally 16KB.)
In InnoDB, there is no practical way to reclaim the freed space from ibdata1, other than to reuse the freed blocks eventually.
If you have innodb_file_per_table = 0, the only option is to dump ALL tables, remove ibdata*, restart, and reload. That is rarely worth the effort and time.
InnoDB, even with innodb_file_per_table = 1, won't give the freed space back to the OS, but at least there is only one table to rebuild. In this case, something like this should work:
CREATE TABLE new LIKE main;
INSERT INTO new SELECT * FROM main;      -- This could take a long time
RENAME TABLE main TO old, new TO main;   -- Atomic swap
DROP TABLE old;                          -- Space freed up here
You do need enough disk space for both copies. You must not write to the table during the process.
Deleting more than half a table
The following technique can be used for any combination of
⚈ Deleting a large portion of the table more efficiently
⚈ Adding PARTITIONing
⚈ Converting to innodb_file_per_table = ON
⚈ Defragmenting
This can be done by chunking, or (if practical) all at once:
-- Optional: SET GLOBAL innodb_file_per_table = ON;
CREATE TABLE New LIKE Main;
-- Optional: ALTER TABLE New PARTITION BY RANGE ...;
-- Do this INSERT..SELECT all at once, or with chunking:
INSERT INTO New
SELECT * FROM Main
WHERE ...; -- just the rows you want to keep
RENAME TABLE Main TO Old, New TO Main;
DROP TABLE Old; -- Space freed up here
Notes:
⚈ You do need enough disk space for both copies.
⚈ You must not write to the table during the process. (Changes to Main may not be reflected in New.)
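A sketch of the chunked variant, again using sqlite3 as a stand-in for MySQL. The hypothetical tables news, news_new and news_old play the roles of Main, New and Old. SQLite has no CREATE TABLE ... LIKE and no multi-table RENAME TABLE, so the sketch creates news_new from an explicit schema and renames with two ALTER TABLE ... RENAME TO statements, which, unlike MySQL's RENAME TABLE, is not one atomic swap. As above, nothing else may write to the table while this runs.

```python
import sqlite3  # stand-in for a MySQL connection in this sketch


def copy_keep_and_swap(conn, keep_where, params=(), chunk=1000):
    """Copy the rows to keep from `news` into `news_new` in id-range chunks, then swap.

    keep_where is trusted SQL text such as "ts >= ?"; params fill its placeholders.
    """
    # MySQL: CREATE TABLE news_new LIKE news;  (SQLite has no LIKE form)
    conn.execute("CREATE TABLE news_new (id INTEGER PRIMARY KEY, ts TEXT)")
    (max_id,) = conn.execute("SELECT MAX(id) FROM news").fetchone()
    lo = 0
    while max_id is not None and lo <= max_id:
        conn.execute(
            "INSERT INTO news_new SELECT * FROM news"
            f" WHERE id BETWEEN ? AND ? AND ({keep_where})",
            (lo, lo + chunk - 1, *params),
        )
        conn.commit()  # one modest transaction per chunk
        lo += chunk
    # MySQL does this atomically: RENAME TABLE news TO news_old, news_new TO news;
    conn.execute("ALTER TABLE news RENAME TO news_old")
    conn.execute("ALTER TABLE news_new RENAME TO news")
    conn.execute("DROP TABLE news_old")  # space freed up here
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE news (id INTEGER PRIMARY KEY, ts TEXT)")
    conn.executemany(
        "INSERT INTO news VALUES (?, ?)",
        [(i, "2020-01-01" if i <= 3000 else "2030-01-01") for i in range(1, 4001)],
    )
    conn.commit()
    copy_keep_and_swap(conn, "ts >= ?", ("2025-01-01",))  # keep only recent rows
    print(conn.execute("SELECT COUNT(*) FROM news").fetchone()[0])  # 1000
```

Copying only the rows to keep touches a quarter of the table here, instead of deleting the other three quarters row by row.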
Non-deterministic Replication
Any UPDATE, DELETE, etc. with LIMIT that is replicated to Slaves (via Statement Based Replication) _may_ cause inconsistencies between the Master and Slaves. This is because the actual order in which the records are found for updating/deleting may be different on the Slave, leading to a different subset being modified. To be safe, add ORDER BY to such statements. Moreover, be sure the ORDER BY is deterministic -- that is, the fields/expressions in the ORDER BY, taken together, are unique.
An example of an ORDER BY that does not quite work: Assume there are multiple rows for each 'date':
DELETE FROM tbl ORDER BY date LIMIT 111
Given that id is the PRIMARY KEY (or UNIQUE), this will be safe:
DELETE FROM tbl ORDER BY date, id LIMIT 111
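To see why the tie-breaker matters, here is a toy Python model, not MySQL itself: it treats DELETE ... ORDER BY ... LIMIT n as "sort the rows in the order the server read them, then take the first n". The rows and the two read orders are made up for illustration; a real server makes no promise about the order of ties, and Python's stable sort is just a convenient way to model that.

```python
def rows_hit(scan, key, limit):
    """Ids that 'DELETE ... ORDER BY <key> LIMIT <limit>' would remove.

    Python's sort is stable, so rows that tie on `key` stay in scan order, the
    way ties come out in whatever order a server happens to read them.
    """
    return {row["id"] for row in sorted(scan, key=key)[:limit]}


def by_date(row):
    return row["date"]  # ORDER BY date


def by_date_id(row):
    return (row["date"], row["id"])  # ORDER BY date, id


# The same three rows, read in a different physical order on Master and Slave.
master_scan = [
    {"id": 1, "date": "2024-01-01"},
    {"id": 2, "date": "2024-01-01"},
    {"id": 3, "date": "2024-01-02"},
]
slave_scan = [master_scan[1], master_scan[0], master_scan[2]]

print(rows_hit(master_scan, by_date, 1), rows_hit(slave_scan, by_date, 1))  # {1} {2}
print(rows_hit(master_scan, by_date_id, 1), rows_hit(slave_scan, by_date_id, 1))  # {1} {1}
```

With ORDER BY date alone, the two servers delete different rows; adding the unique id makes the order, and therefore the deleted subset, identical.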
Unfortunately, even with the ORDER BY, MySQL has a deficiency that leads to a bogus warning in mysqld.err. See Spurious "Statement is not safe to log in statement format." warnings
Some of the above code avoids this spurious warning by doing
SELECT @z := ... LIMIT 1000,1;    -- not replicated
DELETE ... BETWEEN @a AND @z;     -- deterministic
That pair of statements guarantees no more than 1000 rows are touched, not the whole table.
Replication and KILL
If you KILL a DELETE (or any other query) on the Master in the middle of its execution, what will be Replicated?
If it is InnoDB, the query should be rolled back. (Exceptions??)
In MyISAM, rows are DELETEd as the statement is executed, and there is no provision for ROLLBACK. Some of the rows will be deleted, some won't, and you probably have no idea how many were deleted. On a single server, simply run the delete again. The delete is written to the binlog, but with error 1317. Since Replication is supposed to keep the Master and Slave in sync, and since it has no way of knowing how to do that, Replication stops and waits for manual intervention. In an HA (High Availability) system using Replication, this is a minor disaster. Meanwhile, you need to go to each Slave, verify that it is stuck for this reason, then do
SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
START SLAVE;
Then (presumably) re-executing the DELETE will finish the aborted task.
(That is yet another reason to move all your tables from MyISAM to InnoDB.)
SBR vs RBR; Galera
"Row Based Replication" implies that the rows to be deleted are written to the binlog. The bigger the rows, and the more rows you delete in a single "chunk", the more replication will be impacted. The suggestion of "1000" rows per chunk may need to be adjusted. The tradeoff is between how soon all the chunks are finished and how much impact each chunk has on everything else going through replication.
If the task is to "purge old data", then speed of completion is probably not important.
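A back-of-the-envelope helper for the RBR tradeoff above. It assumes binlog_row_image=FULL, so each deleted row is logged as roughly its whole before-image, and it ignores event headers; the function name rows_per_chunk and the default budget of about 1 MB of row data per chunk are hypothetical choices, not recommendations from the original.

```python
def rows_per_chunk(avg_row_bytes, budget_bytes=1_000_000, lo=100, hi=1000):
    """Rows per DELETE chunk so one chunk's row events stay near budget_bytes.

    Assumes each deleted row costs about avg_row_bytes in the binlog
    (full before-image); event headers and compression are ignored.
    The result is capped at the usual 1000 rows and floored at `lo`.
    """
    return max(lo, min(hi, budget_bytes // max(1, avg_row_bytes)))
```

With 200-byte rows the usual 1000-row chunk is well under budget, while 5 KB rows would suggest about 200 rows per chunk.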
Postlog
The tips in this document apply to MySQL, MariaDB, and Percona.
Chunking via Common Schema
Similar - from OAK
Percona's package to do big deletes, etc: pt-archiver
Anecdote: 2 hours vs 5 days
Posted: 2010; Refreshed: June, 2015; Minor Refresh: Sep, 2017
-- Rick James