How can I quickly prune a large table?
I currently have a MySQL table of about 20 million rows, and I need to prune it. I'd like to remove every row whose updateTime (timestamp of insertion) is more than one month old. I have not personally altered the table's order, so the data should be in the order in which it was inserted, and there is a UNIQUE key on two fields, id and updateTime. How would I go about doing this in a short amount of time?
Comments (4)
How much downtime can you afford? How big are the rows? How many are you deleting?
Simply put, deleting rows is one of the most expensive things you can do to a table. It's just a horrible thing overall.
If you don't have to do it, you have the disk space for it, and your queries aren't affected by the table size (well-indexed queries typically ignore table size), then you may just leave well enough alone.
If you have the opportunity and can take the table offline (and you're removing a good percentage of the table), then your best bet would be to copy the rows you want to keep to a new table, drop the old one, rename the new one to the old name, and THEN recreate your indexes.
Otherwise, you're pretty much stuck with good ol' DELETE.
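
A minimal sketch of the copy-and-rename approach in MySQL, assuming the table is named tableName (the name used in a later comment), that updateTime is a TIMESTAMP or DATETIME column, and that the only index to recreate is the UNIQUE key on (id, updateTime). The names tableName_new and uniq_id_updateTime are made up for illustration.

```sql
-- Fix the cutoff once so every statement uses the same boundary.
SET @cutoff = NOW() - INTERVAL 1 MONTH;

-- CREATE TABLE ... SELECT copies column definitions and data but no indexes,
-- so the copy is not slowed down by index maintenance.
-- Caveat: attributes such as AUTO_INCREMENT are not carried over and must be
-- re-added by hand if the table relies on them.
CREATE TABLE tableName_new AS
SELECT * FROM tableName WHERE updateTime >= @cutoff;

-- Swap the trimmed copy into place.
DROP TABLE tableName;
RENAME TABLE tableName_new TO tableName;

-- Recreate the index last, as the answer suggests (key name is hypothetical).
ALTER TABLE tableName ADD UNIQUE KEY uniq_id_updateTime (id, updateTime);
```

Because the old table is dropped before the rename, nothing should be reading from or writing to it while this runs, and taking a backup first is prudent.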
There are two ways to remove a large number of rows. First there is the obvious way: a single DELETE with a WHERE clause on updateTime.
The second (slightly more complicated) way is to create a new table and copy the data that you want to keep, truncate your old table, then copy the rows back.
Using TRUNCATE is much faster than a DELETE with a WHERE clause when you have a large number of rows to delete and a relatively small number that you wish to keep.
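
As a sketch of both approaches in MySQL, assuming the table is called tableName (the name used in a later comment) and using a hypothetical scratch table tableName_keep. Run one option or the other, not both:

```sql
-- Fix the cutoff once so every statement uses the same boundary.
SET @cutoff = NOW() - INTERVAL 1 MONTH;

-- Option 1: the obvious way, one DELETE with a WHERE clause.
DELETE FROM tableName WHERE updateTime < @cutoff;

-- Option 2: copy the rows to keep, truncate, copy them back.
CREATE TABLE tableName_keep LIKE tableName;   -- same columns and indexes, no rows
INSERT INTO tableName_keep
SELECT * FROM tableName WHERE updateTime >= @cutoff;

-- TRUNCATE is DDL in MySQL: it commits implicitly, cannot be rolled back,
-- and resets any AUTO_INCREMENT counter.
TRUNCATE TABLE tableName;

INSERT INTO tableName SELECT * FROM tableName_keep;
DROP TABLE tableName_keep;
```

Option 2 assumes nothing writes to tableName between the first copy and the TRUNCATE; any row inserted in that window would be lost.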
Splitting the deletes with LIMIT might speed up the process.
I had to delete 10M rows, so I issued a single DELETE command. It did not respond for hours.
I killed the query (which itself took a couple of hours),
then split the deletes.
Then I duplicated that statement in a file and ran the file.
This was much faster.
You can also try tools from Percona Toolkit, such as pt-archiver.
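
A rough sketch of the batched delete, assuming the table is called tableName and a batch size of 10000 rows (both the batch size and the procedure name prune_tableName are illustrative, not taken from the answer). Instead of duplicating the statement in a file, a stored procedure can repeat it until no rows are left to delete:

```sql
-- A single batch: rerun until it reports 0 rows affected.
DELETE FROM tableName
WHERE updateTime < NOW() - INTERVAL 1 MONTH
LIMIT 10000;

-- The same batch in a loop. DELIMITER is a mysql command-line client
-- directive, not server-side SQL.
DELIMITER //
CREATE PROCEDURE prune_tableName()
BEGIN
  DECLARE v_cutoff DATETIME DEFAULT NOW() - INTERVAL 1 MONTH;
  DECLARE v_affected INT DEFAULT 1;
  WHILE v_affected > 0 DO
    DELETE FROM tableName WHERE updateTime < v_cutoff LIMIT 10000;
    -- ROW_COUNT() reports how many rows the preceding DELETE removed.
    SET v_affected = ROW_COUNT();
  END WHILE;
END //
DELIMITER ;

CALL prune_tableName();
```

With autocommit enabled, each batch commits on its own, which keeps individual transactions small and lets other queries run between batches.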
Actually, even if you can't take the table offline for long, you can still use the 'rename table' technique to get rid of old data.
Stop processes writing to the table.
Start processes writing to the table.
New inserts to tableName will begin at the correct index; the old data will be inserted at the correct indexes.
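
One way the rename technique could look in MySQL is sketched below. The statements between stopping and restarting the writers are a guess, and the sketch assumes id is an AUTO_INCREMENT column; tableName_old is a hypothetical name.

```sql
-- (Writers stopped.)
SET @cutoff = NOW() - INTERVAL 1 MONTH;

-- Move the full table aside and put an empty copy, with the same columns
-- and indexes, in its place.
RENAME TABLE tableName TO tableName_old;
CREATE TABLE tableName LIKE tableName_old;

-- Assumption: id is AUTO_INCREMENT. Start the new table's counter above the
-- old maximum so new rows do not collide with the rows copied back later.
SELECT COALESCE(MAX(id), 0) + 1 INTO @next_id FROM tableName_old;
SET @ddl = CONCAT('ALTER TABLE tableName AUTO_INCREMENT = ', @next_id);
PREPARE stmt FROM @ddl;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

-- (Writers restarted. The remaining steps can run while the table is live.)
INSERT INTO tableName
SELECT * FROM tableName_old WHERE updateTime >= @cutoff;

DROP TABLE tableName_old;
```

The table is unavailable only for the rename and the creation of the empty copy; copying the recent rows back happens after writers have resumed.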