How can I quickly prune a large table?

Posted on 2024-09-09 04:06:05

I currently have a MySQL table of about 20 million rows, and I need to prune it. I'd like to remove every row whose updateTime (timestamp of insertion) was more than one month ago. I have not personally performed any alterations of the table's order, so the data should be in the order in which it was inserted, and there is a UNIQUE key on two fields, id and updateTime. How would I go about doing this in a short amount of time?

Comments (4)

橙幽之幻 2024-09-16 04:06:05

How much downtime can you afford? How big are the rows? How many are you deleting?

Simply put, deleting rows is one of the most expensive things you can do to a table. It's just a horrible thing overall.

If you don't have to do it, you have the disk space, and your queries aren't affected by the table size (well-indexed queries typically don't care about table size), then you may just leave well enough alone.

If you can take the table offline (and you're removing a good percentage of the table), then your best bet is to copy the rows you want to keep to a new table, drop the old one, rename the new one to the old name, and THEN recreate your indexes (building an index once after the load is generally cheaper than maintaining it during the load).

Otherwise, you're pretty much stuck with good ol' DELETE.
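
The copy, drop, rename, reindex sequence described above can be sketched as follows. This is only an illustration under stated assumptions: it runs against an in-memory SQLite database as a stand-in for MySQL, and the names prune_by_copy, table1_new, uq_id_updateTime, and the two-column schema are hypothetical. On MySQL the rename step would be written RENAME TABLE table1_new TO table1.

```python
import sqlite3


def prune_by_copy(conn, cutoff):
    """Rebuild table1 so that only rows with updateTime >= cutoff remain."""
    cur = conn.cursor()
    # 1. Create an index-free table and copy over the rows to keep.
    cur.execute("CREATE TABLE table1_new (id INTEGER, updateTime TEXT)")
    cur.execute(
        "INSERT INTO table1_new SELECT id, updateTime FROM table1 "
        "WHERE updateTime >= ?",
        (cutoff,),
    )
    # 2. Drop the old table (and its indexes) and give the new one its name.
    cur.execute("DROP TABLE table1")
    cur.execute("ALTER TABLE table1_new RENAME TO table1")
    # 3. Recreate the unique index last, in one pass over the surviving rows.
    cur.execute("CREATE UNIQUE INDEX uq_id_updateTime ON table1 (id, updateTime)")
    conn.commit()


# Hypothetical sample data: one old row and two recent rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, updateTime TEXT)")
conn.execute("CREATE UNIQUE INDEX uq_id_updateTime ON table1 (id, updateTime)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (1, "2024-07-01 00:00:00"),
        (2, "2024-08-20 00:00:00"),
        (3, "2024-09-01 00:00:00"),
    ],
)
prune_by_copy(conn, "2024-08-09 00:00:00")
print(conn.execute("SELECT id FROM table1 ORDER BY id").fetchall())
```

Writers have to be stopped for the duration, because any row inserted into the old table after the copy starts is lost when that table is dropped.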

热鲨 2024-09-16 04:06:05

There are two ways to remove a large number of rows. First there is the obvious way:

DELETE FROM table1 WHERE updateTime < NOW() - interval 1 month;

The second (slightly more complicated) way is to create a new table and copy the data that you want to keep, truncate your old table, then copy the rows back.

CREATE TABLE table2 AS
SELECT * FROM table1 WHERE updateTime >= NOW() - interval 1 month;

TRUNCATE table1;

INSERT INTO table1
SELECT * FROM table2;

Using TRUNCATE is much faster than a DELETE with a WHERE clause when you have a large number of rows to delete and a relatively small number that you wish to keep.

无敌元气妹 2024-09-16 04:06:05

Splitting the delete into batches with LIMIT might speed up the process.

I had to delete 10M rows, so I issued a single DELETE. It ran for hours without returning.

I killed the query (which took a couple of hours)

and then split the delete into batches:

DELETE from table where id > XXXX limit 10000;
DELETE from table where id > XXXX limit 10000;
DELETE from table where id > XXXX limit 10000;
DELETE from table where id > XXXX limit 10000;

Then I repeated this statement many times in a file and ran the file with the source command:

mysql> source /tmp/delete.sql

This was much faster.

You can also try tools such as pt-archiver from Percona Toolkit.
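
Instead of pasting the same statement into a file many times, the batching can be driven by a loop that stops once a batch comes back short. The sketch below is an assumption-laden illustration, not the poster's script: it uses an in-memory SQLite database as a stand-in, so the MySQL form DELETE ... WHERE ... LIMIT n is emulated with a rowid subquery, and delete_in_batches, table1, and the sample rows are hypothetical.

```python
import sqlite3


def delete_in_batches(conn, cutoff, batch_size=10000):
    """Delete rows older than cutoff in batches; return the total deleted."""
    total = 0
    while True:
        # SQLite stand-in for MySQL's DELETE ... WHERE ... LIMIT n.
        cur = conn.execute(
            "DELETE FROM table1 WHERE rowid IN ("
            "SELECT rowid FROM table1 WHERE updateTime < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # commit per batch so each transaction stays small
        total += cur.rowcount
        if cur.rowcount < batch_size:
            # A short batch means no old rows are left.
            return total


# Hypothetical sample data: 25 old rows and 5 recent rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, updateTime TEXT)")
rows = [(i, "2024-07-01 00:00:00") for i in range(25)]
rows += [(i, "2024-09-01 00:00:00") for i in range(25, 30)]
conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
deleted = delete_in_batches(conn, "2024-08-09 00:00:00", batch_size=10)
print(deleted, conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0])
```

Committing after every batch keeps each transaction short, which is the likely reason the batched version finished where the single large DELETE did not.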

谈场末日恋爱 2024-09-16 04:06:05

Actually even if you can't take the table offline for long, you can still use the 'rename table' technique to get rid of old data.

Stop the processes writing to the table.

rename table tableName to tmpTableName;
create table tableName like tmpTableName;
set @currentId=(select max(id) from tmpTableName);
set @currentId=@currentId+1;
set @indexQuery = CONCAT('alter table tableName auto_increment = ', @currentId);
prepare stmt from @indexQuery;
execute stmt;
deallocate prepare stmt;

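Start the processes writing to the table again.

insert into tableName
select * from tmpTableName
where updateTime >= NOW() - interval 1 month;  -- copy back only the rows to keep
drop table tmpTableName;

New inserts into tableName will start at the correct auto-increment value, and the old rows that are copied back keep their original ids.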
