I'm trying to port some old MySQL queries to PostgreSQL, but I'm having trouble with this one:
DELETE FROM logtable ORDER BY timestamp LIMIT 10;
PostgreSQL doesn't allow ordering or limits in its delete syntax, and the table doesn't have a primary key so I can't use a subquery. Additionally, I want to preserve the behavior where the query deletes exactly the given number of records -- for example, if the table contains 30 rows but they all have the same timestamp, I still want to delete 10, although it doesn't matter which 10.
So: how do I delete a fixed number of rows with sorting in PostgreSQL?
Edit: No primary key means there's no log_id column or similar. Ah, the joys of legacy systems!
You could try using the ctid: demo at db<>fiddle

The ctid is the system column that records the physical location of the row version within its table.

In case there are tables that inherit from logtable, use DELETE FROM ONLY to prevent this from removing up to 10 rows from each of the descendants that happens to host a row with the same ctid. This field is only unique per table, and by default a regular DELETE cascades down the hierarchy.

If logtable is partitioned, add tableoid (https://www.postgresql.org/docs/current/ddl-system-columns.html#DDL-SYSTEM-COLUMNS-TABLEOID) to stop this from wiping up to 10 rows per partition. Here DELETE FROM ONLY wouldn't remove anything, because the main table is an empty relation that only redirects to specific partitions.
Postgres docs recommend using an array instead of IN with a subquery. This should work much faster.
This and some other tricks can be found here.
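The snippet this answer refers to is elided above; the array variant of the ctid trick presumably looks something like this sketch:

```sql
-- ARRAY(...) materializes the subquery result once; = ANY then probes
-- that array, which can plan better than a plain IN (subquery):
DELETE FROM logtable
WHERE ctid = ANY (ARRAY(
  SELECT ctid
  FROM logtable
  ORDER BY timestamp
  LIMIT 10
));
```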
Level 1
Don't even bother with a LIMIT with DELETE.
What are you worried about? That you'll lock up the table? Nope, this isn't MySQL. As long as a row that gets deleted isn't used in some other transaction, you'll not block anything else.
Level 2
OK OK, but sometimes you want to make sure you run potentially long-running tasks in parallel, or in cron-jobs that might run in parallel (e.g., the DELETE job takes longer than the scheduled period). For that, the solution is:
and call it a day. If this job takes too long, the next job will simply fail noisily but no harm done. You won't run into deadlock issues, like you might with some of the pseudo-LIMIT solutions elsewhere.
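The actual snippet is missing from this copy of the answer. One way to get the "fail noisily, no harm done" behavior it describes is a short lock_timeout plus an explicit table lock; this is a sketch under that assumption, not the original code:

```sql
-- If a previous run still holds the lock, this errors out after 1s
-- instead of queueing up behind it -- a noisy failure, no harm done.
SET lock_timeout TO '1s';
BEGIN;
LOCK TABLE logtable IN SHARE ROW EXCLUSIVE MODE;  -- conflicts with itself, allows reads
DELETE FROM logtable WHERE timestamp < now() - interval '1 day';
COMMIT;
```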
Level 3
OK OK, but sometimes you really really only want to delete a few because you're curious what might happen, or because you only want to delete the top X of something and return those values. But definitely not because you want to put something in a cron-job... :
If you put this in a cron-job or automated task, please combine Level 2 && Level 3. Or deadlock-gremlins will find you and torture you.
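The Level 3 query is elided above; combining the ctid subquery with RETURNING gives the "delete the top X and get those rows back" behavior (a sketch, not the original code):

```sql
-- Delete the 10 oldest rows and return them to the caller:
DELETE FROM logtable
WHERE ctid IN (
  SELECT ctid
  FROM logtable
  ORDER BY timestamp
  LIMIT 10
)
RETURNING *;
```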
Level 4
Right, but you're a serious DBA / Sysadmin and you know better, right? You really, really need to play "nice" and not hog system resources. For instance, if you're doing PITR, such a big delete operation will hammer the WAL system. Or maybe (as is increasingly the case these days) you actually get charged for CPU time, and you want to keep the performance curve as smooth as possible.
We can use PL/PGSQL to create a loop to semi-automate the job for us.
We'll call the procedure batch. To use it, let's imagine you've got a table node with a primary key id and a timestamp field named timestamp. You would delete the rows like this:

This will delete 100 records at a time, provided those records are older than 1 day. The count deleted will be "returned" for reasons that are important, as you will see. (BTW, I'm not sorting the IDs here, as some suggest you should do, because it's inefficient, and locking the table in the surrounding block makes more sense.)
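The procedure bodies are missing from this copy of the answer. A minimal sketch of what they describe, assuming PostgreSQL 11+ procedures and the "LIMIT @@" placeholder convention explained below (names and signatures are assumptions):

```sql
-- Run one batch of the given delete query, substituting the batch size
-- for the ' LIMIT @@' placeholder, and report how many rows went away.
CREATE OR REPLACE PROCEDURE batch(query text, batch_size int,
                                  INOUT deleted bigint)
LANGUAGE plpgsql AS $$
BEGIN
  EXECUTE replace(query, ' LIMIT @@', ' LIMIT ' || batch_size);
  GET DIAGNOSTICS deleted = ROW_COUNT;
  COMMIT;  -- one transaction per batch keeps lock and WAL pressure low
END;
$$;

-- Repeat the batch for up to max_seconds, or until nothing is deleted.
CREATE OR REPLACE PROCEDURE batchx(max_seconds int, batch_size int, query text)
LANGUAGE plpgsql AS $$
DECLARE
  started timestamptz := clock_timestamp();
  deleted bigint;
BEGIN
  LOOP
    CALL batch(query, batch_size, deleted);
    RAISE NOTICE 'deleted % rows', deleted;
    EXIT WHEN deleted = 0
         OR clock_timestamp() > started + make_interval(secs => max_seconds);
  END LOOP;
END;
$$;
```

A call matching the description in the text (300 seconds, batches of 100) might look like:

```sql
CALL batchx(300, 100,
  $q$DELETE FROM node WHERE id IN (
       SELECT id FROM node
       WHERE timestamp < now() - interval '1 day' LIMIT @@)$q$);
```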
To invoke this batchx procedure with the above query, you'll do it like this:

This will run the query repeatedly in batches of 100 (second parameter) for up to 300 seconds (first parameter), or until no more records are deleted. Note the use of @@ instead of the number! This allows you to parameterize the limit so that you don't have to modify the query each time. (This will come in handy in Level 5.)

Please note: the code replaces the string "LIMIT @@", including the leading space. Make sure it's not a newline or tab or something.

Level 5
That was cool and all, but now it's time to bring it to the next and final level: let the procedure auto-scale the limit according to the performance. Instead of specifying the limit-size, specify the time you want each batch to run.
Let's call this the autobatch procedure:

Sample output:
See how it adjusts the limit-size to maximize the batch-time? If it goes over, it will scale down too. Scaling up is capped at 2x, while scaling down is capped at 1/10th. That's mainly to stabilize behavior when queries result in cache hits.
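The autobatch code and its sample output are missing from this copy. A self-contained sketch of the behavior described (initial batch size, 2x growth cap, and 1/10th shrink cap are assumptions where the text doesn't pin them down):

```sql
-- Repeatedly run the delete, retuning the LIMIT after each batch so the
-- batch duration approaches target_secs.
CREATE OR REPLACE PROCEDURE autobatch(max_seconds int, target_secs numeric,
                                      query text)
LANGUAGE plpgsql AS $$
DECLARE
  started     timestamptz := clock_timestamp();
  batch_start timestamptz;
  elapsed     numeric;
  batch_size  int := 100;   -- initial guess; rescaled after every batch
  deleted     bigint;
BEGIN
  LOOP
    batch_start := clock_timestamp();
    EXECUTE replace(query, ' LIMIT @@', ' LIMIT ' || batch_size);
    GET DIAGNOSTICS deleted = ROW_COUNT;
    COMMIT;                 -- one transaction per batch
    elapsed := extract(epoch FROM clock_timestamp() - batch_start);
    RAISE NOTICE 'limit=% deleted=% in % s', batch_size, deleted, round(elapsed, 2);
    EXIT WHEN deleted = 0
         OR clock_timestamp() > started + make_interval(secs => max_seconds);
    -- Rescale toward the target batch time: growth capped at 2x,
    -- shrinkage capped at 1/10th, to stabilize around cache effects.
    batch_size := greatest(1, least(
        batch_size * 2,
        greatest(batch_size / 10,
                 (batch_size * target_secs / greatest(elapsed, 0.001))::int)));
  END LOOP;
END;
$$;
```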
If you don't have a primary key you can use the array Where IN syntax with a composite key.
This worked for me.
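The query itself is elided above. A sketch of the composite-key idea, with assumed column names; note that a row-value IN deletes every row matching a selected combination, so it only removes exactly 10 when the combination is unique:

```sql
-- Treat (timestamp, message) as a makeshift composite key:
DELETE FROM logtable
WHERE (timestamp, message) IN (
  SELECT timestamp, message
  FROM logtable
  ORDER BY timestamp
  LIMIT 10
);
```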
Assuming you want to delete ANY 10 records (without the ordering) you could do this:
For my use case, deleting 10M records, this turned out to be faster.
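The query is missing from this copy; without an ORDER BY, the ctid subquery can stop as soon as it has found 10 rows (a sketch):

```sql
-- Any 10 rows, no sort required:
DELETE FROM logtable
WHERE ctid IN (
  SELECT ctid FROM logtable LIMIT 10
);
```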
You could write a procedure that loops over single-row deletes; the procedure could take a parameter specifying the number of rows to delete. But that's a bit overkill compared to MySQL.