PostgreSQL gapless sequences

Posted 2025-01-14 22:41:28

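I'm migrating from MySql to Postgres, and I've noticed that when you delete rows in MySql, the unique ids of those rows are re-used when you create new rows. With Postgres, if you create rows and then delete them, the unique ids are not used again.

Is there a reason for this behaviour in Postgres? Can I make it act more like MySql in this case?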

Answer by 梦境, 2025-01-21 22:41:28

Sequences have gaps to permit concurrent inserts. Attempting to avoid gaps or to re-use deleted IDs creates horrible performance problems. See the PostgreSQL wiki FAQ.

PostgreSQL SEQUENCEs are used to allocate IDs. These only ever increase, and they're exempt from the usual transaction rollback rules to permit multiple transactions to grab new IDs at the same time. This means that if a transaction rolls back, those IDs are "thrown away"; there's no list of "free" IDs kept, just the current ID counter. Sequences are also usually incremented if the database shuts down uncleanly.
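
To see the "thrown away" behaviour for yourself, here is a minimal sketch. The table gap_demo is hypothetical and exists only for this illustration; it assumes a fresh table and no other sessions using it.

```sql
-- Hypothetical demo table; "serial" creates and attaches a sequence for id.
CREATE TABLE gap_demo (id serial PRIMARY KEY, note text);

INSERT INTO gap_demo (note) VALUES ('first');      -- takes id 1

BEGIN;
INSERT INTO gap_demo (note) VALUES ('discarded');  -- takes id 2
ROLLBACK;  -- the row is gone, but the sequence is not rewound

INSERT INTO gap_demo (note) VALUES ('second');     -- takes id 3

-- On a fresh table this should list ids 1 and 3; id 2 is the gap.
SELECT id, note FROM gap_demo ORDER BY id;
```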

Synthetic keys (IDs) are meaningless anyway. Their order is not significant; their only significant property is uniqueness. You can't meaningfully measure how "far apart" two IDs are, nor can you meaningfully say whether one is greater or less than another. All you can do is say "equal" or "not equal". Anything else is unsafe. You shouldn't care about gaps.

If you need a gapless sequence that re-uses deleted IDs, you can have one, but you have to give up a huge amount of performance for it. In particular, you cannot have any concurrency on INSERTs at all, because you have to scan the table for the lowest free ID while holding a write lock on the table so that no other transaction can claim the same ID. Try searching for "postgresql gapless sequence".
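
As a rough, untested sketch of what that scan-and-lock approach could look like: the table dummy(id, blah) is the one from the usage example in this answer, and I'm assuming IDs are positive integers starting at 1. The lock mode is my choice for the illustration, not something the approach dictates.

```sql
BEGIN;

-- EXCLUSIVE mode blocks every other INSERT/UPDATE/DELETE on the table
-- until this transaction ends; plain SELECTs can still read it.
LOCK TABLE dummy IN EXCLUSIVE MODE;

-- Candidates are 1 and "one past each existing id"; the smallest
-- candidate that is not already present is the lowest free ID.
INSERT INTO dummy (id, blah)
SELECT min(c.candidate), 42
  FROM (SELECT 1 AS candidate
        UNION ALL
        SELECT id + 1 FROM dummy) AS c
 WHERE NOT EXISTS (SELECT 1 FROM dummy d WHERE d.id = c.candidate);

COMMIT;
```

Every insert pays for the scan, and every writer queues behind the table lock, which is where the performance goes.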

The simplest approach is to use a counter table and a function that gets the next ID. Here's a generalized version that uses a counter table to generate consecutive gapless IDs; it doesn't re-use IDs, though.

CREATE TABLE thetable_id_counter ( last_id integer not null );
INSERT INTO thetable_id_counter VALUES (0);

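-- Increment the counter and return the new value. The row lock taken by the
-- UPDATE is held until the calling transaction ends, which serializes callers.
CREATE OR REPLACE FUNCTION get_next_id(countertable regclass, countercolumn text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    -- %s is fine for a regclass (it is output already quoted as needed); %I quotes the column name.
    EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
    RETURN next_value;
END;
$$ LANGUAGE plpgsql;

COMMENT ON FUNCTION get_next_id(countertable regclass, countercolumn text) IS 'Increment and return value from integer column $2 in table $1';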

Usage:

INSERT INTO dummy(id, blah) 
VALUES ( get_next_id('thetable_id_counter','last_id'), 42 );

Note that when one open transaction has obtained an ID, all other transactions that try to call get_next_id will block until the first transaction commits or rolls back. This is unavoidable for gapless IDs and is by design.
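
The blocking is easy to observe with two connections. A sketch, assuming the counter table and get_next_id function from this answer are installed and both sessions use the default READ COMMITTED isolation level; run the statements in the order shown, each in its own session.

```sql
-- Session A
BEGIN;
SELECT get_next_id('thetable_id_counter', 'last_id');  -- returns immediately

-- Session B (a second connection), while A's transaction is still open
BEGIN;
SELECT get_next_id('thetable_id_counter', 'last_id');  -- waits on A's row lock

-- Session A
ROLLBACK;  -- A's increment is undone, so no gap is left behind

-- Session B's SELECT now returns, with the same value A had been given.
-- Session B
COMMIT;
```

If session A had committed instead, session B would have received the next value up.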

If you want to store multiple counters for different purposes in a table, just add a parameter to the above function, add a column to the counter table, and add a WHERE clause to the UPDATE that matches the parameter to the added column. That way you can have multiple independently-locked counter rows. Do not just add extra columns for new counters.
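
For example, a simplified non-dynamic variant might look like the following. The names id_counters, counter_name and get_next_named_id are made up for this sketch, and unlike the generic function it hard-codes the table name.

```sql
-- One row per counter, so each counter is locked independently.
CREATE TABLE id_counters (
    counter_name text PRIMARY KEY,
    last_id      integer NOT NULL
);
INSERT INTO id_counters VALUES ('invoices', 0), ('orders', 0);

CREATE OR REPLACE FUNCTION get_next_named_id(p_counter_name text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    -- The WHERE clause restricts the row lock to the requested counter.
    UPDATE id_counters
       SET last_id = last_id + 1
     WHERE counter_name = p_counter_name
    RETURNING last_id INTO next_value;

    -- Fail loudly instead of returning NULL for a counter that does not exist.
    IF NOT FOUND THEN
        RAISE EXCEPTION 'unknown counter: %', p_counter_name;
    END IF;

    RETURN next_value;
END;
$$ LANGUAGE plpgsql;

-- Usage: a transaction holding the 'invoices' row does not block 'orders'.
SELECT get_next_named_id('invoices');
```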

This function does not re-use deleted IDs; it just avoids introducing gaps.

To re-use IDs I advise ... not re-using IDs.

If you really must, you can do so by adding an ON INSERT OR UPDATE OR DELETE trigger on the table of interest that adds deleted IDs to a free-list side table and removes them from the free-list table when they're INSERTed. Treat an UPDATE as a DELETE followed by an INSERT. Now modify the ID generation function above so that it does a SELECT free_id INTO next_value FROM free_ids FOR UPDATE LIMIT 1 and, if a row is found, DELETEs that row. If no row is found, it gets a new ID from the generator table as normal. Here's an untested extension of the prior function to support re-use:

CREATE OR REPLACE FUNCTION get_next_id_reuse(countertable regclass, countercolumn text, freelisttable regclass, freelistcolumn text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    -- Try to claim a previously freed ID first; FOR UPDATE locks the chosen free-list row.
    EXECUTE format('SELECT %I FROM %s FOR UPDATE LIMIT 1', freelistcolumn, freelisttable) INTO next_value;
    IF next_value IS NOT NULL THEN
        -- A free ID was found: remove it from the free list so it cannot be handed out twice.
        EXECUTE format('DELETE FROM %s WHERE %I = %L', freelisttable, freelistcolumn, next_value);
    ELSE
        -- The free list is empty: fall back to incrementing the counter.
        EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
    END IF;
    RETURN next_value;
END;
$$ LANGUAGE plpgsql;
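
The free-list trigger described above is not shown in this answer. An equally untested sketch of it could look like this, assuming the target table is dummy with an integer id column and the free list is free_ids(free_id); the function and trigger names are made up for the illustration.

```sql
CREATE TABLE free_ids (free_id integer PRIMARY KEY);

CREATE OR REPLACE FUNCTION dummy_track_free_ids() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        -- The deleted row's ID becomes available for re-use.
        INSERT INTO free_ids (free_id) VALUES (OLD.id);
    ELSIF TG_OP = 'INSERT' THEN
        -- The ID is in use again (a no-op if it was never on the free list).
        DELETE FROM free_ids WHERE free_id = NEW.id;
    ELSIF OLD.id IS DISTINCT FROM NEW.id THEN
        -- UPDATE that changes the ID: treat as DELETE followed by INSERT.
        INSERT INTO free_ids (free_id) VALUES (OLD.id);
        DELETE FROM free_ids WHERE free_id = NEW.id;
    END IF;
    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;

-- EXECUTE FUNCTION needs PostgreSQL 11 or later; older releases spell it EXECUTE PROCEDURE.
CREATE TRIGGER dummy_free_ids
AFTER INSERT OR UPDATE OR DELETE ON dummy
FOR EACH ROW EXECUTE FUNCTION dummy_track_free_ids();
```

With this in place, IDs would be obtained via get_next_id_reuse('thetable_id_counter', 'last_id', 'free_ids', 'free_id').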
