如何创建 PostgreSQL 分区序列?

发布于 2024-09-16 20:16:37 字数 379 浏览 8 评论 0原文

是否有一种简单(即非 hacky)且无竞争条件的方法在 PostgreSQL 中创建分区序列。示例:

在问题中使用正常序列:

| Project_ID | Issue |
| 1          | 1     |
| 1          | 2     |
| 2          | 3     |
| 2          | 4     |

在问题中使用分区序列:

| Project_ID | Issue |
| 1          | 1     |
| 1          | 2     |
| 2          | 1     |
| 2          | 2     |

Is there a simple (ie. non-hacky) and race-condition free way to create a partitioned sequence in PostgreSQL. Example:

Using a normal sequence in Issue:

| Project_ID | Issue |
| 1          | 1     |
| 1          | 2     |
| 2          | 3     |
| 2          | 4     |

Using a partitioned sequence in Issue:

| Project_ID | Issue |
| 1          | 1     |
| 1          | 2     |
| 2          | 1     |
| 2          | 2     |

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

裂开嘴轻声笑有多痛 2024-09-23 20:16:37

我不相信有一种像常规序列一样简单的简单方法,因为:

  1. 序列仅存储一个数字流(下一个值等)。每个分区都需要一个。
  2. 序列具有绕过当前事务的特殊处理(以避免竞争条件)。如果不使用 dblink 等技巧,很难在 SQL 或 PL/pgSQL 级别复制此内容。
  3. DEFAULT 列属性可以使用简单的表达式或函数调用,例如 nextval('myseq');但它无法引用其他列来通知函数该值应来自哪个流。

你可以做出一些有用的东西,但你可能不会认为它很简单。依次解决上述问题:

  1. 使用表来存储所有分区的下一个值,其模式类似于multiseq(partition_id, next_val)
  2. 编写一个 multinextval(seq_table,partition_id) 函数,执行如下操作:

    1. 创建一个独立于当前事务的新事务(一种方法是通过 dblink;我相信其他一些服务器语言可以更轻松地做到这一点)。
    2. 锁定seq_table中提到的表。
    3. 使用递增的值更新分区 ID 为 partition_id 的行。 (或者如果没有现有行,则插入值为 2 的新行。)
    4. 提交该事务并返回之前存储的 ID(或 1)。
  3. 在您的项目表上创建一个插入触发器,使用调用 multinextval('projects_table', NEW.Project_ID) 进行插入。

我自己没有使用整个计划,但我单独尝试了与每个步骤类似的方法。如果您想尝试此操作,可以提供 multinextval 函数和触发器的示例...

I do not believe there is a simple way that is as easy as regular sequences, because:

  1. A sequence stores only one number stream (next value, etc.). You want one for each partition.
  2. Sequences have special handling that bypasses the current transaction (to avoid the race condition). It is hard to replicate this at the SQL or PL/pgSQL level without using tricks like dblink.
  3. The DEFAULT column property can use a simple expression or a function call like nextval('myseq'); but it cannot refer to other columns to inform the function which stream the value should come from.

You can make something that works, but you probably won't think it simple. Addressing the above problems in turn:

  1. Use a table to store the next value for all partitions, with a schema like multiseq (partition_id, next_val).
  2. Write a multinextval(seq_table, partition_id) function that does something like the following:

    1. Create a new transaction independent on the current transaction (one way of doing this is through dblink; I believe some other server languages can do it more easily).
    2. Lock the table mentioned in seq_table.
    3. Update the row where the partition id is partition_id, with an incremented value. (Or insert a new row with value 2 if there is no existing one.)
    4. Commit that transaction and return the previous stored id (or 1).
  3. Create an insert trigger on your projects table that uses a call to multinextval('projects_table', NEW.Project_ID) for insertions.

I have not used this entire plan myself, but I have tried something similar to each step individually. Examples of the multinextval function and the trigger can be provided if you want to attempt this...

听风念你 2024-09-23 20:16:37

我有同样的需求,我的解决方案几乎就是@Edmund 上面所说的。

到目前为止我发现的唯一缺点是,由于我的查询查找序列列的 MAX() 值,理论上,如果删除最新行 (= by这样做可以减少最大记录序列号)并添加一个新序列号。

在未来的迭代中,我可能会创建一个专门用于存储每个单独序列的最大值的表,但我想对其在重负载下如何工作进行一些性能测试。我有效地替换了原生序列,而没有对其进行所有优化,这肯定会带来缺点。


就我而言,我在具有不同分区列的许多列上使用序列号,因此我创建了一个从触发器调用的通用函数。

这是函数。您需要 hstore 扩展来运行它。

CREATE OR REPLACE FUNCTION nssequence_nextval()
RETURNS trigger AS $
DECLARE
    query text;
    nextval bigint;
BEGIN
    IF TG_NARGS != 2 THEN
        RAISE EXCEPTION '% did not supply nssequence_nextval with the required arguments.', TG_NAME;
    END IF;

    query := 'SELECT COALESCE(MAX(%I)::text, ''0'')::bigint + 1 FROM %I.%I WHERE organisation_id = $1.organisation_id AND %I = $1.%I'::text; -- In my context, every row has an organisation_id as my app is multi-tenant-capable. Adjust this query to your needs.
    EXECUTE format(
        query,
        TG_ARGV[0], -- MAX(%I)::text, the column holding the sequence
        TG_TABLE_SCHEMA, TG_TABLE_NAME, -- FROM %I.%I - the table we are working with
        TG_ARGV[1], TG_ARGV[1] -- %I = $1.%I -- the column with the sequence namespace. Ideally a parent entity's UUID.
    ) USING NEW INTO nextval;

    -- The property name is the first argument of the trigger function, so we need hstore to set it.
    -- See https://stackoverflow.com/questions/7711432/how-to-set-value-of-composite-variable-field-using-dynamic-sql/7782641#7782641
    NEW := NEW #= hstore(TG_ARGV[0]::text, nextval::text);

    RETURN NEW;
END
$ LANGUAGE 'plpgsql';

这是每列的触发器(用实际名称替换方括号中的内容):

CREATE OR REPLACE TRIGGER [nssequence_trigger_name]
BEFORE INSERT ON [schema.table]
FOR EACH ROW
EXECUTE PROCEDURE nssequence_nextval('[sequence_column_name]', '[partition_column_name]')

顺便说一句,我已将其自动化。我对每个序列列使用列注释,例如 nssequence:partition_column_name。我还编写了一个脚本,用于查找具有此类评论的所有列并向它们添加触发器。

这是查找所有序列列的查询:

SELECT
    DISTINCT pgd.objoid, -- only used for filtering out duplicates
    c.table_schema,
    c.table_name,
    c.column_name,
    pgd.description
FROM pg_catalog.pg_statio_all_tables as st
INNER JOIN pg_catalog.pg_description pgd ON (
    pgd.objoid = st.relid
)
INNER JOIN information_schema.columns c ON (
    pgd.objsubid   = c.ordinal_position
    AND c.table_schema = st.schemaname
    AND c.table_name   = st.relname
)
INNER JOIN information_schema.tables t ON (
    t.table_type = 'BASE TABLE'
)
WHERE
    t.table_schema != 'information_schema'
    AND t.table_schema NOT LIKE 'pg_%'
    AND pgd.description LIKE 'nssequence%'

最后,我还有一个清理机制,用于查找在不再存在或不再说它们是序列号的列上创建的触发器,但这可能对此来说太详细了回答。

I had the same need and my solution was pretty much what @Edmund said above.

The only downside I found so far is that since my query looks for the MAX() value of the sequence column, in theory, the number could be re-used if you delete the latest row (= by doing that you decrement the max recorded sequence number) and add a new one.

In some future iteration I will probably create a table dedicated to storing max values of each separate sequence, but I want to do some performance testing on how would that work under a heavy load. I'm effectively replacing native sequences without all their optimisations, which must come with downsides.


In my case, I am using the sequence numbers over a lot of columns with different partition columns, so I created a universal function I call from my triggers.

Here is the function. You'll need hstore extension to run it.

CREATE OR REPLACE FUNCTION nssequence_nextval()
RETURNS trigger AS $
DECLARE
    query text;
    nextval bigint;
BEGIN
    IF TG_NARGS != 2 THEN
        RAISE EXCEPTION '% did not supply nssequence_nextval with the required arguments.', TG_NAME;
    END IF;

    query := 'SELECT COALESCE(MAX(%I)::text, ''0'')::bigint + 1 FROM %I.%I WHERE organisation_id = $1.organisation_id AND %I = $1.%I'::text; -- In my context, every row has an organisation_id as my app is multi-tenant-capable. Adjust this query to your needs.
    EXECUTE format(
        query,
        TG_ARGV[0], -- MAX(%I)::text, the column holding the sequence
        TG_TABLE_SCHEMA, TG_TABLE_NAME, -- FROM %I.%I - the table we are working with
        TG_ARGV[1], TG_ARGV[1] -- %I = $1.%I -- the column with the sequence namespace. Ideally a parent entity's UUID.
    ) USING NEW INTO nextval;

    -- The property name is the first argument of the trigger function, so we need hstore to set it.
    -- See https://stackoverflow.com/questions/7711432/how-to-set-value-of-composite-variable-field-using-dynamic-sql/7782641#7782641
    NEW := NEW #= hstore(TG_ARGV[0]::text, nextval::text);

    RETURN NEW;
END
$ LANGUAGE 'plpgsql';

and here is the trigger for each column (replace things in square brackets with actual names):

CREATE OR REPLACE TRIGGER [nssequence_trigger_name]
BEFORE INSERT ON [schema.table]
FOR EACH ROW
EXECUTE PROCEDURE nssequence_nextval('[sequence_column_name]', '[partition_column_name]')

Just by the way, I have automated this. I use a column comment on each sequence column, such as nssequence:partition_column_name. I also wrote a script that looks for all columns that have the such comment and adds the triggers to them.

This is the query to find all sequence columns:

SELECT
    DISTINCT pgd.objoid, -- only used for filtering out duplicates
    c.table_schema,
    c.table_name,
    c.column_name,
    pgd.description
FROM pg_catalog.pg_statio_all_tables as st
INNER JOIN pg_catalog.pg_description pgd ON (
    pgd.objoid = st.relid
)
INNER JOIN information_schema.columns c ON (
    pgd.objsubid   = c.ordinal_position
    AND c.table_schema = st.schemaname
    AND c.table_name   = st.relname
)
INNER JOIN information_schema.tables t ON (
    t.table_type = 'BASE TABLE'
)
WHERE
    t.table_schema != 'information_schema'
    AND t.table_schema NOT LIKE 'pg_%'
    AND pgd.description LIKE 'nssequence%'

Finally I also have a cleanup mechanism in place that looks for triggers that are created over columns that no longer exist or no longer say they are a sequence number, but that's probably too much detail for this answer.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文