从事务回滚中排除表

发布于 2024-11-06 10:48:36 字数 633 浏览 4 评论 0原文

我们有一个表和一组用于生成 pk id 的过程。该表保存最后一个 id,过程获取该 id,将其递增,更新表,然后返回新递增的 id。

此过程可能位于事务内。问题是,如果我们进行回滚,它可能会回滚到事务期间使用的任何 id 之前的 id(例如从不同的用户或线程生成的)。那么当id再次递增时,就会造成重复。

有没有办法从父事务中排除 id 生成表以防止这种情况发生?

要详细说明我们当前的问题...

首先,我们有一个准备将大量数据迁移到的系统。该系统由一个ms-sql(2008)数据库和一个textml数据库组成。 sql 数据库保存不到 3 天的数据,而 textml 则充当旧数据的存档。 textml 数据库还依赖于 sql 数据库来为特定字段提供 ids。这些字段目前是身份 PK,是在发布到 texml 数据库之前插入时生成的。我们不想通过sql清洗所有迁移的数据,因为这些记录会淹没当前系统,无论是流量还是数据。但同时我们无法生成这些 id,因为它们是 sql server 控制的自动递增值。

其次,我们有一个系统要求,需要我们能够从 texml 数据库中提取旧资产,并使用原始 ID 将其插入到 sql 数据库中。这样做是为了纠正和编辑目的,如果我们更改 ID,则会破坏我们无法控制的客户端系统下游的关系。当然,这一切都是一个问题,因为 id 列是标识列。

We have a table and a set of procedures that are used for generating pk ids. The table holds the last id, and the procedures gets the id, increments it, updates the table, and then returns the newly incremented id.

This procedure could potentially be within a transaction. The problem is that if the we have a rollback, it could potentially rollback to an id that is before any id's that came into use during the transaction (say generated from a different user or thread). Then when the id is incremented again, it will cause duplicates.

Is there any way to exclude the id generating table from a parent transaction to prevent this from happening?

To add detail our current problem...

First, we have a system we are preparing to migrate a lot of data into. The system consists of a ms-sql (2008) database, and a textml database. The sql database houses data less than 3 days old, while the textml acts as an archive for anything older. The textml db also relies on the sql db to provide ids' for particular fields. These fields are Identity PK's currently, and are generated on insertion before publishing to the texml db. We do not want to wash all our migrated data through sql since the records will flood the current system, both in terms of traffic and data. But at the same time we have no way of generating these id's since they are auto-incremented values that sql server controls.

Secondly, we have a system requirement which needs us to be able to pull an old asset out of the texml database and insert it back into the sql database with the original id's. This is done for correction and editing purposes, and if we alter the id's it will break relations downstream on clients system which we have no control over. Of course all this is an issue because id columns are Identity columns.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

与风相奔跑 2024-11-13 10:48:36

procedures 获取 id,并递增它,
更新表,然后返回
新增加的 id

这会导致死锁。过程必须在一个原子步骤中递增和返回,例如。通过使用 SQL Server 中的 OUTPUT 子句:

update ids
set id = id + 1
output inserted.id
where name= @name;

您不必担心并发性。事实上,以这种方式生成 id 意味着只有一个事务可以增加 id,因为更新将独占锁定该行。您无法获得重复项。您确实获得了所有操作的完整序列化(即没有性能和低吞吐量),但这是一个不同的问题。这就是为什么您应该使用内置机制来生成序列和身份。这些是特定于每个平台的:AUTO_INCRMENT在 MySQL 中,SEQUENCE在 Oracle 中,IDENTITYSEQUENCE SQL Server 中的 (仅在 Denali 中的序列)等等。

已更新
当我阅读您的编辑时,您想要控制生成的身份的唯一原因是能够插回存档记录。这已经是可能的,只需使用 IDENTITY_INSERT :

允许插入显式值
进入表的标识列

时将其重新插入旧记录,然后将其重新关闭:

SET IDENTITY_INSERT recordstable ON;
INSERT INTO recordstable (id, ...) values (@oldid, ...);
SET IDENTITY_INSERT recordstable OFF;

至于为什么手动生成的 ids 会序列化所有操作:任何生成 id 的事务都会独占锁定 ids 表中的行。在第一个事务提交或回滚之前,没有其他事务可以读取或写入该行。因此,任何时刻只能有一个事务在表上生成 id,即:序列化。

procedures gets the id, increments it,
updates the table, and then returns
the newly incremented id

This will cause deadlocks. procedure must increment and return in one single, atomic, step, eg. by using the OUTPUT clause in SQL Server:

update ids
set id = id + 1
output inserted.id
where name= @name;

You don't have to worry about concurrency. The fact that you generate ids this way implies that only one transaction can increment an id, because the update will lock the row exclusively. You cannot get duplicates. You do get complete serialization of all operations (ie. no performance and low throughput) but that is a different issue. And this why you should use built-in mechanisms for generating sequences and identities. These are specific to each platform: AUTO_INCREMENT in MySQL, SEQUENCE in Oracle, IDENTITY and SEQUENCE in SQL Server (sequence only in Denali) etc etc.

Updated
As I read your edit, the only reason why you want control of the generated identities is to be able to insert back archived records. This is already possible, simply use IDENTITY_INSERT:

Allows explicit values to be inserted
into the identity column of a table

Turn it on when you insert back the old record, then turn it back off:

SET IDENTITY_INSERT recordstable ON;
INSERT INTO recordstable (id, ...) values (@oldid, ...);
SET IDENTITY_INSERT recordstable OFF;

As for why manually generated ids serialize all operations: any transaction that generates an id will exclusively lock the row in the ids table. No other transaction can read or write that row until the first transaction commits or rolls back. Therefore there can be only one transaction generating an id on a table at any moment, ie. serialization.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文