SQL Server INSERT, Scope_Identity() and physically writing to disk

Posted on 2024-08-25 03:40:20

I have a stored procedure that, among other things, inserts into several tables inside a loop. The example below shows the pattern:

INSERT INTO T1 VALUES ('something')

SET @MyID = Scope_Identity()

... some stuff go here

INSERT INTO T2 VALUES (@MyID, 'something else')

... The rest of the procedure

Both tables (T1 and T2) have an IDENTITY(1, 1) column; let's call them ID1 and ID2. After running the procedure in our production database (a very busy one), with more than 6250 records in each table, I noticed one incident where ID1 does not match ID2, even though normally every record inserted into T1 is followed by a record inserted into T2 and the identity columns in both tables increment in step.

The "wrong" records looked something like this:

ID1     Col1
----    ---------
4709    data-4709
4710    data-4710

ID2     ID1     Col1
----    ----    ---------
4709    4710    data-4710
4710    4709    data-4709

Note the "inverted" ID1 in the second table.

I don't know much about SQL Server internals, so I came up with the following "theory"; maybe someone can correct me on it.

My guess is that because the loop runs faster than the rows can be physically written to the table, and/or something else delayed the write, the records were buffered. When the time came to write them, they were written in no particular order.

Is that even possible? If not, how can the scenario above be explained?

If so, I have another question to raise. What if the first insert (in the code above) is delayed? Doesn't that mean I won't get the correct IDENTITY to insert into the second table? If the answer to that is also yes, what can I do to ensure that the inserts into the two tables happen in sequence with the correct IDENTITY?

I'd appreciate any comments or information that help me understand this.

Thanks in advance.

Comments (6)

九八野马 2024-09-01 03:40:20

There is no way you can rely on IDENTITY to solve this for your second table. If you care about the generated primary key value for that row, you should generate it yourself.

IDENTITY is a way of saying "I don't want the hassle of generating a key myself, just do it for me, and I'll ask for the generated value if and when I need it".

What could be happening here is that two threads are inserting rows at the same time and neither has committed yet, so you get this scenario:

Thread 1                      Thread 2
get id for table 1 = 4709
                              get id for table 1 = 4710
insert row for table 1
                              insert row for table 1
                              get id for table 2 = 4709
get id for table 2 = 4710
                              insert row for table 2
insert row for table 2

You have two ways to solve your problem:

  1. Remove IDENTITY for the primary key in the second table
  2. Use SET IDENTITY_INSERT T2 ON to allow you to provide a key for it, while keeping the IDENTITY setting

In this case, however, I would use method 1. Method 2 is usually used when importing data into an empty table. You don't want to risk the database auto-generating an ID that you later want to use yourself (since it comes from the first table), so you should remove the IDENTITY setting from the primary key of the second table.

Or you could avoid relying on that table's own key at all: since you have a foreign key reference, do you really need the key values to be the same?
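A minimal sketch of method 1, assuming a schema the question never shows: the column names and types below are hypothetical. T2 has no IDENTITY of its own and simply reuses the value generated for T1 as its key.

```sql
-- Hypothetical schema; column names and types are assumptions for illustration.
CREATE TABLE T1 (
    ID1  INT IDENTITY(1, 1) PRIMARY KEY,
    Col1 VARCHAR(50) NOT NULL
);

-- T2 has no IDENTITY column: its key is the value generated for T1.
CREATE TABLE T2 (
    ID1  INT NOT NULL PRIMARY KEY
         REFERENCES T1 (ID1),
    Col1 VARCHAR(50) NOT NULL
);

DECLARE @MyID INT;

INSERT INTO T1 (Col1) VALUES ('something');
SET @MyID = SCOPE_IDENTITY();   -- last identity generated in this session and scope

-- The key for T2 is supplied explicitly, so it cannot drift away from T1.
INSERT INTO T2 (ID1, Col1) VALUES (@MyID, 'something else');
```

With this layout the order in which concurrent sessions reach the second insert no longer matters, because nothing in T2 is generated independently.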

栖竹 2024-09-01 03:40:20

Of course the scenario you describe is possible, and quite likely, too.

If you have two separate, independent tables, both used for queries and inserts and each with its own IDENTITY(1,1) field, there is absolutely no guarantee that inserts into the first table and then into the second will be executed in the same order!

If you do need to establish a link between the two, insert the first table's ID into the second table as a foreign key. You cannot rely on the IDs generated by IDENTITY being the same in both tables!
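As an illustration, assuming T2 keeps its own IDENTITY column ID2 and stores T1's key in a column ID1 (as in the question's sample data; the column Col1 on both tables is also an assumption), rows are linked through the foreign key and the two identity values never need to agree:

```sql
-- Join on the stored foreign key, not on the assumption that ID2 equals ID1.
SELECT t1.ID1,
       t1.Col1 AS T1Col1,
       t2.ID2,
       t2.Col1 AS T2Col1
FROM   T1 AS t1
JOIN   T2 AS t2
       ON t2.ID1 = t1.ID1;

-- Rows whose two identity values happen to differ; they are still linked correctly.
SELECT t2.ID2, t2.ID1
FROM   T2 AS t2
WHERE  t2.ID2 <> t2.ID1;
```

The second query would list rows such as the "inverted" pair from the question, which are harmless as long as every lookup goes through ID1.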

云之铃。 2024-09-01 03:40:20

Regarding writing:

  • Whenever you do something that changes data, the change is recorded in the database log first, and you don't get a transaction confirmation until that log write has happened. That is the D in ACID (database theory).
  • Dirty database pages are written to disk "in the background". If too many are dirty, a checkpoint is triggered and they are all flushed out.

So much for the writing part.

What you are probably running into is simply the fact that, while individual statements are atomic, a busy database may well have more than one thread running against it. So, basically, a thread switch happened between the statements. One thread got ID1, then another got priority and got ID1 and ID2, then the first one got ID2.

Nothing special here ;) This is typical, normal database behavior when multiple threads run concurrently. It has nothing to do with writing per se.

Basically, between
SET @MyID = Scope_Identity()
and the next statement, another thread can get priority ;)
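The interleaving can be reproduced by hand. The sketch below is an assumption-laden illustration, not the original procedure: it assumes T1(ID1 IDENTITY, Col1) and T2(ID2 IDENTITY, ID1, Col1), and the two parts must be run in separate connections, the second one while the first is still waiting.

```sql
-- Session A (first connection)
DECLARE @IdA INT;
INSERT INTO T1 (Col1) VALUES ('from session A');
SET @IdA = SCOPE_IDENTITY();
WAITFOR DELAY '00:00:10';   -- stand-in for "some stuff", widens the gap
INSERT INTO T2 (ID1, Col1) VALUES (@IdA, 'from session A');

-- Session B (second connection, started while session A is waiting)
DECLARE @IdB INT;
INSERT INTO T1 (Col1) VALUES ('from session B');
SET @IdB = SCOPE_IDENTITY();
INSERT INTO T2 (ID1, Col1) VALUES (@IdB, 'from session B');
```

Session B should reach T2 first, so its row takes the lower ID2 even though it holds the higher ID1. Each session still stores its own correct ID1, which is why the foreign key stays trustworthy while the two identity sequences fall out of step.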

埋情葬爱 2024-09-01 03:40:20

Do not rely on the actual values of identity columns for business/application logic. You can only assume that they will be unique!

躲猫猫 2024-09-01 03:40:20

You should be able to avoid this issue by using a SQL 2005 feature, the OUTPUT clause. Link below.

http://msdn.microsoft.com/en-us/library/ms177564.aspx
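A minimal sketch of the OUTPUT approach, assuming T1 has an identity column ID1 and a data column Col1, and that T2 stores T1's key in a column ID1 (all names other than T1 and T2 are assumptions). The generated key is captured by the INSERT statement itself rather than by a later call to SCOPE_IDENTITY():

```sql
-- Table variable that receives the identity values generated by the insert.
DECLARE @NewIDs TABLE (ID1 INT);

INSERT INTO T1 (Col1)
OUTPUT inserted.ID1 INTO @NewIDs (ID1)
VALUES ('something');

-- Use the captured key for the dependent row.
INSERT INTO T2 (ID1, Col1)
SELECT ID1, 'something else'
FROM   @NewIDs;
```

This also works for multi-row inserts, where SCOPE_IDENTITY() would only report the last generated value.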

云醉月微眠 2024-09-01 03:40:20

This is a known bug in SQL Server.

The problem is that when it generates the query plan, parallelisation can cause SCOPE_IDENTITY() to return an incorrect value.

Move that part into its own procedure: pass in the parameters and return the scope identity. It should then be correct.

If I remember rightly, this only manifests on tables with around a million rows or more.

Aha, here's the KB: http://support.microsoft.com/default.aspx?scid=kb;en-us;2019779&sd=rss&spid=2855
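One way the suggested refactoring might look, as a sketch only: the procedure name dbo.InsertIntoT1, its parameters, and the column Col1 are hypothetical and not taken from the question.

```sql
-- Hypothetical helper: performs the insert and hands back the new identity value.
CREATE PROCEDURE dbo.InsertIntoT1
    @Col1  VARCHAR(50),
    @NewID INT OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO T1 (Col1) VALUES (@Col1);
    SET @NewID = SCOPE_IDENTITY();   -- read in the same scope as the insert
END;
GO

-- Caller: receive the identity through the OUTPUT parameter.
DECLARE @MyID INT;
EXEC dbo.InsertIntoT1 @Col1 = 'something', @NewID = @MyID OUTPUT;

INSERT INTO T2 (ID1, Col1) VALUES (@MyID, 'something else');
```

GO is a batch separator understood by tools such as SSMS and sqlcmd, not a T-SQL statement; it is needed here because CREATE PROCEDURE must be the only statement in its batch.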
