SQL Server 2005 中的死锁!两个实时批量更新插入正在发生冲突。为什么?
场景如下:
我有一个名为 MarketDataCurrent (MDC) 的表,该表实时更新库存价格。
我有一个名为“LiveFeed”的进程,它读取从线路传输的价格流,对插入进行排队,并使用“批量上传到临时表,然后插入/更新到 MDC 表”。 (BulkUpsert)
我有另一个进程,然后使用类似的 BulkUpsert 存储过程读取此数据,计算其他数据,然后将结果保存回同一个表中。
第三,有大量用户运行 C# Gui 轮询 MDC 表并从中读取更新。
现在,在数据快速变化的白天,事情运行得相当顺利,但是,在市场交易时间之后,我们最近开始看到数据库中出现越来越多的死锁异常,现在每天会看到 10-20 个。这里要注意的重要一点是,这些发生在值没有改变的情况下。
以下是所有相关信息:
Table Def:
CREATE TABLE [dbo].[MarketDataCurrent](
[MDID] [int] NOT NULL,
[LastUpdate] [datetime] NOT NULL,
[Value] [float] NOT NULL,
[Source] [varchar](20) NULL,
CONSTRAINT [PK_MarketDataCurrent] PRIMARY KEY CLUSTERED
(
[MDID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
-
我正在运行 Sql Profiler 跟踪,捕获死锁,所有图表如下所示。
进程 258 被重复调用以下“BulkUpsert”存储过程,而 73 正在调用下一个一:
ALTER proc [dbo].[MarketDataCurrent_BulkUpload]
@updateTime datetime,
@source varchar(10)
as
begin transaction
update c with (rowlock) set LastUpdate = getdate(), Value = t.Value, Source = @source
from MarketDataCurrent c INNER JOIN #MDTUP t ON c.MDID = t.mdid
where c.lastUpdate < @updateTime
and c.mdid not in (select mdid from MarketData where LiveFeedTicker is not null and PriceSource like 'LiveFeed.%')
and c.value <> t.value
insert into MarketDataCurrent
with (rowlock)
select MDID, getdate(), Value, @source from #MDTUP
where mdid not in (select mdid from MarketDataCurrent with (nolock))
and mdid not in (select mdid from MarketData where LiveFeedTicker is not null and PriceSource like 'LiveFeed.%')
commit
另一个:
ALTER PROCEDURE [dbo].[MarketDataCurrent_LiveFeedUpload]
AS
begin transaction
-- Update existing mdid
UPDATE c WITH (ROWLOCK) SET LastUpdate = t.LastUpdate, Value = t.Value, Source = t.Source
FROM MarketDataCurrent c INNER JOIN #TEMPTABLE2 t ON c.MDID = t.mdid;
-- Insert new MDID
INSERT INTO MarketDataCurrent with (ROWLOCK) SELECT * FROM #TEMPTABLE2
WHERE MDID NOT IN (SELECT MDID FROM MarketDataCurrent with (NOLOCK))
-- Clean up the temp table
DELETE #TEMPTABLE2
commit
澄清一下,这些临时表是由同一连接上的 C# 代码创建的,并使用 C# SqlBulkCopy 类进行填充。
对我来说,它看起来像是在表的 PK 上死锁,因此我尝试删除该 PK 并切换到唯一约束,但这使死锁数量增加了 10 倍。
我完全不知道该如何应对这种情况,并且愿意接受任何建议。
帮助!!
为了响应 XDL 请求,它是:
<deadlock-list>
<deadlock victim="processc19978">
<process-list>
<process id="processaf0b68" taskpriority="0" logused="0" waitresource="KEY: 6:72057594090487808 (d900ed5a6cc6)" waittime="718" ownerId="1102128174" transactionname="user_transaction" lasttranstarted="2010-06-11T16:30:44.750" XDES="0xffffffff817f9a40" lockMode="U" schedulerid="3" kpid="8228" status="suspended" spid="73" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2010-06-11T16:30:44.750" lastbatchcompleted="2010-06-11T16:30:44.750" clientapp=".Net SqlClient Data Provider" hostname="RISKAPPS_VM" hostpid="3836" loginname="RiskOpt" isolationlevel="read committed (2)" xactid="1102128174" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
<executionStack>
<frame procname="MKP_RISKDB.dbo.MarketDataCurrent_BulkUpload" line="28" stmtstart="1062" stmtend="1720" sqlhandle="0x03000600a28e5e4ef4fd8e00849d00000100000000000000">
UPDATE c WITH (ROWLOCK) SET LastUpdate = getdate(), Value = t.Value, Source = @source
FROM MarketDataCurrent c INNER JOIN #MDTUP t ON c.MDID = t.mdid
WHERE c.lastUpdate < @updateTime
and c.mdid not in (select mdid from MarketData where BloombergTicker is not null and PriceSource like 'Blbg.%')
and c.value <> t.value </frame>
<frame procname="adhoc" line="1" stmtstart="88" sqlhandle="0x01000600c1653d0598706ca7000000000000000000000000">
exec MarketDataCurrent_BulkUpload @clearBefore, @source </frame>
<frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000">
unknown </frame>
</executionStack>
<inputbuf>
(@clearBefore datetime,@source nvarchar(10))exec MarketDataCurrent_BulkUpload @clearBefore, @source </inputbuf>
</process>
<process id="processc19978" taskpriority="0" logused="0" waitresource="KEY: 6:72057594090487808 (74008e31572b)" waittime="718" ownerId="1102128228" transactionname="user_transaction" lasttranstarted="2010-06-11T16:30:44.780" XDES="0x380be9d8" lockMode="U" schedulerid="5" kpid="8464" status="suspended" spid="248" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2010-06-11T16:30:44.780" lastbatchcompleted="2010-06-11T16:30:44.780" clientapp=".Net SqlClient Data Provider" hostname="RISKBBG_VM" hostpid="4480" loginname="RiskOpt" isolationlevel="read committed (2)" xactid="1102128228" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
<executionStack>
<frame procname="MKP_RISKDB.dbo.MarketDataCurrentBlbgRtUpload" line="14" stmtstart="840" stmtend="1220" sqlhandle="0x03000600005f9d24c8878f00849d00000100000000000000">
UPDATE c WITH (ROWLOCK) SET LastUpdate = t.LastUpdate, Value = t.Value, Source = t.Source
FROM MarketDataCurrent c INNER JOIN #TEMPTABLE2 t ON c.MDID = t.mdid;
-- Insert new MDID </frame>
<frame procname="adhoc" line="1" sqlhandle="0x010006004a58132228bf8d73000000000000000000000000">
MarketDataCurrentBlbgRtUpload </frame>
</executionStack>
<inputbuf>
MarketDataCurrentBlbgRtUpload </inputbuf>
</process>
</process-list>
<resource-list>
<keylock hobtid="72057594090487808" dbid="6" objectname="MKP_RISKDB.dbo.MarketDataCurrent" indexname="PK_MarketDataCurrent" id="lock5ba77b00" mode="U" associatedObjectId="72057594090487808">
<owner-list>
<owner id="processc19978" mode="U"/>
</owner-list>
<waiter-list>
<waiter id="processaf0b68" mode="U" requestType="wait"/>
</waiter-list>
</keylock>
<keylock hobtid="72057594090487808" dbid="6" objectname="MKP_RISKDB.dbo.MarketDataCurrent" indexname="PK_MarketDataCurrent" id="lock65dca340" mode="U" associatedObjectId="72057594090487808">
<owner-list>
<owner id="processaf0b68" mode="U"/>
</owner-list>
<waiter-list>
<waiter id="processc19978" mode="U" requestType="wait"/>
</waiter-list>
</keylock>
</resource-list>
</deadlock>
</deadlock-list>
Here's the scenario:
I've got a table called MarketDataCurrent (MDC) that has live updating stock prices.
I've got one process called 'LiveFeed' which reads prices streaming from the wire, queues up inserts, and uses a 'bulk upload to temp table then insert/update to MDC table.' (BulkUpsert)
I've got another process which then reads this data, computes other data, and then saves the results back into the same table, using a similar BulkUpsert stored proc.
Thirdly, there are a multitude of users running a C# Gui polling the MDC table and reading updates from it.
Now, during the day when the data is changing rapidly, things run pretty smoothly, but then, after market hours, we've recently started seeing an increasing number of Deadlock exceptions coming out of the database, nowadays we see 10-20 a day. The imporant thing to note here is that these happen when the values are NOT changing.
Here's all the relevant info:
Table Def:
CREATE TABLE [dbo].[MarketDataCurrent](
[MDID] [int] NOT NULL,
[LastUpdate] [datetime] NOT NULL,
[Value] [float] NOT NULL,
[Source] [varchar](20) NULL,
CONSTRAINT [PK_MarketDataCurrent] PRIMARY KEY CLUSTERED
(
[MDID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
-
I've got a Sql Profiler Trace Running, catching the deadlocks, and here's what all the graphs look like.
Process 258 is called the following 'BulkUpsert' stored proc, repeatedly, while 73 is calling the next one:
ALTER proc [dbo].[MarketDataCurrent_BulkUpload]
@updateTime datetime,
@source varchar(10)
as
begin transaction
update c with (rowlock) set LastUpdate = getdate(), Value = t.Value, Source = @source
from MarketDataCurrent c INNER JOIN #MDTUP t ON c.MDID = t.mdid
where c.lastUpdate < @updateTime
and c.mdid not in (select mdid from MarketData where LiveFeedTicker is not null and PriceSource like 'LiveFeed.%')
and c.value <> t.value
insert into MarketDataCurrent
with (rowlock)
select MDID, getdate(), Value, @source from #MDTUP
where mdid not in (select mdid from MarketDataCurrent with (nolock))
and mdid not in (select mdid from MarketData where LiveFeedTicker is not null and PriceSource like 'LiveFeed.%')
commit
And the other one:
ALTER PROCEDURE [dbo].[MarketDataCurrent_LiveFeedUpload]
AS
begin transaction
-- Update existing mdid
UPDATE c WITH (ROWLOCK) SET LastUpdate = t.LastUpdate, Value = t.Value, Source = t.Source
FROM MarketDataCurrent c INNER JOIN #TEMPTABLE2 t ON c.MDID = t.mdid;
-- Insert new MDID
INSERT INTO MarketDataCurrent with (ROWLOCK) SELECT * FROM #TEMPTABLE2
WHERE MDID NOT IN (SELECT MDID FROM MarketDataCurrent with (NOLOCK))
-- Clean up the temp table
DELETE #TEMPTABLE2
commit
To clarify, those Temp Tables are being created by the C# code on the same connection and are populated using the C# SqlBulkCopy class.
To me it looks like it's deadlocking on the PK of the table, so I tried removing that PK and switching to a Unique Constraint instead but that increased the number of deadlocks 10-fold.
I'm totally lost as to what to do about this situation and am open to just about any suggestion.
HELP!!
In response to the request for the XDL, here it is:
<deadlock-list>
<deadlock victim="processc19978">
<process-list>
<process id="processaf0b68" taskpriority="0" logused="0" waitresource="KEY: 6:72057594090487808 (d900ed5a6cc6)" waittime="718" ownerId="1102128174" transactionname="user_transaction" lasttranstarted="2010-06-11T16:30:44.750" XDES="0xffffffff817f9a40" lockMode="U" schedulerid="3" kpid="8228" status="suspended" spid="73" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2010-06-11T16:30:44.750" lastbatchcompleted="2010-06-11T16:30:44.750" clientapp=".Net SqlClient Data Provider" hostname="RISKAPPS_VM" hostpid="3836" loginname="RiskOpt" isolationlevel="read committed (2)" xactid="1102128174" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
<executionStack>
<frame procname="MKP_RISKDB.dbo.MarketDataCurrent_BulkUpload" line="28" stmtstart="1062" stmtend="1720" sqlhandle="0x03000600a28e5e4ef4fd8e00849d00000100000000000000">
UPDATE c WITH (ROWLOCK) SET LastUpdate = getdate(), Value = t.Value, Source = @source
FROM MarketDataCurrent c INNER JOIN #MDTUP t ON c.MDID = t.mdid
WHERE c.lastUpdate < @updateTime
and c.mdid not in (select mdid from MarketData where BloombergTicker is not null and PriceSource like 'Blbg.%')
and c.value <> t.value </frame>
<frame procname="adhoc" line="1" stmtstart="88" sqlhandle="0x01000600c1653d0598706ca7000000000000000000000000">
exec MarketDataCurrent_BulkUpload @clearBefore, @source </frame>
<frame procname="unknown" line="1" sqlhandle="0x000000000000000000000000000000000000000000000000">
unknown </frame>
</executionStack>
<inputbuf>
(@clearBefore datetime,@source nvarchar(10))exec MarketDataCurrent_BulkUpload @clearBefore, @source </inputbuf>
</process>
<process id="processc19978" taskpriority="0" logused="0" waitresource="KEY: 6:72057594090487808 (74008e31572b)" waittime="718" ownerId="1102128228" transactionname="user_transaction" lasttranstarted="2010-06-11T16:30:44.780" XDES="0x380be9d8" lockMode="U" schedulerid="5" kpid="8464" status="suspended" spid="248" sbid="0" ecid="0" priority="0" transcount="2" lastbatchstarted="2010-06-11T16:30:44.780" lastbatchcompleted="2010-06-11T16:30:44.780" clientapp=".Net SqlClient Data Provider" hostname="RISKBBG_VM" hostpid="4480" loginname="RiskOpt" isolationlevel="read committed (2)" xactid="1102128228" currentdb="6" lockTimeout="4294967295" clientoption1="671088672" clientoption2="128056">
<executionStack>
<frame procname="MKP_RISKDB.dbo.MarketDataCurrentBlbgRtUpload" line="14" stmtstart="840" stmtend="1220" sqlhandle="0x03000600005f9d24c8878f00849d00000100000000000000">
UPDATE c WITH (ROWLOCK) SET LastUpdate = t.LastUpdate, Value = t.Value, Source = t.Source
FROM MarketDataCurrent c INNER JOIN #TEMPTABLE2 t ON c.MDID = t.mdid;
-- Insert new MDID </frame>
<frame procname="adhoc" line="1" sqlhandle="0x010006004a58132228bf8d73000000000000000000000000">
MarketDataCurrentBlbgRtUpload </frame>
</executionStack>
<inputbuf>
MarketDataCurrentBlbgRtUpload </inputbuf>
</process>
</process-list>
<resource-list>
<keylock hobtid="72057594090487808" dbid="6" objectname="MKP_RISKDB.dbo.MarketDataCurrent" indexname="PK_MarketDataCurrent" id="lock5ba77b00" mode="U" associatedObjectId="72057594090487808">
<owner-list>
<owner id="processc19978" mode="U"/>
</owner-list>
<waiter-list>
<waiter id="processaf0b68" mode="U" requestType="wait"/>
</waiter-list>
</keylock>
<keylock hobtid="72057594090487808" dbid="6" objectname="MKP_RISKDB.dbo.MarketDataCurrent" indexname="PK_MarketDataCurrent" id="lock65dca340" mode="U" associatedObjectId="72057594090487808">
<owner-list>
<owner id="processaf0b68" mode="U"/>
</owner-list>
<waiter-list>
<waiter id="processc19978" mode="U" requestType="wait"/>
</waiter-list>
</keylock>
</resource-list>
</deadlock>
</deadlock-list>
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
死锁似乎是密钥访问顺序上的直接死锁。一种简单的解释是两次批量更新操作之间更新的密钥重叠。
一个不那么简单的解释是,在 SQL Server(以及其他服务器)中,锁定的密钥是散列的,并且存在(非常重要的)散列冲突概率。这可以解释为什么最近与以前相比您会看到更多的死锁:只是您的数据量增加了,因此冲突概率增加了。如果这看起来深奥且不可能,请继续阅读 %%lockres%% 碰撞概率魔法标记:16,777,215,以及由此链接的文章。概率高得惊人,对于完美密钥分布,仅在约 16M 插入后就有 50% 的冲突概率。对于正常的、现实世界的密钥分布,只需几千次插入就有很大的冲突概率。不幸的是,没有解决办法。如果这确实是问题所在,那么您唯一的解决方案是减少批次的大小(#temp 表的大小),以便减少冲突概率。或者处理死锁并重试...无论如何您都必须这样做,但至少您可以处理更少死锁。
The deadlock seems to be straight forward deadlocks on key access order. One trivial explanation is overlap of the updated keys between the two bulk update operations.
A less trivial explanation though is that in SQL Server (and other servers too) the locked keys are hashed, and there is a (pretty significant) hash collision probability. This would explain why you see more deadlocks lately compared with before: simply your volume of data has increased and therefore the collision probability has increased. If this seems esoteric and improbable, just read on at %%lockres%% collision probability magic marker: 16,777,215, and the article linked from that. The probability is surprisingly high, for a perfect key distribution you have 50% collision probability after only ~16M inserts. For normal, real world, key distributions you have significant collision probability at only few thousand inserts. Unfortunately, there is no work around. Your only solution, if this is truly the problem, is to reduce the size of the batches (the size of the #temp tables) so that the collision probability is reduced. Or deal with deadlocks and retry... which you'll have to do anyway, but at least you can deal with fewer deadlocks.
它发生在主要工作时间之后,数据没有变化,而且是最近才开始的。服务器最近有什么变化吗?我怀疑一些新的数据库维护工作可能会造成干扰。
顺便说一句,如果您知道市场已休市并且数据没有变化,为什么您的流程仍在运行?
It's happening after main business hours, the data isn't changing and it just started recently. Did anything recently change on the server? I would suspect some new database maintenance job might be interfering.
BTW if you know the market is closed and the data isn't changing, why is your process still running?
我想回答我在评论中提出的一个问题,即
“如何识别锁定的行?”。
在下面的死锁 XDL 中,在锁定的两个“进程”节点上,有一个
waitresource
属性。在本例中:waitresource="KEY: 6:72057594090487808 (d4005c04b35f)
和
waitresource="KEY: 6:72057594090487808 (b00072ea4ffd)
使用
%%lockres% %
关键字 Remus 指出,这产生了冲突的两行。它们确实是唯一的id,并且不存在冲突。我仍然不知道为什么我会陷入僵局,但我已经越来越接近了。
我会注意到,两个 id 都应该只来自 LiveFeed 程序,但同样,更新中有一个子句应该从另一端的实际更新中过滤掉这一行。
I'd like to answer one question I asked in a comment, which is,
"How do you identify the rows which are locking?".
In the following deadlock XDL, on the two "process" nodes which are locking, there is a
waitresource
attribute. in this case:waitresource="KEY: 6:72057594090487808 (d4005c04b35f)
and
waitresource="KEY: 6:72057594090487808 (b00072ea4ffd)
Using the
%%lockres%%
keyword Remus pointed to,This yielded the two rows which are conflicting. they are indeed unique ids, and there is no collision. I still don't know why I'm getting a deadlock here, but I'm getting closer.
I will note that both of the id's are supposed to only be coming from the LiveFeed program, but again, there is a clause in the update that is supposed to be filtering out this row from actually updating from the other side.
经过近两年烦人的死锁警告电子邮件之后,我终于解决了这个问题。
我通过在竞争插入上使用 FULL TABLE LOCKING 解决了这个问题。我曾尝试将锁定减少到行级别,但锁定已升级到表级别。最后,我认为该表足够小,即使很多用户每秒都在读取和写入该表,但为了数据一致性,完全锁定对性能来说是一个小的影响。
此外,使用 MERGE 将插入/更新合并到一个原子语句中可以让我做到这一点。
这是已解决的生产代码(它有效!):
I have finally solved this problem, after nearly two years of annoying deadlock warning emails.
I resolved it by using FULL TABLE LOCKING on my competing inserts. I had tried reducing locking to row level, but the locks were escalating to table level. In the end, I decided the table was small enough that even though a lot of users are reading and writing to it every second, that a full lock was a small performance hit I was willing to take for data consistency.
Additionally, combining the insert/update to one atomic statement using MERGE allows me to do this.
Here's the resolved production code (it works!):