由事务隔离级别分隔的并发进程上的死锁
我们有一个工单表。服务器代理作业从游标中的表中获取 100 个条目并执行一些工作。为了并行化,有 10 个服务器代理作业,它们调用以下过程(每个都有自己的 @process_id
):
CREATE PROCEDURE sp_do_workorder @process_id INT
AS
BEGIN TRY
DECLARE @wo_id NCHAR(40),
@wo_action NVARCHAR(100),
@created_at DATETIME,
@source_proc_name NVARCHAR(100),
UPDATE procedure_ctrl SET [status]='running' WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='idle'
WHILE 1=1
BEGIN
IF NOT EXISTS (SELECT * FROM procedure_ctrl WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='running') BREAK
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
UPDATE workorder SET hash=CAST(@process_id AS NVARCHAR(100))
FROM workorder x
INNER JOIN (
SELECT TOP 100 id FROM workorder WHERE hash='' AND workorder_step=0 ORDER BY created_at ASC
) y ON x.id=y.id
COMMIT TRANSACTION
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
DECLARE wo_cur CURSOR FAST_FORWARD FOR SELECT id,action,created_at,optin_source FROM workorder WHERE hash=CAST(@process_id AS NVARCHAR(100)) AND workorder_step=0 ORDER BY created_at ASC
OPEN wo_cur
FETCH NEXT FROM wo_cur INTO @wo_id,@wo_action,@created_at,@source_proc_name
WHILE @@FETCH_STATUS=0
BEGIN
EXEC sp_basisprozess @wo_id,@wo_action,@created_at,@source_proc_name,@process_id
FETCH NEXT FROM wo_cur INTO @wo_id,@wo_action,@created_at,@source_proc_name
END
CLOSE wo_cur
DEALLOCATE wo_cur
WAITFOR DELAY '00:00:01'
END
UPDATE procedure_ctrl SET [status]='idle' WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='running'
END TRY
BEGIN CATCH
EXEC dbo.sp_listerror
DECLARE @error NVARCHAR(4000)
SET @error='[sp_do_workorder]_'+CAST(@process_id AS NVARCHAR(100))+': critical problem'
RAISERROR(@error, 12, 1)
END CATCH
对于这 10 个代理作业中的大多数,我们经常遇到死锁。有人提示为什么会这样吗?为了防止副作用,我们使用序列化事务隔离级别,因此只有一个代理作业可以获取一个工作订单条目。如果不设置事务隔离级别,死锁就会消失,但经常会发生两个(或更多)代理作业获取相同的工作订单条目。
We have a workorder table. A server agent jobs grabs 100 entries from this table in a cursor and do some work. To parallelize this there are 10 server agent jobs, which call the following procedure (each with its own @process_id
):
CREATE PROCEDURE sp_do_workorder @process_id INT
AS
BEGIN TRY
DECLARE @wo_id NCHAR(40),
@wo_action NVARCHAR(100),
@created_at DATETIME,
@source_proc_name NVARCHAR(100),
UPDATE procedure_ctrl SET [status]='running' WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='idle'
WHILE 1=1
BEGIN
IF NOT EXISTS (SELECT * FROM procedure_ctrl WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='running') BREAK
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
UPDATE workorder SET hash=CAST(@process_id AS NVARCHAR(100))
FROM workorder x
INNER JOIN (
SELECT TOP 100 id FROM workorder WHERE hash='' AND workorder_step=0 ORDER BY created_at ASC
) y ON x.id=y.id
COMMIT TRANSACTION
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
DECLARE wo_cur CURSOR FAST_FORWARD FOR SELECT id,action,created_at,optin_source FROM workorder WHERE hash=CAST(@process_id AS NVARCHAR(100)) AND workorder_step=0 ORDER BY created_at ASC
OPEN wo_cur
FETCH NEXT FROM wo_cur INTO @wo_id,@wo_action,@created_at,@source_proc_name
WHILE @@FETCH_STATUS=0
BEGIN
EXEC sp_basisprozess @wo_id,@wo_action,@created_at,@source_proc_name,@process_id
FETCH NEXT FROM wo_cur INTO @wo_id,@wo_action,@created_at,@source_proc_name
END
CLOSE wo_cur
DEALLOCATE wo_cur
WAITFOR DELAY '00:00:01'
END
UPDATE procedure_ctrl SET [status]='idle' WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='running'
END TRY
BEGIN CATCH
EXEC dbo.sp_listerror
DECLARE @error NVARCHAR(4000)
SET @error='[sp_do_workorder]_'+CAST(@process_id AS NVARCHAR(100))+': critical problem'
RAISERROR(@error, 12, 1)
END CATCH
We get a deadlock really often for most of these 10 agent jobs. Has anyone a hint why this is so? To prevent side effects we use serialize transaction isolation level, so only one agent job can grab one workorder entry. Without setting transaction isolation leven the deadlock are gone, but then it often occurs, that two (or more) agent jobs grab the same workorder entry.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我还没有尝试具体弄清楚为什么在您的情况下会发生死锁,但似乎您正在有效地使用表作为队列,在这种情况下请参阅 此链接文章介绍了一种使用
OUTPUT
子句和锁定提示来最大化并发性而不会出现死锁的方法。I haven't tried to work out specifically why the deadlock would occur in your case but it seems as though you are effectively using a table as a queue in which case see this linked article for an approach that uses the
OUTPUT
clause and locking hints to maximise concurrency without deadlocks.