由事务隔离级别分隔的并发进程上的死锁

发布于 2024-12-01 00:23:22 字数 2205 浏览 1 评论 0原文

我们有一个工单表。服务器代理作业从游标中的表中获取 100 个条目并执行一些工作。为了并行化,有 10 个服务器代理作业,它们调用以下过程(每个都有自己的 @process_id):

CREATE PROCEDURE sp_do_workorder @process_id INT
AS
BEGIN TRY
    DECLARE @wo_id NCHAR(40),
        @wo_action NVARCHAR(100),
        @created_at DATETIME,
        @source_proc_name NVARCHAR(100),

    UPDATE procedure_ctrl SET [status]='running' WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='idle'

    WHILE 1=1
    BEGIN
        IF NOT EXISTS (SELECT * FROM procedure_ctrl WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='running') BREAK

        SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
        BEGIN TRANSACTION
            UPDATE workorder SET hash=CAST(@process_id AS NVARCHAR(100))
            FROM workorder x
            INNER JOIN (
                SELECT TOP 100 id FROM workorder WHERE hash='' AND workorder_step=0 ORDER BY created_at ASC
            ) y ON x.id=y.id
        COMMIT TRANSACTION
        SET TRANSACTION ISOLATION LEVEL READ COMMITTED

        DECLARE wo_cur CURSOR FAST_FORWARD FOR SELECT id,action,created_at,optin_source FROM workorder WHERE hash=CAST(@process_id AS NVARCHAR(100)) AND workorder_step=0 ORDER BY created_at ASC
        OPEN wo_cur
        FETCH NEXT FROM wo_cur INTO @wo_id,@wo_action,@created_at,@source_proc_name
        WHILE @@FETCH_STATUS=0
        BEGIN
            EXEC sp_basisprozess @wo_id,@wo_action,@created_at,@source_proc_name,@process_id
            FETCH NEXT FROM wo_cur INTO @wo_id,@wo_action,@created_at,@source_proc_name
        END
        CLOSE wo_cur
        DEALLOCATE wo_cur

        WAITFOR DELAY '00:00:01'
    END

    UPDATE procedure_ctrl SET [status]='idle' WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='running'
END TRY
BEGIN CATCH
    EXEC dbo.sp_listerror
    DECLARE @error NVARCHAR(4000)
    SET @error='[sp_do_workorder]_'+CAST(@process_id AS NVARCHAR(100))+': critical problem'
    RAISERROR(@error, 12, 1)    
END CATCH

对于这 10 个代理作业中的大多数,我们经常遇到死锁。有人提示为什么会这样吗?为了防止副作用,我们使用序列化事务隔离级别,因此只有一个代理作业可以获取一个工作订单条目。如果不设置事务隔离级别,死锁就会消失,但经常会发生两个(或更多)代理作业获取相同的工作订单条目。

We have a workorder table. A server agent jobs grabs 100 entries from this table in a cursor and do some work. To parallelize this there are 10 server agent jobs, which call the following procedure (each with its own @process_id):

CREATE PROCEDURE sp_do_workorder @process_id INT
AS
BEGIN TRY
    DECLARE @wo_id NCHAR(40),
        @wo_action NVARCHAR(100),
        @created_at DATETIME,
        @source_proc_name NVARCHAR(100),

    UPDATE procedure_ctrl SET [status]='running' WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='idle'

    WHILE 1=1
    BEGIN
        IF NOT EXISTS (SELECT * FROM procedure_ctrl WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='running') BREAK

        SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
        BEGIN TRANSACTION
            UPDATE workorder SET hash=CAST(@process_id AS NVARCHAR(100))
            FROM workorder x
            INNER JOIN (
                SELECT TOP 100 id FROM workorder WHERE hash='' AND workorder_step=0 ORDER BY created_at ASC
            ) y ON x.id=y.id
        COMMIT TRANSACTION
        SET TRANSACTION ISOLATION LEVEL READ COMMITTED

        DECLARE wo_cur CURSOR FAST_FORWARD FOR SELECT id,action,created_at,optin_source FROM workorder WHERE hash=CAST(@process_id AS NVARCHAR(100)) AND workorder_step=0 ORDER BY created_at ASC
        OPEN wo_cur
        FETCH NEXT FROM wo_cur INTO @wo_id,@wo_action,@created_at,@source_proc_name
        WHILE @@FETCH_STATUS=0
        BEGIN
            EXEC sp_basisprozess @wo_id,@wo_action,@created_at,@source_proc_name,@process_id
            FETCH NEXT FROM wo_cur INTO @wo_id,@wo_action,@created_at,@source_proc_name
        END
        CLOSE wo_cur
        DEALLOCATE wo_cur

        WAITFOR DELAY '00:00:01'
    END

    UPDATE procedure_ctrl SET [status]='idle' WHERE [procedure]='sp_do_workorder_'+CAST(@process_id AS NVARCHAR(100)) AND [status]='running'
END TRY
BEGIN CATCH
    EXEC dbo.sp_listerror
    DECLARE @error NVARCHAR(4000)
    SET @error='[sp_do_workorder]_'+CAST(@process_id AS NVARCHAR(100))+': critical problem'
    RAISERROR(@error, 12, 1)    
END CATCH

We get a deadlock really often for most of these 10 agent jobs. Has anyone a hint why this is so? To prevent side effects we use serialize transaction isolation level, so only one agent job can grab one workorder entry. Without setting transaction isolation leven the deadlock are gone, but then it often occurs, that two (or more) agent jobs grab the same workorder entry.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

jJeQQOZ5 2024-12-08 00:23:22

我还没有尝试具体弄清楚为什么在您的情况下会发生死锁,但似乎您正在有效地使用表作为队列,在这种情况下请参阅 此链接文章介绍了一种使用 OUTPUT 子句和锁定提示来最大化并发性而不会出现死锁的方法。

I haven't tried to work out specifically why the deadlock would occur in your case but it seems as though you are effectively using a table as a queue in which case see this linked article for an approach that uses the OUTPUT clause and locking hints to maximise concurrency without deadlocks.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文