EnterCriticalSection deadlock

Posted 2024-10-18 02:00:42

I have what appears to be a deadlock in a multi-threaded logging application.

A little background:

My main application has 4-6 threads running. The main thread is responsible for monitoring the health of the various things I'm doing, updating GUIs, etc. Then I have a transmit thread and a receive thread, which talk to physical hardware. I sometimes need to debug the data that the transmit and receive threads are seeing, i.e. print to a console without interrupting them, because the data is time critical. The data, by the way, is on a USB bus.

Due to the threaded nature of the application, I want to create a debug console that I can send messages to from my other threads. The debug console runs as a low-priority thread and implements a ring buffer, so that when you print to the debug console the message is quickly stored in the ring buffer and an event is set. The debug console's thread sits in a wait on that event (WaitForSingleObject style) until inbound messages arrive. When the event is detected, the console thread updates a GUI display with the message. Simple, eh? The printing calls and the console thread use a critical section to control access.

NOTE: I can adjust the ring buffer size if I see that I am dropping messages (at least that's the idea).

In a test application, the console works very well if I call its Print method slowly via mouse clicks. I have a button that I can press to send messages to the console, and it works. However, if I put any sort of load on it (many calls to the Print method), everything deadlocks. When I trace the deadlock, my IDE's debugger traces to EnterCriticalSection and sits there.

NOTE: If I remove the Lock/UnLock calls and just use Enter/LeaveCriticalSection (see the code), it sometimes works, but I still end up deadlocked. To rule out deadlocks due to stack pushes/pops, I now call Enter/LeaveCriticalSection directly, but this did not solve my issue. What's going on here?

Here is one Print method, which lets me pass a simple int to the display console.

void TGDB::Print(int I)
{
    //Lock();
    EnterCriticalSection(&CS);

    if( !SuppressOutput )
    {
        //swprintf( MsgRec->Msg, L"%d", I);
        sprintf( MsgRec->Msg, "%d", I);
        MBuffer->PutMsg(MsgRec, 1);
    }

    SetEvent( m_hEvent );
    LeaveCriticalSection(&CS);
    //UnLock();
}

// My Lock/UnLock methods
void TGDB::Lock(void)
{
    EnterCriticalSection(&CS);
}

bool TGDB::TryLock(void)
{
    return( TryEnterCriticalSection(&CS) );
}

void TGDB::UnLock(void)
{
    LeaveCriticalSection(&CS);
}

// This is how I implemented Console's thread routines

DWORD WINAPI TGDB::ConsoleThread(PVOID pA)
{
    DWORD rVal;

    TGDB *g = (TGDB *)pA;
    return( g->ProcessMessages() );
}

DWORD TGDB::ProcessMessages()
{
    DWORD rVal;
    bool brVal;
    int MsgCnt;

    do
    {
        rVal = WaitForMultipleObjects(1, &m_hEvent, true, iWaitTime);

        switch(rVal)
        {
            case WAIT_OBJECT_0:

                EnterCriticalSection(&CS);
                //Lock();

                if( KeepRunning )
                {
                    Info->Caption = "Rx";
                    Info->Refresh();
                    MsgCnt = MBuffer->GetMsgCount();

                    for(int i=0; i<MsgCnt; i++)
                    {
                        MBuffer->GetMsg( MsgRec, 1);
                        Log->Lines->Add(MsgRec->Msg);
                    }
                }

                brVal = KeepRunning;
                ResetEvent( m_hEvent );
                LeaveCriticalSection(&CS);
                //UnLock();

            break;

            case WAIT_TIMEOUT:
                EnterCriticalSection(&CS);
                //Lock();
                Info->Caption = "Idle";
                Info->Refresh();
                brVal = KeepRunning;
                ResetEvent( m_hEvent );
                LeaveCriticalSection(&CS);
                //UnLock();
            break;

            case WAIT_FAILED:
                EnterCriticalSection(&CS);
                //Lock();
                brVal = false;
                Info->Caption = "ERROR";
                Info->Refresh();
                aLine.sprintf("Console error: [%d]", GetLastError() );
                Log->Lines->Add(aLine);
                aLine = "";
                LeaveCriticalSection(&CS);
                //UnLock();
            break;
        }

    }while( brVal );

    return( rVal );
}

MyTest1 and MyTest2 are just two test functions that I call in response to a button press. MyTest1 never causes a problem, no matter how fast I click the button. MyTest2 deadlocks nearly every time.

// No Dead Lock
void TTest::MyTest1()
{
    if(gdb)
    {
        // else where: gdb = new TGDB;
        gdb->Print(++I);
    }
}


// Causes a Dead Lock
void TTest::MyTest2()
{
    if(gdb)
    {
        // else where: gdb = new TGDB;
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
    }
}

UPDATE:
Found a bug in my ring buffer implementation. Under heavy load, when the buffer wrapped, I didn't detect a full buffer properly, so the buffer was not returning. I'm pretty sure that issue is now resolved. Once I fixed the ring buffer issue, performance got much better. However, if I decrease iWaitTime, my deadlock (or freeze-up) returns.

So after further tests with a much heavier load, it appears my deadlock is not gone. Under super heavy load I continue to deadlock, or at least my app freezes up, though nowhere near as often as it used to since I fixed the ring buffer problem. If I double the number of Print calls in MyTest2, I can easily lock up every time.

Also, my updated code is reflected above. I now make sure my Set and Reset event calls are inside the critical section calls.
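
For readers chasing the same wrap-around symptom, here is a minimal sketch of one common way to tell a full ring buffer from an empty one: keep an explicit element count instead of comparing head and tail. The MBuffer class from the post is not shown, so MsgRing, put and get are hypothetical names, and the sketch is not thread-safe on its own (the caller is assumed to hold the lock).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical fixed-capacity ring buffer (not the poster's MBuffer).
// An explicit count distinguishes "full" from "empty" when head == tail.
template <std::size_t N>
class MsgRing {
public:
    // Returns false (the message is dropped) when the buffer is full.
    bool put(const std::string& msg) {
        if (count_ == N) return false;
        slots_[head_] = msg;
        head_ = (head_ + 1) % N;
        ++count_;
        return true;
    }

    // Returns false when there is nothing to read.
    bool get(std::string& out) {
        if (count_ == 0) return false;
        out = slots_[tail_];
        tail_ = (tail_ + 1) % N;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::array<std::string, N> slots_{};
    std::size_t head_ = 0;   // next slot to write
    std::size_t tail_ = 0;   // next slot to read
    std::size_t count_ = 0;  // number of stored messages
};
```

With this layout a put on a full buffer fails fast instead of overwriting unread data, so the caller can decide whether to drop the message or grow the buffer.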

仙气飘飘 2024-10-25 02:00:42

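With those possibilities ruled out, I'd ask questions about this "Info" object. Is it a window? Which window is it parented to? And which thread was it created on?

If Info, or its parent window, was created on another thread, then the following situation might occur:

The console thread is inside the critical section, processing a message.
The main thread calls Print() and blocks on the critical section, waiting for the console thread to release the lock.
The console thread calls a function on Info (Caption), which results in the system sending a message (WM_SETTEXT) to the window. SendMessage blocks because the target thread is not in a state where it can receive messages (it isn't blocked in a call to GetMessage/WaitMessage/MsgWaitForMultipleObjects).

Now you have a deadlock.

This kind of #$(%^ is bound to happen whenever you mix blocking routines with anything that interacts with windows. The only appropriate blocking function to use on a GUI thread is MsgWaitForMultipleObjects; otherwise SendMessage calls to windows hosted on that thread can easily deadlock.

Avoiding this involves two possible approaches:

Never do any GUI interaction in worker threads; use only PostMessage to dispatch non-blocking UI update commands to the UI thread, or
Use kernel event objects + MsgWaitForMultipleObjects (on the GUI thread) to ensure that, even while you are blocking on a resource, you are still dispatching messages.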
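
As a rough illustration of the first approach, here is a portable sketch in which worker threads only enqueue UI updates and return at once, and the thread that owns the GUI applies them later. It assumes std::mutex and a vector of std::function as stand-ins for the Win32 message queue and PostMessage; UiQueue, post, drain and demo_ui_queue are hypothetical names, not part of any real API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the UI thread's message queue.
class UiQueue {
public:
    // Called from worker threads: enqueue the command and return immediately.
    void post(std::function<void()> cmd) {
        std::lock_guard<std::mutex> guard(m_);
        pending_.push_back(std::move(cmd));
    }

    // Called only on the UI thread: run everything posted so far.
    std::size_t drain() {
        std::vector<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> guard(m_);
            batch.swap(pending_);
        }
        // The lock is released here, so a command may block or post again
        // without deadlocking against a worker that is inside post().
        for (auto& cmd : batch) cmd();
        return batch.size();
    }

private:
    std::mutex m_;
    std::vector<std::function<void()>> pending_;
};

// Demo: two workers post log lines; the "UI thread" (the caller) applies them.
inline std::vector<std::string> demo_ui_queue() {
    UiQueue ui;
    std::vector<std::string> log;  // touched only by the draining thread
    auto worker = [&ui, &log](const std::string& tag) {
        for (int i = 0; i < 4; ++i)
            ui.post([&log, tag, i] { log.push_back(tag + std::to_string(i)); });
    };
    std::thread tx(worker, "tx");
    std::thread rx(worker, "rx");
    tx.join();
    rx.join();
    ui.drain();
    return log;
}
```

The point of the design is that no GUI call is ever made while the shared lock is held: the lock protects only the hand-off of the batch.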

把梦留给海 2024-10-25 02:00:42

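Without knowing where it is deadlocking, this code is hard to figure out. Two comments, though:

Given that this is C++, you should be using an Auto object to perform the lock and unlock, so that an exception thrown by Log is not catastrophic.
You are resetting the event in response to WAIT_TIMEOUT. This leaves a small window of opportunity for a second Print() call to set the event after the worker thread has returned from WaitForMultipleObjects but before it has entered the critical section. The event then gets reset while there is actually data pending.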
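
The "Auto object" in the first comment is a scoped (RAII) guard. A minimal sketch follows, written as a template over any type that exposes Lock() and UnLock() the way TGDB does in the question; AutoLock, FakeLock and demo_autolock are hypothetical names, and FakeLock only counts lock depth so the example runs without any Win32 headers.

```cpp
#include <cassert>
#include <stdexcept>

// Scoped guard: locks in the constructor and unlocks in the destructor,
// so the lock is released even if the guarded code throws.
template <typename Lockable>
class AutoLock {
public:
    explicit AutoLock(Lockable& l) : lock_(l) { lock_.Lock(); }
    ~AutoLock() { lock_.UnLock(); }
    AutoLock(const AutoLock&) = delete;
    AutoLock& operator=(const AutoLock&) = delete;

private:
    Lockable& lock_;
};

// Hypothetical stand-in for a class wrapping a CRITICAL_SECTION;
// it only tracks nesting depth so the demo can check the balance.
struct FakeLock {
    int depth = 0;
    void Lock() { ++depth; }
    void UnLock() { --depth; }
};

// Returns the lock depth after the guarded code has thrown.
inline int demo_autolock() {
    FakeLock cs;
    try {
        AutoLock<FakeLock> guard(cs);
        throw std::runtime_error("simulated failure while logging");
    } catch (const std::runtime_error&) {
        // The guard's destructor already ran during stack unwinding.
    }
    return cs.depth;
}
```

In the question's code the same guard could be declared at the top of Print() and of each case branch in place of the paired Enter/Leave calls, assuming those branches can be given their own scope.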

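But you do need to debug it and reveal where it "deadlocks". If one thread IS stuck on EnterCriticalSection, then we can find out why. If neither thread is, then the incomplete printing is just the result of an event being lost.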
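
The lost-event case can be replayed without real threads. The sketch below is a single-threaded model of that one interleaving, not Win32 code: Model, print, on_timeout_as_posted, on_timeout_checked and wakeup_lost are hypothetical names, and the "checked" variant is just one possible alternative, under the assumption that the buffer count is read inside the critical section.

```cpp
#include <cassert>

// Single-threaded model of the interleaving; no real threads or Win32 calls.
struct Model {
    bool event_set = false;  // state of the manual-reset event
    int pending = 0;         // messages sitting in the ring buffer
};

// What Print() does: store a message, then signal the event.
inline void print(Model& m) {
    ++m.pending;
    m.event_set = true;
}

// WAIT_TIMEOUT branch as posted: resets the event unconditionally.
inline void on_timeout_as_posted(Model& m) { m.event_set = false; }

// One possible alternative: leave the event alone unless the buffer is empty.
inline void on_timeout_checked(Model& m) {
    if (m.pending == 0) m.event_set = false;
}

// Replays: the wait times out -> Print() runs -> the timeout branch executes.
// Returns true if a message is pending while the event is clear.
inline bool wakeup_lost(void (*on_timeout)(Model&)) {
    Model m;        // the wait has just returned WAIT_TIMEOUT
    print(m);       // another thread gets in before the critical section
    on_timeout(m);  // the console thread now runs its timeout branch
    return m.pending > 0 && !m.event_set;
}
```

In the posted code a message stranded this way stays in the buffer until some later Print() signals the event again, which would look like incomplete output rather than a hard lock.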

风吹过旳痕迹 2024-10-25 02:00:42

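I would strongly recommend a lock-free implementation.

Not only will this avoid potential deadlock, but debug instrumentation is one place where you absolutely do not want to take a lock. The impact of formatting debug messages on the timing of a multi-threaded application is bad enough; having locks synchronize your parallel code just because you instrumented it makes debugging futile.

What I suggest is an SList-based design (the Win32 API provides an SList implementation, but you can build a thread-safe template easily enough using InterlockedCompareExchange and InterlockedExchange). Each thread has a pool of buffers. Each buffer tracks the thread it came from; after processing a buffer, the log manager posts it back to the source thread's SList for reuse. Threads wishing to write a message post a buffer to the logger thread. This also prevents any thread from starving other threads of buffers. An event that wakes the logger thread when a buffer is placed into the queue completes the design.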
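
A minimal sketch of the push side of such a list, using std::atomic in place of InterlockedCompareExchange and InterlockedExchange. LogNode, LogStack, push, take_all and demo_log_stack are hypothetical names, and the demo allocates nodes with new purely for brevity; the per-thread buffer pools described above would replace that, since the heap allocator may take a lock of its own.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <vector>

// Hypothetical message node; in the described design this is a pooled buffer.
struct LogNode {
    std::string text;
    LogNode* next = nullptr;
};

// Lock-free LIFO list: many producers push, one consumer takes everything.
// Detaching the whole list at once avoids the ABA problem of a per-node pop.
class LogStack {
public:
    void push(LogNode* n) {
        n->next = head_.load(std::memory_order_relaxed);
        // On failure compare_exchange_weak reloads the current head into n->next.
        while (!head_.compare_exchange_weak(n->next, n,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Detach and return the entire list (newest message first).
    LogNode* take_all() {
        return head_.exchange(nullptr, std::memory_order_acquire);
    }

private:
    std::atomic<LogNode*> head_{nullptr};
};

// Demo: several threads log without taking a lock; the caller then drains.
inline int demo_log_stack(int threads, int per_thread) {
    LogStack stack;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&stack, t, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                LogNode* n = new LogNode;
                n->text = std::to_string(t) + ":" + std::to_string(i);
                stack.push(n);
            }
        });
    }
    for (auto& th : pool) th.join();

    int drained = 0;
    for (LogNode* n = stack.take_all(); n != nullptr;) {
        LogNode* next = n->next;
        delete n;
        ++drained;
        n = next;
    }
    return drained;
}
```

Because take_all returns the newest entry first, a logger that cares about ordering would reverse the detached list before displaying it.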
