EnterCriticalSection deadlock

Posted 2024-10-18 02:00:42

I'm having what appears to be a deadlock in a multi-threaded logging application.

A little background:

My main application has 4-6 threads running. The main thread is responsible for monitoring the health of various things I'm doing, updating GUIs, etc. Then I have a transmit thread and a receive thread, which talk to physical hardware. I sometimes need to debug the data that the transmit and receive threads are seeing, i.e. print it to a console without interrupting them, because their data is time-critical. The data, by the way, is on a USB bus.

Due to the threaded nature of the application, I want to create a debug console that I can send messages to from my other threads. The debug console runs as a low-priority thread and implements a ring buffer: when you print to the debug console, the message is quickly stored in the ring buffer and an event is set. The debug console's thread sits waiting on that event for inbound messages. When the event is detected, the console thread updates a GUI display with the message. Simple, eh? The printing calls and the console thread use a critical section to control access.
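For experimenting outside Windows, the design described above (producers store a message and signal; a low-priority consumer waits with a timeout and displays what it finds) can be sketched portably. This is a hypothetical `DebugConsole` class, not the poster's `TGDB`: `std::mutex` stands in for the critical section, a `std::condition_variable` plus the pending queue stand in for the event, the 100 ms timeout plays the role of `iWaitTime`, thread priority is omitted because it is not portable, and a callback stands in for the GUI. One deliberate difference from the posted code: the consumer copies pending messages out under the lock and calls the display callback only after releasing it.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical portable analog of the debug console (not the poster's TGDB).
class DebugConsole {
public:
    explicit DebugConsole(std::function<void(const std::string&)> sink)
        : sink_(std::move(sink)), worker_([this] { Run(); }) {}

    ~DebugConsole() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            keepRunning_ = false;
        }
        cv_.notify_one();
        worker_.join();  // Run() drains everything queued before returning
    }

    // Called from any thread: store the message and signal, nothing more.
    void Print(int i) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            pending_.push_back(std::to_string(i));
        }
        cv_.notify_one();
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            // Timed wait with a predicate: a notify sent before we start
            // waiting is not lost, because the queue itself is the state.
            cv_.wait_for(lock, std::chrono::milliseconds(100),
                         [this] { return !pending_.empty() || !keepRunning_; });
            std::deque<std::string> batch;
            batch.swap(pending_);          // take everything queued so far
            const bool stop = !keepRunning_;
            lock.unlock();                 // never "display" while holding the lock
            for (const auto& msg : batch) sink_(msg);
            if (stop) return;
            lock.lock();
        }
    }

    std::function<void(const std::string&)> sink_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::string> pending_;
    bool keepRunning_ = true;
    std::thread worker_;  // declared last so it starts after the members it uses
};
```

Because the consumer never holds the lock while the display callback runs, a slow or blocking display cannot stall threads calling `Print`.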

NOTE: I can adjust the ring buffer size if I see that I am dropping messages (at least that's the idea).

In a test application, the console works very well if I call its Print method slowly via mouse clicks. I have a button that I can press to send messages to the console, and it works. However, if I put any sort of load on it (many calls to the Print method), everything deadlocks. When I trace the deadlock, my IDE's debugger shows the thread sitting in EnterCriticalSection.

NOTE: If I remove the Lock/UnLock calls and just use Enter/LeaveCriticalSection directly (see the code), it sometimes works, but I still end up in a deadlock. I now call Enter/LeaveCriticalSection directly to rule out the extra stack push/pops as the cause, but that did not solve my issue. What's going on here?

Here is one Print method, which lets me pass a simple int to the display console.

void TGDB::Print(int I)
{
    //Lock();
    EnterCriticalSection(&CS);

    if( !SuppressOutput )
    {
        //swprintf( MsgRec->Msg, L"%d", I);
        sprintf( MsgRec->Msg, "%d", I);
        MBuffer->PutMsg(MsgRec, 1);
    }

    SetEvent( m_hEvent );
    LeaveCriticalSection(&CS);
    //UnLock();
}

// My Lock/UnLock methods
void TGDB::Lock(void)
{
    EnterCriticalSection(&CS);
}

bool TGDB::TryLock(void)
{
    return( TryEnterCriticalSection(&CS) );
}

void TGDB::UnLock(void)
{
    LeaveCriticalSection(&CS);
}

// This is how I implemented the console's thread routines

DWORD WINAPI TGDB::ConsoleThread(PVOID pA)
{
    TGDB *g = (TGDB *)pA;
    return( g->ProcessMessages() );
}

DWORD TGDB::ProcessMessages()
{
DWORD rVal;
bool brVal;
int MsgCnt;

    do
    {
        rVal = WaitForMultipleObjects(1, &m_hEvent, true, iWaitTime);

        switch(rVal)
        {
            case WAIT_OBJECT_0:

                EnterCriticalSection(&CS);
                //Lock();

                if( KeepRunning )
                {
                    Info->Caption = "Rx";
                    Info->Refresh();
                    MsgCnt = MBuffer->GetMsgCount();

                    for(int i=0; i<MsgCnt; i++)
                    {
                        MBuffer->GetMsg( MsgRec, 1);
                        Log->Lines->Add(MsgRec->Msg);
                    }
                }

                brVal = KeepRunning;
                ResetEvent( m_hEvent );
                LeaveCriticalSection(&CS);
                //UnLock();

            break;

            case WAIT_TIMEOUT:
                EnterCriticalSection(&CS);
                //Lock();
                Info->Caption = "Idle";
                Info->Refresh();
                brVal = KeepRunning;
                ResetEvent( m_hEvent );
                LeaveCriticalSection(&CS);
                //UnLock();
            break;

            case WAIT_FAILED:
                EnterCriticalSection(&CS);
                //Lock();
                brVal = false;
                Info->Caption = "ERROR";
                Info->Refresh();
                aLine.sprintf("Console error: [%d]", GetLastError() );
                Log->Lines->Add(aLine);
                aLine = "";
                LeaveCriticalSection(&CS);
                //UnLock();
            break;
        }

    }while( brVal );

    return( rVal );
}

MyTest1 and MyTest2 are just two test functions that I call in response to a button press. MyTest1 never causes a problem no matter how fast I click the button. MyTest2 deadlocks nearly every time.

// No Dead Lock
void TTest::MyTest1()
{
    if(gdb)
    {
        // else where: gdb = new TGDB;
        gdb->Print(++I);
    }
}


// Causes a Dead Lock
void TTest::MyTest2()
{
    if(gdb)
    {
        // else where: gdb = new TGDB;
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
        gdb->Print(++I);
    }
}

UPDATE:
Found a bug in my ring buffer implementation. Under heavy load, when the buffer wrapped, I didn't detect a full buffer properly, so the buffer was not returning. I'm pretty sure that issue is now resolved. Once I fixed the ring buffer issue, performance got much better. However, if I decrease iWaitTime, my deadlock (or freeze-up) returns.
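The full-versus-empty confusion mentioned in the update is a classic ring-buffer pitfall: if only head and tail indices are kept, `head == tail` can mean either state once the buffer wraps. A minimal sketch, assuming a hypothetical `RingBuffer` class (the poster's `MBuffer` API is not shown), keeps an explicit count and refuses new messages when full instead of overwriting. It does no locking itself; the caller is assumed to hold the console's lock. Leaving one slot permanently empty is the other common way to resolve the ambiguity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fixed-capacity ring buffer (not the poster's MBuffer class).
// An explicit count makes "full" and "empty" distinguishable after a wrap.
// Not thread-safe by itself: callers must hold the console's lock.
class RingBuffer {
public:
    explicit RingBuffer(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Returns false (message dropped) instead of overwriting when full.
    bool Put(const std::string& msg) {
        if (count_ == slots_.size()) return false;
        slots_[tail_] = msg;
        tail_ = (tail_ + 1) % slots_.size();
        ++count_;
        return true;
    }

    // Returns false when there is nothing to read.
    bool Get(std::string& out) {
        if (count_ == 0) return false;
        out = slots_[head_];
        head_ = (head_ + 1) % slots_.size();
        --count_;
        return true;
    }

    std::size_t Count() const { return count_; }

private:
    std::vector<std::string> slots_;
    std::size_t head_ = 0;   // next slot to read
    std::size_t tail_ = 0;   // next slot to write
    std::size_t count_ = 0;  // number of stored messages
};
```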

So after further tests with a much heavier load, it appears my deadlock is not gone. Under super-heavy load I continue to deadlock, or at least my app freezes up, but nowhere near as often as it used to before I fixed the ring buffer problem. If I double the number of Print calls in MyTest2, I can easily lock up every time.

Also, my updated code is reflected above. I now make sure my SetEvent and ResetEvent calls are inside the critical section.

Comments (3)

仙气飘飘 2024-10-25 02:00:42

With those options ruled out, I would ask questions about this "Info" object: Is it a window? Which window is it parented to? Which thread was it created on?

If Info, or its parent window, was created on the other thread, then the following situation might occur:

  1. The console thread is inside a critical section, processing a message.
  2. The main thread calls Print() and blocks on the critical section, waiting for the console thread to release the lock.
  3. The console thread calls a function on Info (Caption), which results in the system sending a message (WM_SETTEXT) to the window. SendMessage blocks because the target thread is not in a message-alertable state (it isn't blocked in a call to GetMessage/WaitMessage/MsgWaitForMultipleObjects).

Now you have a deadlock.

This kind of #$(%^ can happen whenever you mix blocking routines with anything that interacts with windows. The only appropriate blocking function to use on a GUI thread is MsgWaitForMultipleObjects; otherwise, SendMessage calls to windows hosted on that thread can easily deadlock.

Avoiding this involves two possible approaches:

  • Never do any GUI interaction in worker threads. Only use PostMessage to dispatch non-blocking UI update commands to the UI thread, OR
  • Use kernel event objects + MsgWaitForMultipleObjects (on the GUI thread) to ensure that even while you are blocking on a resource, you are still dispatching messages.

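The first approach can be sketched portably with a hypothetical `UiMailbox` class standing in for the UI thread's message queue: `Post` plays the role of `PostMessage` (enqueue and return), and `PumpOnce` plays the role of one pass of the UI thread's message loop. This is an analogy under those assumptions, not Win32 code. What it illustrates is that workers never touch UI objects or wait on the UI thread, and the UI thread runs updates without holding the queue lock, so an update may even post another update without deadlocking.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical "post, don't send" mailbox owned by the UI thread.
class UiMailbox {
public:
    // Any thread: enqueue a UI update and return immediately.
    void Post(std::function<void()> fn) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push_back(std::move(fn));
        }
        cv_.notify_one();
    }

    // UI thread only: wait for at least one update, then run the batch
    // with the queue lock released.
    void PumpOnce() {
        std::deque<std::function<void()>> batch;
        {
            std::unique_lock<std::mutex> lock(mu_);
            cv_.wait(lock, [this] { return !queue_.empty(); });
            batch.swap(queue_);
        }
        for (auto& fn : batch) fn();  // UI state is touched only here
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
};
```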
把梦留给海 2024-10-25 02:00:42

Without knowing where it is deadlocking, this code is hard to figure out. Two comments, though:

  • Given that this is C++, you should be using an automatic (RAII) object to perform the lock and unlock, just in case it ever becomes non-catastrophic for Log to throw an exception.

  • You are resetting the event in response to WAIT_TIMEOUT. This leaves a small window of opportunity for a second Print() call to set the event after the worker thread has returned from WaitForMultipleObjects but before it has entered the critical section. The event is then reset while there is actually data pending.
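The automatic (RAII) object from the first comment can be as small as the following sketch. `ScopedLock` is a hypothetical name, and it assumes `TGDB::Lock()` and `TGDB::UnLock()` are accessible where the guard is used; inside `Print` it would be declared as `ScopedLock<TGDB> guard(*this);`. The destructor runs on every exit path, including an exception thrown by `Log->Lines->Add`. For code built on `std::mutex`, the standard `std::lock_guard` already does this.

```cpp
#include <cassert>

// Minimal RAII guard for any type exposing Lock()/UnLock(), such as TGDB.
template <class Lockable>
class ScopedLock {
public:
    explicit ScopedLock(Lockable& l) : l_(l) { l_.Lock(); }
    ~ScopedLock() { l_.UnLock(); }  // runs on return and during stack unwinding

    ScopedLock(const ScopedLock&) = delete;             // one guard, one unlock
    ScopedLock& operator=(const ScopedLock&) = delete;

private:
    Lockable& l_;
};
```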

But you do need to debug it and find out where it "deadlocks". If one thread IS stuck in EnterCriticalSection, then we can find out why. If neither thread is, then the incomplete printing is just the result of an event getting lost.
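One way to close the lost-event window described above is to make the reset part of the wait itself. A Win32 event created with `bManualReset = FALSE` (an auto-reset event) behaves that way, which would let the `ResetEvent` calls in `ProcessMessages` be dropped. The sketch below is a portable analog built on `std::mutex` and `std::condition_variable`; `AutoResetEvent` is a hypothetical name, not a standard class.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical portable analog of a Win32 auto-reset event.
class AutoResetEvent {
public:
    void Set() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            signaled_ = true;  // repeated Set() calls collapse into one signal
        }
        cv_.notify_one();
    }

    // Returns true if signaled (consuming the signal), false on timeout.
    bool Wait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mu_);
        if (!cv_.wait_for(lock, timeout, [this] { return signaled_; }))
            return false;
        signaled_ = false;  // reset in the same locked step as the wake-up
        return true;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    bool signaled_ = false;
};
```

Because several `Set()` calls collapse into one signal, the consumer still has to drain every queued message each time it wakes, as the posted `ProcessMessages` does with `GetMsgCount()`.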

风吹过旳痕迹 2024-10-25 02:00:42

I would strongly recommend a lock-free implementation.

Not only will this avoid potential deadlock, but debug instrumentation is one place where you absolutely do not want to take a lock. The impact of formatting debug messages on the timing of a multi-threaded application is bad enough; having locks synchronize your parallel code just because you instrumented it makes debugging futile.

What I suggest is an SList-based design (the Win32 API provides an SList implementation, but you can build a thread-safe template easily enough using InterlockedCompareExchange and InterlockedExchange). Each thread has its own pool of buffers, and each buffer tracks the thread it came from. Threads wishing to write a message post a buffer to the logger thread; after processing the buffer, the log manager posts it back to the source thread's SList for reuse. This also prevents any thread from starving other threads of buffers. An event to wake the logger thread when a buffer is placed into the queue completes the design.
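The core of that design can be sketched with C++ atomics in place of the Interlocked functions. The sketch is hypothetical (`LockFreeLog` and `LogNode` are illustrative names, not the Win32 SList API) and deliberately omits the per-thread buffer pools and the wake-up event: producers push nodes with a compare-and-swap loop, which corresponds to InterlockedCompareExchange, and the logger thread detaches the entire list with a single exchange, which corresponds to InterlockedExchange. Because nodes are never popped one at a time, the ABA problem of a naive lock-free pop does not arise.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical message node; Push() takes ownership of a node made with new.
struct LogNode {
    explicit LogNode(std::string t) : text(std::move(t)) {}
    std::string text;
    LogNode* next = nullptr;
};

// Multi-producer, single-consumer lock-free list (sketch).
class LockFreeLog {
public:
    ~LockFreeLog() { Free(TakeAll()); }

    // Any thread: link the node in front of the current head with a CAS loop.
    void Push(LogNode* node) {
        node->next = head_.load(std::memory_order_relaxed);
        // On failure, compare_exchange_weak reloads node->next with the
        // current head, so the loop simply retries.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Logger thread: detach everything pushed so far in one exchange and
    // return it oldest first (the pushed chain is LIFO, so reverse it).
    LogNode* TakeAll() {
        LogNode* n = head_.exchange(nullptr, std::memory_order_acquire);
        LogNode* fifo = nullptr;
        while (n) {
            LogNode* next = n->next;
            n->next = fifo;
            fifo = n;
            n = next;
        }
        return fifo;
    }

    static void Free(LogNode* n) {
        while (n) {
            LogNode* next = n->next;
            delete n;
            n = next;
        }
    }

private:
    std::atomic<LogNode*> head_{nullptr};
};
```

Nodes are allocated with `new` here for brevity; the answer's per-thread pools would replace that allocation, with the logger returning each processed buffer to its source thread's list.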
