Why is my thread blocked on a critical section that is not held by anything?
I am having an issue with a critical section in C++. I'm getting a hung window and when I dump the process I can see the thread waiting on a critical section:
16 Id: b10.b88 Suspend: 1 Teb: 7ffae000 Unfrozen
ChildEBP RetAddr
0470f158 7c90df3c ntdll!KiFastSystemCallRet
0470f15c 7c91b22b ntdll!NtWaitForSingleObject+0xc
0470f1e4 7c901046 ntdll!RtlpWaitForCriticalSection+0x132
0470f1ec 0415647e ntdll!RtlEnterCriticalSection+0x46
The line data and other details all indicate entry into a specific critical section. The only problem is that no other thread appears to be holding this critical section. WinDbg's !locks command reports nothing, and dumping the critical section indicates it is not locked, as can be seen from the null owner and the -1 LockCount in the structure below.
0:016> dt _RTL_CRITICAL_SECTION 42c2318
_RTL_CRITICAL_SECTION
+0x000 DebugInfo : 0x02c8b318 _RTL_CRITICAL_SECTION_DEBUG
+0x004 LockCount : -1
+0x008 RecursionCount : -1
+0x00c OwningThread : (null)
+0x010 LockSemaphore : 0x00000340
+0x014 SpinCount : 0
0:016> dt _RTL_CRITICAL_SECTION_DEBUG 2c8b318
_RTL_CRITICAL_SECTION_DEBUG
+0x000 Type : 0
+0x002 CreatorBackTraceIndex : 0x2911
+0x004 CriticalSection : 0x042c2318 _RTL_CRITICAL_SECTION
+0x008 ProcessLocksList : _LIST_ENTRY [ 0x2c8b358 - 0x2c8b2e8 ]
+0x010 EntryCount : 1
+0x014 ContentionCount : 1
+0x018 Flags : 0xbaadf00d
+0x01c CreatorBackTraceIndexHigh : 0xf00d
+0x01e SpareWORD : 0xbaad
How is this possible? Even in a deadlock where another thread has not called LeaveCriticalSection, I would expect to see the critical section itself marked as locked. Does anyone have any debugging suggestions or possible fixes?
Comments (1)
It turned out to be a bug where LeaveCriticalSection was being called without a corresponding EnterCriticalSection. The extra call decremented both LockCount and RecursionCount from their defaults (LockCount is -1 and RecursionCount is 0 when unlocked), leaving LockCount at -2 and RecursionCount at -1.
When the subsequent EnterCriticalSection was performed, it hung because RecursionCount was non-zero: a thread can only take ownership of the critical section if RecursionCount is 0. However, the call still incremented LockCount (taking it back to the -1 seen in my original question), which only confuses matters.
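The sequence above can be replayed with a rough single-threaded model of the counters. This is only a sketch, not the actual ntdll code: the type and function names are made up, blocking is represented by returning false, and the acquire test used here (the incremented LockCount must reach 0) is an assumption for illustration.

```cpp
#include <cassert>

// Simplified, single-threaded model of the counters. Hypothetical sketch only,
// not the real ntdll implementation.
struct ModelCs {
    long lock_count = -1;      // -1 means "unlocked" in this model
    long recursion_count = 0;  // 0 means "not held"
    int owner = 0;             // 0 means "no owner"; thread ids are made-up nonzero ints
};

// Models EnterCriticalSection: returns true if the caller would get the lock,
// false where the real call would block waiting for a release.
bool model_enter(ModelCs& cs, int tid) {
    if (++cs.lock_count == 0) {  // assumed acquire test: count went from -1 to 0
        cs.owner = tid;
        cs.recursion_count = 1;
        return true;
    }
    if (cs.owner == tid) {       // recursive entry by the current owner
        ++cs.recursion_count;
        return true;
    }
    return false;                // looks contended, so the caller would wait
}

// Models LeaveCriticalSection with no ownership check, as in the scenario above.
void model_leave(ModelCs& cs) {
    if (--cs.recursion_count == 0) {
        cs.owner = 0;            // last matching leave releases ownership
    }
    --cs.lock_count;
}

// Replays the failing sequence: one unmatched leave, then an enter.
ModelCs replay_over_release(bool& entered) {
    ModelCs cs;
    model_leave(cs);  // LockCount -1 -> -2, RecursionCount 0 -> -1
    assert(cs.lock_count == -2 && cs.recursion_count == -1);
    entered = model_enter(cs, 16);  // LockCount -2 -> -1, which is not 0
    return cs;
}
```

After `replay_over_release`, `entered` is false and the model holds LockCount -1, RecursionCount -1 and no owner, which is the same picture as the dump in the question.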
In summary, if you see a critical section blocking your thread while both LockCount and RecursionCount are -1, it means LeaveCriticalSection was called more times than EnterCriticalSection.
As to the code causing it:
And the definition of the error-checking macro:
The macro lacks curly braces around its contents, so when the if condition is not satisfied, only EnterCriticalSection is skipped while the later LeaveCriticalSection still runs. Obviously a problem.
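As a hypothetical illustration of this class of bug (not the poster's actual code), the sketch below uses a toy lock that only counts enter/leave balance, so it can run without the Win32 API. The names `ToyLock`, `LOCK_AND_CHECK_BUGGY`, `LOCK_AND_CHECK_FIXED`, `update_buggy` and `update_fixed` are all invented for the example. The buggy macro expands to two statements, so an unbraced `if` in front of it guards only the first one.

```cpp
#include <cassert>

// Toy stand-in for a critical section: tracks enter/leave balance only.
struct ToyLock {
    int depth = 0;  // 0 = balanced, negative = more leaves than enters
    void enter() { ++depth; }
    void leave() { --depth; }
};

// Buggy: two statements with nothing grouping them.
#define LOCK_AND_CHECK_BUGGY(cs, err) \
    (cs).enter();                     \
    if (err) { (cs).leave(); return false; }

// Fixed: do { ... } while (0) makes the expansion a single statement.
#define LOCK_AND_CHECK_FIXED(cs, err)            \
    do {                                         \
        (cs).enter();                            \
        if (err) { (cs).leave(); return false; } \
    } while (0)

bool update_buggy(ToyLock& cs, bool need_lock, bool err) {
    if (need_lock)
        LOCK_AND_CHECK_BUGGY(cs, err);  // only enter() is guarded by the if
    // ... protected work would go here ...
    if (need_lock) cs.leave();
    return true;
}

bool update_fixed(ToyLock& cs, bool need_lock, bool err) {
    if (need_lock)
        LOCK_AND_CHECK_FIXED(cs, err);  // the whole macro body is guarded
    // ... protected work would go here ...
    if (need_lock) cs.leave();
    return true;
}
```

Calling `update_buggy(lock, false, true)` skips `enter()` but still runs the error branch, so `leave()` executes once with no matching `enter()` and `depth` ends at -1. The same call on `update_fixed` leaves `depth` at 0. Wrapping multi-statement macros in `do { ... } while (0)`, or replacing them with a function or a scope guard, avoids the problem.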