Boost.Thread assertion/crash on Windows during win32::WaitForSingleObject
I have a rarely occurring issue in my code in which an assertion involving the Boost.Thread library is triggered. I haven't been able to reproduce this issue using a stand-alone example, and I don't really know what is causing it, so it's hard to provide a sample case. I am hoping that somebody familiar with the internals of Boost.Thread may be able to help.
Here is what I know:
- The problem occurs when a `boost::lock_guard<boost::recursive_mutex>` (or variations of unique_lock and normal non-recursive mutex) is declared.
- It happens in a handler function for Boost.Asio. On the stack is the thread that does `io_service::run`, a bunch of glue to call the Asio callback function, followed by my callback function (triggered by an async_write call). The first line of that function is the declaration of the `lock_guard<>` which is causing the problem.
- `this` inside of my function is valid, and has not been deleted or anything like that. The debugger shows that it points to valid data. The mutex that is being locked in my `handle_write` function also guards against deletion of the memory that the handling function uses.
- This works fine, I'd say 9,999 times out of 10,000, with heavy multi-threaded usage going on. The problem occurs with the same frequency if I tone down the number of threads used by the application to just one thread which handles Asio run() calls, and a main UI thread.
- The first line of my code calls the `lock()` method of the mutex (in the ctor of `boost::unique_lock<>`), then calls `lock()` in `boost::detail::basic_recursive_mutex_impl`, which calls the `lock()` method of `boost::detail::basic_timed_mutex`. In Boost 1.46, the assertion (`BOOST_VERIFY`) is on line 78 of basic_timed_mutex.hpp, which calls win32::WaitForSingleObject:

      do
      {
          BOOST_VERIFY(win32::WaitForSingleObject(sem,::boost::detail::win32::infinite)==0);
          clear_waiting_and_try_lock(old_count);
          lock_acquired=!(old_count&lock_flag_value);
      }
      while(!lock_acquired);

- At the time the Boost.Thread code is waiting to acquire a lock on the mutex (which is what this code path that uses `WaitForSingleObject` does), no other thread is holding the mutex (at least at the time the assertion occurs, when it can be examined in the debugger). This is odd because it should be able to obtain the lock without having to wait for another thread to relinquish control.
- Things look very odd when examining the members of the mutex. These are the values of all of the local and member variables (unless otherwise noted, they are the same every time this happens):
  - `sem` - 0xdddddddddddddddd - This is always the same, on every crash.
  - `lock_acquired` - false.
  - `old_count` - 0xdddddddddddddddd.
  - `this` - Appears to be valid, and the address of it matches what the object holding it has (the object of which `handle_write` is a method). It does not appear to have been deleted or messed with in any way.
  - `this->active_count` - A negative integer, ranges I've seen have been between -570000000 and -580000000.
  - `this->event` - 0xdddddddddddddddd.
I am unfortunately unable to see the result of the `WaitForSingleObject` call. The MSDN entry on the API function indicates four possible return values, two of them impossible in this scenario. Since `WaitForSingleObject` is being called with an invalid event handle (`sem` = `0xdddddddddddddddd`), I assume it's returning `0xFFFFFFFF` and GetLastError would indicate that an invalid handle has been supplied.
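For reference, here is a minimal sketch of the four documented return values and of the check that BOOST_VERIFY performs on them. The constants are copied from the Windows SDK headers so that the snippet builds off Windows too; on Windows you would include <windows.h> and use the real macros. `describe_wait_result` and `wait_succeeded` are hypothetical helper names, not part of Boost or the Win32 API. The two "impossible" values are presumably WAIT_TIMEOUT (the timeout is infinite) and WAIT_ABANDONED (which applies only to mutex objects, and `sem` is an event).

```cpp
#include <cassert>   // handy for spot checks such as assert(wait_succeeded(0))
#include <cstdint>
#include <string>

// Values mirrored from the Windows SDK (assumption: used here only so the
// sketch compiles on any platform; prefer the real macros on Windows).
constexpr std::uint32_t kWaitObject0   = 0x00000000u; // WAIT_OBJECT_0
constexpr std::uint32_t kWaitAbandoned = 0x00000080u; // WAIT_ABANDONED
constexpr std::uint32_t kWaitTimeout   = 0x00000102u; // WAIT_TIMEOUT
constexpr std::uint32_t kWaitFailed    = 0xFFFFFFFFu; // WAIT_FAILED

// Hypothetical helper: turn a WaitForSingleObject result into a readable name.
inline std::string describe_wait_result(std::uint32_t result) {
    switch (result) {
        case kWaitObject0:   return "WAIT_OBJECT_0";
        case kWaitAbandoned: return "WAIT_ABANDONED";
        case kWaitTimeout:   return "WAIT_TIMEOUT";
        case kWaitFailed:    return "WAIT_FAILED";
        default:             return "unknown";
    }
}

// Hypothetical helper: the same condition as the "== 0" test inside the
// BOOST_VERIFY quoted above. Anything other than WAIT_OBJECT_0 trips it.
inline bool wait_succeeded(std::uint32_t result) {
    return result == kWaitObject0;
}
```

With these helpers, `describe_wait_result(0xFFFFFFFFu)` yields "WAIT_FAILED" and `wait_succeeded(0xFFFFFFFFu)` is false, which matches the assertion firing when the handle is garbage.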
So the actual problem, it seems, is that the `get_event()` method of `basic_timed_mutex` is returning `0xdddddddddddddddd`. However, the MSDN entry for `CreateEvent` (which `get_event()` eventually uses) tells me that it returns either a valid handle to an event, or `NULL`.
Again, this is probably the best description of the problem I can provide, since it isn't reliably reproducible outside of this specific application. I hope somebody has ideas as to what may be causing this!
Comments (1)
I guess it will be very difficult to give a precise answer to your problem, but it looks like you have a heap corruption problem. Have you tried using AppVerifier with normal pageheap enabled?
If you then attach a debugger to the process and heap corruption occurs, it will hopefully break when a corrupted heap block is encountered, and you can even look at the call stack of the allocating code.
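One hint that points in this direction: assuming the application is built against the MSVC debug runtime, its debug heap overwrites freed blocks with the byte 0xDD, and every suspicious value in the question is made of that byte. The sketch below only illustrates that observation; `is_repeated_byte` and `as_signed32` are hypothetical helper names, and the 32-bit width used for `active_count` is an assumption about how it is stored on Windows.

```cpp
#include <cassert>   // handy for spot checks on the helpers below
#include <cstdint>
#include <cstring>

// Fill byte the MSVC debug CRT writes over freed heap memory
// (assumption: a debug build using the debug heap).
constexpr unsigned char kFreedFillByte = 0xDD;

// Hypothetical helper: true when all eight bytes of value equal fill.
inline bool is_repeated_byte(std::uint64_t value, unsigned char fill) {
    for (int i = 0; i < 8; ++i) {
        if (((value >> (8 * i)) & 0xFFu) != fill) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: reinterpret a 32-bit pattern as a signed integer,
// the way a debugger would display a 32-bit counter.
inline std::int32_t as_signed32(std::uint32_t bits) {
    std::int32_t out = 0;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

`is_repeated_byte(0xDDDDDDDDDDDDDDDDull, kFreedFillByte)` is true for the reported `sem`, `old_count` and `event` values, and `as_signed32(0xDDDDDDDDu)` is -572662307, which falls inside the range reported for `active_count`. That would be consistent with the mutex storage having been freed before the handler ran, although it does not prove it.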
Edit: if you are using WinDbg, you can also put a conditional breakpoint on WaitForSingleObject (or any other function) that breaks only if the call fails, and then check the last error, e.g.:

    bp kernel32!WaitForSingleObject "gu; .if(eax == 0) {g}"

At the breakpoint, this tells the debugger to i) run to the end of the function (gu) and ii) check the return value (stored in the EAX register), continuing execution (g) if everything was fine. If an error is returned, you can check the value of GetLastError() with the !gle extension command.