Waiting for a detached thread to finish in C++
How can I wait for a detached thread to finish in C++?
I don't care about an exit status, I just want to know whether or not the thread has finished.
I'm trying to provide a synchronous wrapper around an asynchronous third-party tool. The problem is a weird race condition crash involving a callback. The progression is:
1. I call the third party and register a callback.
2. When the third party finishes, it notifies me through the callback, in a detached thread I have no real control over.
3. I want the thread from (1) to wait until (2) is called.
I want to wrap this in a mechanism that provides a blocking call. So far, I have:
class Wait {
public:
    void callback() {
        pthread_mutex_lock(&m_mutex);
        m_done = true;
        pthread_cond_broadcast(&m_cond);
        pthread_mutex_unlock(&m_mutex);
    }
    void wait() {
        pthread_mutex_lock(&m_mutex);
        while (!m_done) {
            pthread_cond_wait(&m_cond, &m_mutex);
        }
        pthread_mutex_unlock(&m_mutex);
    }
private:
    pthread_mutex_t m_mutex;
    pthread_cond_t m_cond;
    bool m_done;
};

// elsewhere...
Wait waiter;
thirdparty_utility(&waiter);
waiter.wait();
As far as I can tell, this should work, and it usually does, but sometimes it crashes. As far as I can determine from the corefile, my guess as to the problem is this:
- When the callback sets m_done and broadcasts, the wait thread wakes up
- The wait thread is now done here, and Wait is destroyed. All of Wait's members are destroyed, including the mutex and cond.
- The callback thread tries to continue from the broadcast point, but is now using memory that's been released, which results in memory corruption.
- When the callback thread tries to return (above the level of my poor callback method), the program crashes (usually with a SIGSEGV, but I've seen SIGILL a couple of times).
I've tried a lot of different mechanisms to try to fix this, but none of them solve the problem. I still see occasional crashes.
EDIT: More details:
This is part of a massively multithreaded application, so creating a static Wait isn't practical.
I ran a test, creating Wait on the heap, and deliberately leaking the memory (i.e. the Wait objects are never deallocated), and that resulted in no crashes. So I'm sure it's a problem of Wait being deallocated too soon.
I've also tried a test with a sleep(5) after the unlock in wait, and that also produced no crashes. I hate to rely on a kludge like that though.
EDIT: ThirdParty details:
I didn't think this was relevant at first, but the more I think about it, the more I think it's the real problem:
The thirdparty stuff I mentioned, and why I have no control over the thread: this is using CORBA.
So, it's possible that CORBA is holding onto a reference to my object longer than intended.
Comments (4)
Yes, I believe that what you're describing is happening (race condition on deallocate). One quick way to fix this is to create a static instance of Wait, one that won't get destroyed. This will work as long as you don't need to have more than one waiter at the same time.
You will also permanently use that memory; it will never be deallocated. But that doesn't look too bad here.
The main issue is that it's hard to coordinate lifetimes of your thread communication constructs between threads: you will always need at least one leftover communication construct to communicate when it is safe to destroy (at least in languages without garbage collection, like C++).
EDIT:
See comments for some ideas about refcounting with a global mutex.
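For what it's worth, one way to read the refcounting idea is shared ownership: the waiting thread and the callback thread each hold a reference, and the Wait is freed only when the last reference goes away. Below is a minimal sketch of that, not the asker's actual setup. It swaps the raw pthread calls for std::mutex and std::condition_variable, uses std::shared_ptr rather than a global mutex for the counting, and fake_thirdparty_utility is a hypothetical stand-in for the real third-party call (a real CORBA callback would need its own way of carrying the reference).

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Same role as the Wait class in the question, written with standard
// library primitives instead of raw pthreads.
class Wait {
public:
    void callback() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_done = true;
        m_cond.notify_all();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_done; });
        assert(m_done);  // postcondition: the callback has set the flag
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_done = false;
};

// Hypothetical stand-in for the third-party call (an assumption, not a real
// API): it fires the callback from a detached thread the caller cannot join.
void fake_thirdparty_utility(std::shared_ptr<Wait> waiter) {
    // The lambda keeps its own copy of the shared_ptr alive until the
    // detached thread has completely finished with the Wait.
    std::thread([waiter] { waiter->callback(); }).detach();
}

// Blocking wrapper: returns true once the callback has run.
bool blocking_call() {
    auto waiter = std::make_shared<Wait>();
    fake_thirdparty_utility(waiter);
    waiter->wait();
    return true;
}   // Only this thread's reference is dropped here. The Wait itself is
    // destroyed when the callback thread releases its copy as well.
```

With this arrangement it no longer matters which thread finishes first, since neither one can destroy the mutex and condition variable while the other still holds a reference.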
To the best of my knowledge there's no portable way to directly ask a thread if it's done running (i.e. no pthread_ function). What you are doing is the right way to do it, at least as far as having a condition that you signal. If you are seeing crashes that you are sure are due to the Wait object being deallocated when the thread that creates it quits (and not some other subtle locking issue, which is all too common), the issue is that you need to make sure the Wait isn't being deallocated, by managing it from a thread other than the one that does the notification. Put it in global memory or dynamically allocate it and share it with that thread. Most simply, don't have the thread being waited on own the memory for the Wait; have the thread doing the waiting own it.
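If C++11 is available, a condition that gets signalled once, with dynamically allocated state shared between the two threads, is roughly what std::promise and std::future provide. The sketch below is a swapped-in technique rather than the pthread code from the question, and fake_thirdparty_utility is a hypothetical stand-in for the third-party call. The point it illustrates is that the shared state is owned jointly by the promise and the future, so neither thread can free it while the other is still using it.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Hypothetical stand-in for the third-party tool (an assumption): it takes
// ownership of the promise and fulfils it from a detached thread.
void fake_thirdparty_utility(std::promise<void> done) {
    std::thread(
        [](std::promise<void> p) { p.set_value(); },  // the "callback"
        std::move(done)).detach();
}

// Blocking wrapper: returns true once the detached thread has signalled.
bool blocking_call() {
    std::promise<void> done;
    std::future<void> finished = done.get_future();
    fake_thirdparty_utility(std::move(done));
    finished.wait();          // blocks until set_value() has been called
    return finished.valid();  // still valid, since get() was never called
}
```

Note that this only tells you the callback has signalled, not that the detached thread has fully exited, which matches the "I just want to know whether it has finished" requirement in the question.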
Are you initializing and destroying the mutex and condition variable properly?
Make sure that you aren't prematurely destroying the Wait object: if it gets destroyed in one thread while the other thread still needs it, you'll get a race condition that will likely result in a segfault. I'd recommend making it a global static variable that gets constructed on program initialization (before main()) and gets destroyed on program exit.
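On the initialization point: the class as posted shows no constructor or destructor, so it may be that the pthread objects and m_done are never set up. Here is a sketch of what proper setup and teardown could look like, assuming default attributes are fine. run_callback and demo are hypothetical helpers added only to exercise the class. Unlike the question's situation, this demo owns its thread and can join it, which is what makes destroying the Wait on the stack safe here.

```cpp
#include <cassert>
#include <pthread.h>

class Wait {
public:
    Wait() : m_done(false) {
        // Default attributes (nullptr) are an assumption for this sketch.
        int rc = pthread_mutex_init(&m_mutex, nullptr);
        assert(rc == 0);
        rc = pthread_cond_init(&m_cond, nullptr);
        assert(rc == 0);
        (void)rc;  // silence the unused-variable warning under NDEBUG
    }
    ~Wait() {
        // Only safe once no thread is inside callback() or wait().
        pthread_cond_destroy(&m_cond);
        pthread_mutex_destroy(&m_mutex);
    }
    // Copying a pthread mutex or condition variable is not meaningful.
    Wait(const Wait&) = delete;
    Wait& operator=(const Wait&) = delete;

    void callback() {
        pthread_mutex_lock(&m_mutex);
        m_done = true;
        pthread_cond_broadcast(&m_cond);
        pthread_mutex_unlock(&m_mutex);
    }
    void wait() {
        pthread_mutex_lock(&m_mutex);
        while (!m_done) {
            pthread_cond_wait(&m_cond, &m_mutex);
        }
        pthread_mutex_unlock(&m_mutex);
    }
private:
    pthread_mutex_t m_mutex;
    pthread_cond_t m_cond;
    bool m_done;
};

// Hypothetical thread entry point playing the role of the callback thread.
void* run_callback(void* arg) {
    static_cast<Wait*>(arg)->callback();
    return nullptr;
}

// Hypothetical demo: returns true if the wait completed.
bool demo() {
    Wait waiter;
    pthread_t tid;
    if (pthread_create(&tid, nullptr, run_callback, &waiter) != 0) {
        return false;
    }
    waiter.wait();
    // Joining guarantees callback() has fully returned before waiter is
    // destroyed. A detached third-party thread offers no such guarantee.
    pthread_join(tid, nullptr);
    return true;
}
```

This fixes uninitialized state but not the lifetime race itself: with a detached thread there is nothing to join, so the object still has to outlive the callback by some other means.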
If your assumption is correct, then the third-party module appears to be buggy and you need to come up with some kind of hack to make your application work.
A static Wait is not feasible. How about a Wait pool (it may even grow on demand)? Is your application using a thread pool to run?
There will still be a chance that the same Wait will be reused while the third-party module is still using it, but you can minimize that chance by properly queuing vacant Waits in your pool.
Disclaimer: I am in no way an expert in thread safety, so consider this post as a suggestion from a layman.
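To make the pool idea a bit more concrete, here is a rough sketch under a few assumptions: the pool lives for the whole process, Wait objects are never freed (only recycled), and released objects go to the back of a FIFO queue so the most recently used one is reused last. WaitPool, acquire, release, and reset are hypothetical names, and the Wait here uses std::mutex instead of the pthread calls from the question.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <vector>

class Wait {
public:
    void callback() {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_done = true;
        m_cond.notify_all();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return m_done; });
    }
    void reset() {  // make a recycled Wait usable again
        std::lock_guard<std::mutex> lock(m_mutex);
        m_done = false;
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_done = false;
};

// Grow-on-demand pool. Objects are never destroyed while the pool lives,
// so a late callback touches valid (if possibly recycled) memory.
class WaitPool {
public:
    Wait* acquire() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_free.empty()) {
            m_all.push_back(std::make_unique<Wait>());  // grow on demand
            return m_all.back().get();
        }
        Wait* w = m_free.front();  // oldest vacant entry is reused first
        m_free.pop_front();
        w->reset();
        return w;
    }
    void release(Wait* w) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_free.push_back(w);  // back of the queue: reused as late as possible
    }
    std::size_t size() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_all.size();
    }
private:
    std::mutex m_mutex;
    std::vector<std::unique_ptr<Wait>> m_all;  // owns every Wait ever made
    std::deque<Wait*> m_free;                  // vacant entries, FIFO order
};
```

As the answer says, this narrows the window rather than closing it: if the third party holds a reference for long enough, a recycled Wait could still receive a stale callback.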