Waiting for a detached thread to finish in C++

Posted on 2024-08-11 19:36:20

How can I wait for a detached thread to finish in C++?

I don't care about an exit status, I just want to know whether or not the thread has finished.

I'm trying to provide a synchronous wrapper around an asynchronous third-party tool. The problem is a weird race condition crash involving a callback. The progression is:

1. I call the third party and register a callback.
2. When the third party finishes, it notifies me through the callback, which runs in a detached thread I have no real control over.
3. I want the thread from (1) to wait until (2) is called.

I want to wrap this in a mechanism that provides a blocking call. So far, I have:

class Wait {
  public:
  void callback() {
    pthread_mutex_lock(&m_mutex);
    m_done = true;
    pthread_cond_broadcast(&m_cond);
    pthread_mutex_unlock(&m_mutex);
  }

  void wait() {
    pthread_mutex_lock(&m_mutex);
    while (!m_done) {
      pthread_cond_wait(&m_cond, &m_mutex);
    }
    pthread_mutex_unlock(&m_mutex);
  }

  private:
  pthread_mutex_t m_mutex;
  pthread_cond_t  m_cond;
  bool            m_done;
};

// elsewhere...
Wait waiter;
thirdparty_utility(&waiter);
waiter.wait();

As far as I can tell, this should work, and it usually does, but sometimes it crashes. From what I can determine from the core file, my best guess at the sequence is this:

1. The callback sets m_done and broadcasts on the condition; the waiting thread wakes up.
2. The waiting thread is now done here, and the Wait object is destroyed. All of Wait's members are destroyed, including the mutex and the condition variable.
3. The callback thread tries to continue from the broadcast point, but it is now using memory that has been released, which results in memory corruption.
4. When the callback thread tries to return (above the level of my poor callback method), the program crashes (usually with a SIGSEGV, but I've seen SIGILL a couple of times).

I've tried a lot of different mechanisms to try to fix this, but none of them solve the problem. I still see occasional crashes.

EDIT: More details:

This is part of a massively multithreaded application, so creating a static Wait isn't practical.

I ran a test, creating Wait on the heap, and deliberately leaking the memory (i.e. the Wait objects are never deallocated), and that resulted in no crashes. So I'm sure it's a problem of Wait being deallocated too soon.

I've also tried a test with a sleep(5) after the unlock in wait, and that also produced no crashes. I hate to rely on a kludge like that though.

EDIT: ThirdParty details:

I didn't think this was relevant at first, but the more I think about it, the more I think it's the real problem:

The thirdparty stuff I mentioned, and why I have no control over the thread: this is using CORBA.

So, it's possible that CORBA is holding onto a reference to my object longer than intended.

自在安然 2024-08-18 19:36:20

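Yes, I believe what you're describing is happening (a race condition on deallocation). One quick way to fix it is to create a static instance of Wait, one that is never destroyed. This works as long as you never need more than one waiter at the same time.

You will also hold on to that memory permanently; it is never deallocated. But that doesn't look too bad.

The main issue is that it is hard to coordinate the lifetimes of thread-communication constructs between threads: you always need at least one leftover communication construct to signal when it is safe to destroy the others (at least in languages without garbage collection, like C++).

EDIT:
See the comments for some ideas about refcounting with a global mutex.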
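
One way to picture the refcounting idea is the sketch below. It is not the scheme from the comments (that one uses a global mutex); it swaps in std::shared_ptr, so the waiter and the callback each hold a reference and the object is destroyed only when the last one lets go. It assumes C++11 is available and that the callback side can hold a copy of the shared pointer. RefCountedWait, fake_thirdparty_utility and run_blocking_calls are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical variant of the question's Wait class using C++11 primitives.
class RefCountedWait {
 public:
  void callback() {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_done = true;
    m_cond.notify_all();
  }

  void wait() {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this] { return m_done; });  // loops on spurious wakeups
  }

 private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  bool m_done = false;
};

// Stand-in for the third-party call (an assumption, not a real API):
// it fires the callback from a detached thread and keeps its own reference.
void fake_thirdparty_utility(std::shared_ptr<RefCountedWait> waiter) {
  std::thread([waiter] { waiter->callback(); }).detach();
}

// Performs n blocking calls and returns how many completed.
int run_blocking_calls(int n) {
  int completed = 0;
  for (int i = 0; i < n; ++i) {
    auto waiter = std::make_shared<RefCountedWait>();
    fake_thirdparty_utility(waiter);
    waiter->wait();
    ++completed;
    // Our reference is dropped here. If the detached thread is still inside
    // callback(), its copy keeps the mutex and condition variable alive.
  }
  return completed;
}
```

Whichever side finishes last frees the object, so nothing is leaked and no static instance is needed. Whether a real third-party layer such as CORBA lets you hand over a shared pointer is a separate question that this sketch does not answer.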


以酷 2024-08-18 19:36:20

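To my knowledge, there is no portable way to ask a thread directly whether it has finished running (i.e. there is no pthread_ function for it). What you are doing is the right approach, at least as far as having a condition that you signal. If you are sure the crashes are caused by the Wait object being deallocated when the thread that created it exits (and not by some other subtle locking issue, which is all too common), then you need to make sure the Wait is not deallocated early, by managing it from a thread other than the one that does the notification. Put it in global memory, or allocate it dynamically and share it with that thread. Most simply, don't have the thread being waited on own the memory for the Wait; have the thread doing the waiting own it.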
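
As a rough illustration of "allocate it dynamically and share it", here is a minimal sketch that replaces the pthread condition with std::promise and std::future. That is a different mechanism from the one the answer endorses: it is used here because the state shared between the two sides lives on the heap and goes away only when both are finished with it. It assumes C++11. fake_async_tool and blocking_call are hypothetical names, and the 5-second timeout is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Stand-in for the asynchronous third-party tool (an assumption):
// it runs the completion callback on a detached thread.
template <typename Callback>
void fake_async_tool(Callback on_done) {
  std::thread(std::move(on_done)).detach();
}

// Blocks until the callback has fired, or gives up after a timeout.
// Returns true if the callback ran in time.
bool blocking_call() {
  // The promise is heap-allocated and co-owned by the callback, so it cannot
  // be destroyed while the detached thread is still inside set_value().
  auto done = std::make_shared<std::promise<void>>();
  std::future<void> finished = done->get_future();

  fake_async_tool([done] { done->set_value(); });

  return finished.wait_for(std::chrono::seconds(5)) ==
         std::future_status::ready;
}
```

If the wait times out and the callback fires later, the promise is still alive because the callback holds its own reference, so a late notification does not touch freed memory.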


緦唸λ蓇 2024-08-18 19:36:20

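Are you initializing and destroying the mutex and condition variable properly?

#include <cassert>
#include <pthread.h>

Wait::Wait()
{
    // Default attributes for both primitives.
    pthread_mutex_init(&m_mutex, NULL);
    pthread_cond_init(&m_cond, NULL);
    m_done = false;
}

Wait::~Wait()
{
    // Destroying before the callback has fired is a bug.
    assert(m_done);
    pthread_mutex_destroy(&m_mutex);
    pthread_cond_destroy(&m_cond);
}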

Make sure that you aren't prematurely destroying the Wait object -- if it gets destroyed in one thread while the other thread still needs it, you'll get a race condition that will likely result in a segfault. I'd recommend making it a global static variable that gets constructed on program initialization (before main()) and gets destroyed on program exit.
