Deferred broadcast wakeup for condition variables: is it valid?

Posted 2024-12-06 18:54:05

I'm implementing pthread condition variables (based on Linux futexes) and I have an idea for avoiding the "stampede effect" on pthread_cond_broadcast with process-shared condition variables. For non-process-shared cond vars, futex requeue operations are traditionally (i.e. by NPTL) used to requeue waiters from the cond var's futex to the mutex's futex without waking them up, but this is in general impossible for process-shared cond vars, because pthread_cond_broadcast might not have a valid pointer to the associated mutex. In a worst case scenario, the mutex might not even be mapped in its memory space.
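For context, the traditional NPTL-style broadcast path described above can be sketched with raw futex syscalls. This is a minimal illustration; the function names are my own, not libc's:

```c
/* Sketch of the NPTL-style broadcast described above: wake one waiter on
 * the condvar futex and requeue the rest onto the mutex futex without
 * waking them. Function names here are illustrative, not libc's. */
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no futex() wrapper, so go through syscall(2). */
static long futex_op(uint32_t *uaddr, int op, uint32_t val,
                     uint32_t val2, uint32_t *uaddr2, uint32_t val3)
{
    return syscall(SYS_futex, uaddr, op, val, (unsigned long)val2,
                   uaddr2, val3);
}

/* Wake at most one waiter on cond_futex and move up to INT_MAX others to
 * mutex_futex, but only if *cond_futex still equals `expected` (this is
 * FUTEX_CMP_REQUEUE's compare step). Returns the number of waiters woken
 * plus requeued, or -1 with errno set to EAGAIN if the compare failed. */
static long broadcast_requeue(uint32_t *cond_futex, uint32_t *mutex_futex,
                              uint32_t expected)
{
    return futex_op(cond_futex, FUTEX_CMP_REQUEUE, 1, INT_MAX,
                    mutex_futex, expected);
}
```

The compare step is what makes this usable under concurrency: if the condvar's futex word changed between the caller's read and the syscall, the kernel refuses to requeue and the caller must retry.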

My idea for overcoming this issue is to have pthread_cond_broadcast only directly wake one waiter, and have that waiter perform the requeue operation when it wakes up, since it does have the needed pointer to the mutex.
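From the chosen waiter's side, that idea could look roughly like the sketch below. The condvar layout (a sequence counter in the futex word) is an assumed design, and the races discussed next are deliberately not handled here:

```c
/* Sketch of the proposed scheme, from the woken waiter's side: the single
 * waiter woken by the broadcast uses its own valid mapping of the mutex
 * to move the remaining waiters over without waking them. The condvar
 * layout (a sequence counter in the futex word) is an assumption. */
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

static long futex_call(uint32_t *uaddr, int op, uint32_t val,
                       uint32_t val2, uint32_t *uaddr2, uint32_t val3)
{
    return syscall(SYS_futex, uaddr, op, val, (unsigned long)val2,
                   uaddr2, val3);
}

/* Called by the one waiter the broadcast woke. Wake nobody (nr_wake = 0)
 * and requeue everybody else onto this process's mapping of the mutex
 * futex. `seq_seen` is the condvar sequence value this waiter observed;
 * if a concurrent signal or broadcast bumped it, FUTEX_CMP_REQUEUE fails
 * with EAGAIN and we fall back to waking everyone (the stampede we were
 * trying to avoid, but still correct). */
static long requeue_remaining(uint32_t *cond_futex, uint32_t *my_mutex_futex,
                              uint32_t seq_seen)
{
    long r = futex_call(cond_futex, FUTEX_CMP_REQUEUE, 0, INT_MAX,
                        my_mutex_futex, seq_seen);
    if (r == -1 && errno == EAGAIN)
        r = futex_call(cond_futex, FUTEX_WAKE, INT_MAX, 0, NULL, 0);
    return r;
}
```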

Naturally there are a lot of ugly race conditions to consider if I pursue this approach, but if they can be overcome, are there any other reasons such an implementation would be invalid or undesirable? One potential issue I can think of that might not be able to be overcome is the race where the waiter (a separate process) responsible for the requeue gets killed before it can act, but it might be possible to overcome even this by putting the condvar futex in the robust mutex list so that the kernel performs a wake on it when the process dies.
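The robust-list idea at the end can be sketched as follows. The condvar layout is an illustrative assumption, and a real implementation would have to coexist with libc's own robust list, since the kernel tracks exactly one list head per thread:

```c
/* Sketch of parking the condvar futex on the kernel's robust list, so
 * that if this process dies before performing the requeue, the kernel
 * sets FUTEX_OWNER_DIED in the futex word and wakes a waiter on it.
 * The layout below is an illustrative assumption, not NPTL's. */
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

struct robust_cond {
    struct robust_list link;   /* node the kernel walks on thread death */
    uint32_t futex;            /* word holding FUTEX_WAITERS | owner TID */
};

static struct robust_list_head robust_head = {
    .list = { &robust_head.list },     /* empty circular list */
    .futex_offset = offsetof(struct robust_cond, futex)
                  - offsetof(struct robust_cond, link),
    .list_op_pending = NULL,
};

/* Register this thread's robust list head with the kernel. Note that a
 * real implementation would have to share the one list with libc's
 * robust-mutex support rather than replace it as done here. */
static int install_robust_list(void)
{
    return (int)syscall(SYS_set_robust_list, &robust_head,
                        sizeof(robust_head));
}
```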

Comments (2)

毁我热情 2024-12-13 18:54:05

There may be waiters belonging to multiple address spaces, each of which has mapped the mutex associated with the futex at a different address in memory. I'm not sure whether FUTEX_REQUEUE is safe to use when the requeue point may not be mapped at the same address in all waiters; if it is, then this isn't a problem.

There are other problems that won't be detected by robust futexes; for example, if your chosen waiter is busy in a signal handler, you could be kept waiting an arbitrarily long time. [As discussed in the comments, these are not an issue]

Note that with robust futexes, you must set the low bits of the futex word (futex & 0x3FFFFFFF, i.e. FUTEX_TID_MASK) to the TID of the thread to be woken up; you must also set the FUTEX_WAITERS bit if you want a wakeup. This means that you must choose which thread to awaken from the broadcasting thread, or you will be unable to deal with thread death immediately after the FUTEX_WAKE. You'll also need to deal with the possibility of the thread dying immediately before the waker thread writes its TID into the state variable - perhaps having a 'pending master' field that is also registered in the robust mutex system would be a good idea.
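The bit layout described here is the one `<linux/futex.h>` exposes (FUTEX_WAITERS, FUTEX_OWNER_DIED, FUTEX_TID_MASK); a few helpers make it concrete. The helper names are mine, not part of any API:

```c
/* The robust-futex word layout: the low 30 bits hold the owner's TID
 * (FUTEX_TID_MASK = 0x3FFFFFFF), bit 31 is FUTEX_WAITERS, and bit 30 is
 * FUTEX_OWNER_DIED, which the kernel sets when the owner dies. Helper
 * names are illustrative. */
#include <linux/futex.h>
#include <stdbool.h>
#include <stdint.h>

/* Claim the futex word for thread `tid`, flagging that others are
 * waiting so the kernel issues a wake if the thread dies holding it. */
static uint32_t futex_word_for_owner(uint32_t tid)
{
    return (tid & FUTEX_TID_MASK) | FUTEX_WAITERS;
}

static uint32_t owner_tid(uint32_t word)
{
    return word & FUTEX_TID_MASK;
}

static bool has_waiters(uint32_t word)
{
    return (word & FUTEX_WAITERS) != 0;
}

static bool owner_died(uint32_t word)
{
    return (word & FUTEX_OWNER_DIED) != 0;
}
```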

I see no reason why this can't work, then, as long as you make sure to deal with the thread exit issues carefully. That said, it may be best to simply define in the kernel an extension to FUTEX_WAIT that takes a requeue point and comparison value as an argument, and let the kernel handle this in a simple, race-free manner.

晒暮凉 2024-12-13 18:54:05

I just don't see why you assume that the corresponding mutex might not be known. It is clearly stated:

The effect of using more than one mutex for concurrent pthread_cond_timedwait() or pthread_cond_wait() operations on the same condition variable is undefined; that is, a condition variable becomes bound to a unique mutex when a thread waits on the condition variable, and this (dynamic) binding shall end when the wait returns.

So even for process shared mutexes and conditions this must hold, and any user space process must always have mapped the same and unique mutex that is associated to the condition.

Allowing users to associate different mutexes with a condition at the same time is not something I would support.
