Why are pthread mutexes considered "slower" than futexes?

Posted on 2024-11-15 12:55:54

Why are POSIX mutexes considered heavier or slower than futexes? Where does the overhead in the pthread mutex type come from? I've heard that pthread mutexes are based on futexes and, when uncontended, make no calls into the kernel. It seems, then, that a pthread mutex is merely a "wrapper" around a futex.

Is the overhead simply the function-wrapper call and the need for the mutex function to "set up" the futex (i.e., basically the stack setup for the pthread mutex function call)? Or are there extra memory barrier steps taking place with the pthread mutex?

Comments (5)

把昨日还给我 2024-11-22 12:55:54

Futexes were created to improve the performance of pthread mutexes. NPTL uses futexes, whereas LinuxThreads predated them, which I think is where the "slower" reputation comes from. NPTL mutexes may have some additional overhead, but it shouldn't be much.

Edit:
The actual overhead basically consists of:

  • selecting the correct algorithm for the mutex type (normal, recursive, adaptive, error-checking; normal, robust, priority-inheritance, priority-protected), where the code heavily hints to the compiler that we are likely using a normal mutex (so the compiler can lay out the code to favor that path, which helps the CPU's branch prediction),
  • and, if we manage to take the mutex, a write of its current owner. This write should normally be fast, since the owner field resides in the same cache line as the lock word we have just taken. It is slow only when the lock is heavily contended and some other CPU accessed the lock between the time we took it and the time we attempted to write the owner. (The write is unneeded for normal mutexes, but needed for error-checking and recursive mutexes.)

So, a few cycles (typical case) to a few cycles + a branch misprediction + an additional cache miss (very worst case).
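
The two costs listed above can be illustrated with a simplified sketch. This is not glibc's code: the struct layout, the names sketch_mutex, sketch_lock_nowait and sketch_unlock, and the caller-supplied thread id are all hypothetical, and the function returns EBUSY instead of blocking so that no futex call is needed. It assumes a C11 compiler with GCC/Clang's __builtin_expect.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical, simplified layout; NOT glibc's actual pthread_mutex_t. */
enum sketch_kind { SKETCH_NORMAL, SKETCH_RECURSIVE, SKETCH_ERRORCHECK };

struct sketch_mutex {
    atomic_int lock;       /* 0 = free, 1 = held */
    atomic_int owner;      /* id of the holder, 0 = nobody (unused for NORMAL) */
    unsigned count;        /* recursion depth, touched only by the holder */
    enum sketch_kind kind;
};

/* Try to take the lock word with a single compare-and-swap. */
static int sketch_cas_acquire(struct sketch_mutex *m)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &m->lock, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

/* Returns 0 on success, EBUSY if held by someone else, EDEADLK on an
 * error-checking relock. `self` is a caller-chosen nonzero thread id. */
int sketch_lock_nowait(struct sketch_mutex *m, int self)
{
    assert(self != 0);

    /* Cost 1: the type dispatch, hinted so the normal kind is the fast path. */
    if (__builtin_expect(m->kind == SKETCH_NORMAL, 1))
        return sketch_cas_acquire(m) ? 0 : EBUSY;

    if (atomic_load_explicit(&m->owner, memory_order_relaxed) == self) {
        if (m->kind == SKETCH_ERRORCHECK)
            return EDEADLK;
        m->count++;        /* recursive relock by the current holder */
        return 0;
    }
    if (!sketch_cas_acquire(m))
        return EBUSY;

    /* Cost 2: the owner write, next to the lock word we just acquired. */
    atomic_store_explicit(&m->owner, self, memory_order_relaxed);
    m->count = 1;
    return 0;
}

int sketch_unlock(struct sketch_mutex *m, int self)
{
    if (m->kind != SKETCH_NORMAL) {
        if (atomic_load_explicit(&m->owner, memory_order_relaxed) != self)
            return EPERM;
        if (--m->count > 0)
            return 0;      /* still held recursively */
        atomic_store_explicit(&m->owner, 0, memory_order_relaxed);
    }
    atomic_store_explicit(&m->lock, 0, memory_order_release);
    return 0;
}
```

For a normal mutex the whole operation is one predicted branch plus one compare-and-swap; only the recursive and error-checking kinds pay for the owner bookkeeping.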

勿忘初心 2024-11-22 12:55:54

The short answer to your question is that futexes are known to be implemented about as efficiently as possible, while a pthread mutex may or may not be. At minimum, a pthread mutex has overhead associated with determining the type of mutex and futexes do not. So a futex will almost always be at least as efficient as a pthread mutex, until and unless someone thinks up some structure lighter than a futex and then releases a pthreads implementation that uses that for its default mutex.

述情 2024-11-22 12:55:54

Technically speaking pthread mutexes are not slower or faster than futexes. pthread is just a standard API, so whether they are slow or fast depends on the implementation of that API.

Specifically, in Linux pthread mutexes are implemented on top of futexes and are therefore fast. In practice, you don't want to use the futex API itself: it is very hard to use correctly, it has no wrapper function in glibc (it has to be invoked through the generic syscall function), and it needs carefully written atomic operations around it, historically in assembly, which is non-portable. Fortunately for us, the glibc maintainers have already coded all of this under the hood of the pthread mutex API.
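
To give a sense of what using the raw futex API involves, here is a minimal Linux-only sketch of a three-state lock in the style of Ulrich Drepper's "Futexes Are Tricky", written with C11 atomics and the generic syscall function. The helper names futex_wait, futex_wake, sketch_futex_lock and sketch_futex_unlock are hypothetical; this is an illustration under those assumptions, not what glibc actually does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* The kernel treats a futex as a 32-bit word. */
static_assert(sizeof(atomic_int) == 4, "futex word must be 32 bits");

/* Hypothetical helpers: glibc ships no futex() wrapper, so go via syscall(). */
static long futex_wait(atomic_int *addr, int expected)
{
    /* The kernel puts us to sleep only if *addr still equals `expected`. */
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

static long futex_wake(atomic_int *addr, int count)
{
    return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, count, NULL, NULL, 0);
}

/* Lock word: 0 = free, 1 = held, 2 = held and there may be waiters. */
void sketch_futex_lock(atomic_int *word)
{
    int seen = 0;
    if (atomic_compare_exchange_strong(word, &seen, 1))
        return;                         /* uncontended: no system call at all */

    /* Contended: mark the word as "has waiters" and sleep until it is free. */
    if (seen != 2)
        seen = atomic_exchange(word, 2);
    while (seen != 0) {
        futex_wait(word, 2);
        seen = atomic_exchange(word, 2);
    }
}

void sketch_futex_unlock(atomic_int *word)
{
    /* Old value 1 means nobody was waiting, so we stay in user space. */
    if (atomic_fetch_sub(word, 1) != 1) {
        atomic_store(word, 0);
        futex_wake(word, 1);            /* wake one sleeper */
    }
}
```

Both the lock and the unlock path enter the kernel only when the word shows contention, which is the property the answers in this thread refer to.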

Now, because most operating systems did not implement futexes, what programmers usually mean by "pthread mutex" is the performance you get from the usual implementations of pthread mutexes, which is slower.

So it's a statistical fact that in most POSIX-compliant operating systems the pthread mutex is implemented in kernel space and is slower than a futex. In Linux they have the same performance. There may be other operating systems where pthread mutexes are implemented in user space (in the uncontended case) and therefore perform better, but I am only aware of Linux at this point.

小矜持 2024-11-22 12:55:54

Because they stay in userspace as much as possible, which means they require fewer system calls, which is inherently faster because the context switch between user and kernel mode is expensive.

I assume you're talking about kernel threads when you talk about POSIX threads. It's entirely possible to have a pure userspace implementation of POSIX threads, which requires no system calls but has other issues of its own.

My understanding is that a futex is halfway between a kernel POSIX thread and a userspace POSIX thread.

我的鱼塘能养鲲 2024-11-22 12:55:54

On AMD64 a futex is 4 bytes, while an NPTL pthread_mutex_t is 40 bytes (the value of __SIZEOF_PTHREAD_MUTEX_T in glibc on x86-64)! Yes, there is significant space overhead.
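
The size difference is easy to check on a given system. The small sketch below prints both sizes; the function name print_lock_sizes is hypothetical, and the pthread_mutex_t figure depends on the C library and architecture.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* A futex is a single 32-bit word, whatever the platform. */
static_assert(sizeof(uint32_t) == 4, "a futex word is 32 bits");

/* Print the size of a bare futex word next to the size of pthread_mutex_t. */
void print_lock_sizes(void)
{
    printf("futex word:      %zu bytes\n", sizeof(uint32_t));
    printf("pthread_mutex_t: %zu bytes\n", sizeof(pthread_mutex_t));
}
```

The extra bytes hold fields such as the owner, the recursion count and the mutex kind, which a bare futex word does not carry.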
