Why are pthread mutexes considered "slower" than futexes?
Why are POSIX mutexes considered heavier or slower than futexes? Where is the overhead coming from in the pthread mutex type? I've heard that pthread mutexes are based on futexes and, when uncontended, do not make any calls into the kernel. It seems, then, that a pthread mutex is merely a "wrapper" around a futex.
Is the overhead simply in the function-wrapper call and the need for the mutex function to "setup" the futex (i.e., basically the setup of the stack for the pthread mutex function call)? Or are there some extra memory barrier steps taking place with the pthread mutex?
Comments (5)
Futexes were created to improve the performance of pthread mutexes. NPTL uses futexes; LinuxThreads predated futexes, which I think is where the "slower" reputation comes from. NPTL mutexes may have some additional overhead, but it shouldn't be much.
Edit:
The actual overhead basically consists of:
So, a few cycles (typical case) to a few cycles + a branch misprediction + an additional cache miss (very worst case).
The short answer to your question is that futexes are known to be implemented about as efficiently as possible, while a pthread mutex may or may not be. At minimum, a pthread mutex has overhead associated with determining the type of mutex and futexes do not. So a futex will almost always be at least as efficient as a pthread mutex, until and unless someone thinks up some structure lighter than a futex and then releases a pthreads implementation that uses that for its default mutex.
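To make the "determining the type of mutex" cost concrete, here is a minimal sketch showing that the same `pthread_mutex_lock` call has to behave differently depending on the type stored in the mutex, which is why an implementation must inspect that type on every call. The helper name `relock_result` is made up for this illustration; the pthread calls and constants are standard POSIX.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper for this sketch: create a mutex of the given type,
 * lock it twice from the same thread, and return the result of the second
 * pthread_mutex_lock call.
 * Do NOT pass PTHREAD_MUTEX_NORMAL: relocking a normal mutex deadlocks. */
static int relock_result(int type) {
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    rc = pthread_mutexattr_settype(&attr, type);
    assert(rc == 0);
    rc = pthread_mutex_init(&m, &attr);
    assert(rc == 0);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);
    rc = pthread_mutex_lock(&m);   /* outcome depends on the mutex type */
    if (rc == 0)
        pthread_mutex_unlock(&m);  /* undo the recursive acquisition */
    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

Per POSIX, `relock_result(PTHREAD_MUTEX_RECURSIVE)` returns 0 and `relock_result(PTHREAD_MUTEX_ERRORCHECK)` returns `EDEADLK`. A raw futex word carries no such type information, so there is nothing to dispatch on.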
Technically speaking pthread mutexes are not slower or faster than futexes. pthread is just a standard API, so whether they are slow or fast depends on the implementation of that API.
Specifically, on Linux pthread mutexes are implemented on top of futexes and are therefore fast. In practice you don't want to use the futex API itself: it is very hard to use correctly, glibc provides no wrapper function for it (it has to be invoked through syscall()), and the user-space fast path needs atomic operations, which traditionally meant non-portable assembly. Fortunately, the glibc maintainers have already written all of this for us under the hood of the pthread mutex API.
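As a rough illustration of what using the futex API directly involves, here is a minimal Linux-only sketch of a futex-based lock in C11, loosely following the three-state scheme from Ulrich Drepper's "Futexes Are Tricky". The names `sketch_mutex`, `futex_call`, `sketch_lock` and `sketch_unlock` are invented for this example, and this is not how glibc itself implements `pthread_mutex_lock`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type for this sketch.
 * state: 0 = unlocked, 1 = locked, 2 = locked and there may be waiters. */
typedef struct { _Atomic uint32_t state; } sketch_mutex;

/* The kernel operates on a plain 32-bit word; this sketch assumes the
 * atomic type has the same size and layout, which holds on Linux. */
static_assert(sizeof(_Atomic uint32_t) == 4, "futex word must be 32 bits");

/* glibc has no futex() wrapper, so go through the generic syscall(). */
static long futex_call(_Atomic uint32_t *addr, int op, uint32_t val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

static void sketch_lock(sketch_mutex *m) {
    uint32_t c = 0;
    /* Fast path: 0 -> 1 with one atomic compare-and-swap, no system call. */
    if (atomic_compare_exchange_strong(&m->state, &c, 1))
        return;
    /* Slow path: mark the lock as contended and sleep in the kernel. */
    if (c != 2)
        c = atomic_exchange(&m->state, 2);
    while (c != 0) {
        /* Sleeps only if the word still contains 2 when the kernel checks. */
        futex_call(&m->state, FUTEX_WAIT_PRIVATE, 2);
        c = atomic_exchange(&m->state, 2);
    }
}

static void sketch_unlock(sketch_mutex *m) {
    /* Fast path: if the old state was 1, nobody is waiting, so no syscall. */
    if (atomic_exchange(&m->state, 0) == 2)
        futex_call(&m->state, FUTEX_WAKE_PRIVATE, 1);
}
```

A zero-initialised `sketch_mutex` starts unlocked. In the uncontended case both functions finish with a single atomic instruction, which is the same fast path that a futex-backed pthread mutex relies on.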
Now, because most operating systems did not implement futexes, what programmers usually mean by "pthread mutex" is the performance you get from the usual implementation of pthread mutexes, which is slower.
So it's a statistical fact that in most operating systems that are POSIX compliant the pthread mutex is implemented in kernel space and is slower than a futex. In Linux they have the same performance. It could be that there are other operating systems where pthread mutexes are implemented in user space (in the uncontended case) and therefore have better performance but I am only aware of Linux at this point.
Because they stay in userspace as much as possible, which means they require fewer system calls, which is inherently faster because the context switch between user and kernel mode is expensive.
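To get a feel for that difference on a given machine, the following Linux-only sketch times an uncontended atomic "lock/unlock" pair against a trivial system call. The function names are invented for this example, `SYS_getppid` is used only as a cheap call that still enters the kernel, and the numbers will vary widely with hardware, kernel version and mitigations, so treat it as a rough probe rather than a benchmark.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in nanoseconds. */
static double now_ns(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average time in ns of one uncontended user-space "lock + unlock" pair. */
static double ns_per_atomic_pair(long iters) {
    atomic_int word = 0;
    double start = now_ns();
    for (long i = 0; i < iters; i++) {
        int expected = 0;
        atomic_compare_exchange_strong(&word, &expected, 1); /* "lock" */
        atomic_store(&word, 0);                              /* "unlock" */
    }
    return (now_ns() - start) / (double)iters;
}

/* Average time in ns of one round trip into the kernel and back. */
static double ns_per_syscall(long iters) {
    double start = now_ns();
    for (long i = 0; i < iters; i++)
        syscall(SYS_getppid);
    return (now_ns() - start) / (double)iters;
}
```

Calling both functions with the same iteration count (say, one million) and printing the results gives a per-operation comparison for the local machine.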
I assume you're talking about kernel threads when you talk about POSIX threads. It's entirely possible to have a purely userspace implementation of POSIX threads that requires no system calls but has other issues of its own.
My understanding is that a futex is halfway between a kernel POSIX thread and a userspace POSIX thread.
On AMD64 a futex is 4 bytes, while an NPTL pthread_mutex_t is 40 bytes in glibc! Yes, there is a significant overhead.
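The size figures depend on the platform and C library, so it is worth checking them locally. A small sketch (the function name `print_lock_sizes` is made up for this example):

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* The kernel's futex interface operates on a 32-bit word. */
static_assert(sizeof(uint32_t) == 4, "a futex word is 32 bits");

/* Print both sizes so they can be compared on the local platform. */
static void print_lock_sizes(void) {
    printf("futex word:      %zu bytes\n", sizeof(uint32_t));
    printf("pthread_mutex_t: %zu bytes\n", sizeof(pthread_mutex_t));
}
```

The extra space in `pthread_mutex_t` holds bookkeeping such as the mutex type and owner, which a bare futex word does not carry.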