Overhead of using locks instead of atomic intrinsics

Published 2024-10-04 10:07:13

Does anyone know of published benchmarks of the overhead of locking, as opposed to relying solely on atomic operations/intrinsics (on a multiprocessor system)?

I’m particularly interested in general conclusions, e.g. something like “regardless of the platform, locking is at least a factor X slower than intrinsics.” (That’s why I can’t just benchmark this myself.)

I’m interested in direct comparisons, e.g. how much faster is using

#pragma omp atomic
++x;

instead of

#pragma omp critical
++x;

(assuming that every other update of x is also critical).
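
For what it’s worth, this exact pair can be timed directly with a small harness along the lines of the sketch below. The harness, iteration count, and use of omp_get_wtime are my own illustration, not from the question; absolute numbers will vary with thread count, contention, and hardware, which is precisely why no universal factor X exists. Compile with e.g. cc -O2 -fopenmp bench.c.

/* Illustrative microbenchmark sketch: the same shared increment,
 * once under "omp atomic" and once under "omp critical". */
#include <omp.h>
#include <stdio.h>

#define ITERS 1000000  /* arbitrary iteration count for the sketch */

int main(void) {
    long x = 0;
    double t0;

    t0 = omp_get_wtime();
    #pragma omp parallel for
    for (int i = 0; i < ITERS; ++i) {
        #pragma omp atomic
        ++x;                      /* hardware atomic read-modify-write */
    }
    double t_atomic = omp_get_wtime() - t0;

    x = 0;
    t0 = omp_get_wtime();
    #pragma omp parallel for
    for (int i = 0; i < ITERS; ++i) {
        #pragma omp critical
        ++x;                      /* full lock acquire/release per update */
    }
    double t_critical = omp_get_wtime() - t0;

    printf("atomic: %fs  critical: %fs\n", t_atomic, t_critical);
    return 0;
}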

Basically, I need this to justify a complex lock-free implementation instead of a straightforward locking one where starvation isn’t an issue. Conventional wisdom is that while locking is simpler, non-locking implementations have tons of advantages. But I’m hard pressed to find reliable data.

Comments (4)

软糖 2024-10-11 10:07:13

I don't know where specific studies are, but you are unlikely to find a definitive locks-are-better answer anywhere. It depends very much on how you use them, how much contention they are subject to, and what you are using the primitives for. If all you want to do is increment numbers, then yes, you'll probably find atomic primitives to be faster than locks, but if you want to perform multi-word compare and swap, or complex updates to tree structures, etc., you'll find that the lock-free code is not only much more complex and difficult to debug, but that the performance advantages over a well-designed lock-based implementation are inconclusive at best, and hardly worth the substantial increase in complexity. TANSTAAFL.
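
To make that complexity contrast concrete: even the simplest classic lock-free structure, a Treiber-stack push, already needs a CAS retry loop and careful memory ordering. The C11 sketch below is my own illustration of the answer’s point, not code from it; a usable version would additionally need a safe memory-reclamation scheme (hazard pointers, epochs) before pop could be written correctly.

#include <stdatomic.h>

struct node {
    int value;
    struct node *next;
};

struct stack {
    _Atomic(struct node *) head;
};

void push(struct stack *s, struct node *n) {
    struct node *old = atomic_load_explicit(&s->head, memory_order_relaxed);
    do {
        n->next = old;  /* link to the head we last observed */
        /* retry if another thread changed head in the meantime;
         * on failure the CAS refreshes 'old' with the current head */
    } while (!atomic_compare_exchange_weak_explicit(
                 &s->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}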

高跟鞋的旋律 2024-10-11 10:07:13

I’m particularly interested in general conclusions, e.g. something like “regardless of the platform, locking is at least a factor X slower than intrinsics.”

I'm afraid there is no general conclusion, because this issue depends on the architecture's atomic instruction design, cache layout, and memory bus. These can be very different between, say, x86 and MIPS. You can benchmark the architectures you actually plan to use and compare them. The lock's design can also affect the benchmark a lot, so even if you find a report, you shouldn't trust it blindly.

尹雨沫 2024-10-11 10:07:13

MS did some benchmarks for the new concurrent collection classes in .NET 4.

http://blogs.msdn.com/b/pfxteam/archive/2010/04/26/9997562.aspx

Not C/C++, but the underlying principles are the same: use platform CAS/interlocked operations instead of locking where you can.
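
In C/C++ terms, the “interlocked instead of a lock” idea those .NET classes build on looks roughly like this C11 sketch (my own illustration; InterlockedIncrement on Windows and atomic_fetch_add here both typically compile down to a single atomic read-modify-write instruction, e.g. LOCK XADD on x86):

#include <stdatomic.h>

atomic_long counter;  /* shared counter, no mutex anywhere */

void hit(void) {
    /* one interlocked-style read-modify-write instead of
     * lock(); ++counter; unlock(); */
    atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
}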

Cliff Click also has some benchmarks on his lock-free hash table:
http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf

Nevertheless, the amount of performance gain - or loss, for that matter - of using lock-free methods vs locking depends highly on the algorithm and the exact use case.

二智少女猫性小仙女 2024-10-11 10:07:13

Answers to this sort of question are highly complex: lock-free code frequently spins rather than waiting under contention, and so under high contention may run significantly slower than it would if the threads just blocked themselves in a queue behind a lock. In addition, lock-free code can still spend a lot of time issuing memory barriers, and so might have unexpected performance characteristics on different hardware and on different platforms.
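
As a minimal illustration of that spinning (my own sketch, with a trivial stand-in for the update function): every failed CAS below means the thread recomputes and retries, so under heavy contention many threads burn cycles redoing work where a lock would simply have queued them.

#include <stdatomic.h>

/* Trivial stand-in for a more expensive transformation of the value. */
static long update(long v) { return v + 1; }

void apply(_Atomic long *p) {
    long old = atomic_load(p);
    /* On failure the CAS refreshes 'old' with the current value, and
     * the loop recomputes update(old) and tries again: the "spin". */
    while (!atomic_compare_exchange_weak(p, &old, update(old)))
        ;
}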

So, the old standby answer to performance questions rears its head again: measure both of them in your situation, and pick the faster one.

Basically, I need this to justify a complex lock-free implementation instead of a straightforward locking one where starvation isn’t an issue.

Well, all that said, unless this data structure is at the absolute center of your massive-scale multithreaded application, I would go with a lock-based solution, with as few locks as possible. The bugs will be so much easier to find that it will be worth it. Where threading is involved, I personally find it very difficult to justify any kind of complex implementation, lock-free or otherwise.
