Are there any performance test results for using likely/unlikely hints?

Posted on 2024-12-07 12:09:29

gcc features likely/unlikely hints that help the compiler to generate machine code with better branch prediction.

Is there any data on how proper usage or failure to use those hints affects performance of real code on some real systems?

Comments (3)

滿滿的愛 2024-12-14 12:09:29

The question differs, but Peter Cordes's answer on this question (https://stackoverflow.com/q/1851299/15416) gives a clear hint ;). Modern CPUs ignore static hints and use dynamic branch prediction.

桃扇骨 2024-12-14 12:09:29

I don't know of any thorough analysis of these particular hints. In any case, the results would be extremely CPU-specific. In general, if you are sure about the likelihood (e.g., > 90%), it is probably worthwhile to add such annotations, although the improvement will vary a lot with the specific use case.
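
As a sketch of what annotating a branch you are fairly sure about could look like: newer GCC releases (9 and later, as far as I know) also offer `__builtin_expect_with_probability`, which takes the estimated probability explicitly. The `EXPECT_PROB` macro, the `count_non_ascii` function, and the 0.95 figure below are made up for illustration; the probability is an assumed property of the input, not a measurement.

```c
#include <stddef.h>

/* Hypothetical macro name. Uses the probability builtin when the compiler
   reports it, falls back to plain __builtin_expect on other GNU-style
   compilers, and to a plain condition everywhere else. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_expect_with_probability)
#    define EXPECT_PROB(cond, p) \
         __builtin_expect_with_probability(!!(cond), 1, (p))
#  endif
#endif
#if !defined(EXPECT_PROB) && defined(__GNUC__)
#  define EXPECT_PROB(cond, p) __builtin_expect(!!(cond), 1)
#endif
#ifndef EXPECT_PROB
#  define EXPECT_PROB(cond, p) (!!(cond))
#endif

/* Counts bytes outside the 7-bit ASCII range. The 0.95 is an assumption
   about the workload (mostly-ASCII text), not a measured value. */
size_t count_non_ascii(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (EXPECT_PROB(buf[i] < 0x80, 0.95))
            continue;   /* expected common case: plain ASCII byte */
        n++;            /* expected rare case */
    }
    return n;
}
```

The hint only changes what the compiler assumes; the function returns the same count either way.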

Modern desktop CPUs tend to have very good branch prediction. If your code is on a hot path anyway, the dynamic branch predictor will quickly figure out on its own that the branch is biased. Such hints mainly help the static predictor, which kicks in when no dynamic branch information is available.
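
One way to check this on your own machine is a small timing harness like the sketch below. It runs the same heavily biased branch with and without `__builtin_expect`. All names (`fill_data`, `sum_plain`, `sum_hinted`, `time_seconds`) and the roughly 1-in-100 bias are assumptions for illustration, and whatever numbers it produces depend heavily on compiler version, flags, and CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define N 1000000u

static uint32_t data[N];

/* Fills the array from a fixed-seed LCG so runs are repeatable. */
void fill_data(void)
{
    uint32_t x = 12345u;
    for (size_t i = 0; i < N; i++) {
        x = x * 1664525u + 1013904223u;
        data[i] = x >> 8;
    }
}

/* Kept out of line to discourage the compiler from turning the branch
   into branch-free code. GCC/Clang-specific attribute. */
__attribute__((noinline)) static uint64_t rare_path(uint32_t v)
{
    return (uint64_t)v * 3u;
}

/* Biased branch without any hint. */
uint64_t sum_plain(void)
{
    uint64_t s = 0;
    for (size_t i = 0; i < N; i++) {
        if (data[i] % 100u == 0)
            s += rare_path(data[i]);
        else
            s += data[i];
    }
    return s;
}

/* Same loop, with the rare case marked as unlikely. */
uint64_t sum_hinted(void)
{
    uint64_t s = 0;
    for (size_t i = 0; i < N; i++) {
        if (__builtin_expect(data[i] % 100u == 0, 0))
            s += rare_path(data[i]);
        else
            s += data[i];
    }
    return s;
}

/* Runs fn once, stores its result, and returns the CPU time in seconds. */
double time_seconds(uint64_t (*fn)(void), uint64_t *result)
{
    clock_t start = clock();
    *result = fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call `fill_data()` once, then time each variant several times with `time_seconds(sum_plain, &r)` and `time_seconds(sum_hinted, &r)` and compare. A single run is too noisy to draw conclusions from.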

On x86, the static predictor predicts forward branches not to be taken and backward branches to be taken (since they usually indicate loops). The compiler will therefore adjust the static code layout to match these predictions. (This may also help put the hot path on adjacent cache lines, which may help further.)
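
To see the layout effect for yourself, here is a minimal sketch. The function below (a hypothetical `checked_sum`) marks its error check as unlikely with `__builtin_expect`. Compiling it with `gcc -O2 -S`, once as written and once with the hint removed, lets you compare where the compiler places the early-return path. Whether the layout actually changes depends on the GCC version and target.

```c
#include <stddef.h>

/* Hypothetical example: sums a buffer of non-negative values and
   returns -1 as soon as a negative entry is found. */
long checked_sum(const int *values, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Second argument 0 = "we expect this condition to be false". */
        if (__builtin_expect(values[i] < 0, 0))
            return -1;          /* rare error path */
        total += values[i];     /* common path */
    }
    return total;
}
```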

On PPC, some jump instructions have a bit to predict their likelihood. I don't know whether the compiler will rearrange code there, too.

I don't know how ARM CPUs predict branches. As low-power devices, they may have less sophisticated branch prediction, so static prediction could have more impact.

葬花如无物 2024-12-14 12:09:29

Likely/unlikely hints work by preloading the ICache with the code for the branch that the programmer considers generally correct. Because branch predictors rely on limited historical data, they are effective only in loops (or small codebases), and loops are not always the issue when it comes to branching performance. Take a real-time simulation or game, where large amounts of sim/game logic must be processed for large numbers of objects at a very high rate. Branch predictors cannot operate effectively in this context, which is a serious performance concern for sim developers. This logic can consist of literally thousands of different, non-repeating conditionals each frame, which leaves a branch predictor unable to operate effectively.

In answer to the original question, compilers tend to assume a conditional will be false when generating the code to preload the ICache. You should check the assembly output for your code to verify that. You might then be able to write a macro for the conditionals you want to preload in a performant way, if you don't want to structure your code to fit a particular processor architecture.
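
Such a macro is commonly written on top of GCC's `__builtin_expect`. The sketch below assumes GCC or Clang and degrades to a plain condition elsewhere; `clamp_percent` is a made-up function just to show usage.

```c
/* Branch hint macros. On compilers without __builtin_expect they
   reduce to the bare condition, so the code still builds. */
#if defined(__GNUC__) || defined(__clang__)
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define likely(x)   (!!(x))
#  define unlikely(x) (!!(x))
#endif

/* Hypothetical usage: out-of-range input is assumed to be rare. */
int clamp_percent(int v)
{
    if (unlikely(v < 0))
        return 0;
    if (unlikely(v > 100))
        return 100;
    return v;
}
```

The `!!(x)` normalises any non-zero value to 1, so the macros can wrap pointers and integers as well as comparisons.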

Some studies have estimated that modern game engines, on modern processors, spend 60-80% of their time on cache misses, and that branch mispredictions account for approximately 15% of those misses. To accommodate a modern game engine, a branch predictor would need historical data for the entire game logic frame, probably involving several MB of data for each pipeline.
