如何建议 gcc 编译器更可能的分支
示例:
if (almost_always_false_condition) {
// do something
}
有没有办法建议编译器在 99% 的情况下将为 false。 条件计算需要约 60 个周期进行检查,并且编译器本身无法在编译时计算。
(海湾合作委员会4.3)
Example:
if (almost_always_false_condition) {
// do something
}
Is there a way to suggest compiler that in 99% condition will be false.
The condition calculation takes ~60 cycles to be checked, and it cannot be calculated at compile time by compiler itself.
(gcc 4.3)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
如果你想向 GCC 建议某个条件可能有给定值作为安排代码流的提示,你应该使用 __builtin_expect( ) :
但是,听起来你想找到一种方法以避免评估条件,而
__builtin_expect( )
不会这样做。有没有一种方法可以快速近似条件,并且仅在近似为真时才进行全面检查:您能告诉我们更多有关条件的信息吗?
If you want to suggest to GCC that a condition is likely to have a given value as a hint for arranging code flow, you should use
__builtin_expect( )
:However, it sounds like you want to find a way to avoid evaluating the condition, which
__builtin_expect( )
won't do. Is there a way that you can quickly approximate the condition, and only do the full check when the approximation is true:Can you tell us more about what the condition is?
如果单次运行期间结果可能有所不同,您可以使用布尔运算符的惰性求值将您的条件分为便宜部分和昂贵部分,并首先运行便宜部分。
由于计算
a == 5
比somethingexppressive()
更便宜,并且如果它几乎总是false
,您应该首先运行它,这样可以避免评估somethingexpcious
子句。另一方面,如果程序运行的结果是恒定的,则可以通过将计算结果存储在静态或全局变量中来优化它。
通过这种方式,您可以将
if
的成本降低为简单的内存查找。如果条件仅因响应另一个过程中的操作而变化,则可以从该过程更新全局:
如果
someotherfunction
的调用频率比appendtolist
的调用频率高,这非常有用功能。If the result might vary during a single run, you may be able to use lazy evaluation of boolean operators to split your condition into a cheap part and an expensive part, and run the cheap part first.
Since calculating
a == 5
is cheaper thansomethingexpensive()
, and if it is almost alwaysfalse
you should run it first, which avoids evaluating thesomethingexpensive
clause.If on the other hand the result is constant for a run of the program, you can optimise it by storing the result of the calculation in a static or global variable.
This way you have reduced the cost of the
if
to a simple memory lookup.If the condition only changes in response to an action in another procedure, you can update the global from that procedure:
This is useful if
someotherfunction
is called lots more often than theappendtolist
function.首先,在
else
子句或程序的其他地方花费了多少个周期?如果您进行简介或拍摄stackshots,您是否至少花费了你用 10% 的时间来做那个测试?如果没有,您可能应该首先考虑更大的问题。其次,如果您在该测试中花费超过 10% 的时间,您应该查看是否可以调整算法以使决策点更接近 50-50 的概率。 50-50 决策点在执行时会产生 1 位信息,而 99-1 决策点仅产生约 0.07 位*(即它不会告诉您太多信息,因此 CPU 周期的使用效率较低。 )这种现象的一个例子是将线性搜索与二分搜索进行比较。
*如果您有一个二元决策点,并且结果的概率为
a
和b
,则以位为单位的信息产量(熵)为-(a*log (a) + b*log(b))/log(2)
。First of all, how many cycles are spent in the
else
clause, or elsewhere in the program? If you profile or take stackshots, are you spending at least 10% of your time in that test? If not, there probably are bigger problems you should look at first.Second, if you are spending >10% of time in that test, you should look to see if the algorithm can be adjusted to have decision points closer to 50-50 probability. A 50-50 decision point yields 1 bit of information when it is executed, while a 99-1 decision point only yields about .07 bits.* (i.e. it doesn't tell you much, so it is inefficient use of CPU cycles.) An example of this phenomenon is if you compare linear search with binary search.
*If you have a binary decision point and the probabilities of the outcomes are
a
andb
, the information yield (entropy) in bits is-(a*log(a) + b*log(b))/log(2)
.文档表明 gcc 确实(或可以)进行配置文件驱动的优化。这不是我曾经尝试过用 gcc 做的事情,所以无法提供任何进一步的建议,可能值得你去谷歌一下。
The documentation suggests that gcc does (or can) do profile-driven optimization. It's not something I've ever tried to do with gcc, so can't provide any further advice, it might be worth your while hitting Google.
在我看来,执行此类优化的最佳方法是使用 -fprofile-generate 和 -fprofile-use 选项。这需要有代表性的用例基础来收集有关什么是可能的、什么不是的信息,但测试可以用于此目的。另一方面,代码没有用丑陋的、不可移植的指令来修饰。
请参阅 https://gcc.gnu。 org/onlinedocs/gcc-4.3.6/gcc/Optimize-Options.html#Optimize-Options 有关这两个选项的更多信息。
In my opinion, the best way to perform this kind of optimization is to use the
-fprofile-generate
and-fprofile-use
options. This requires a base of representative use cases to collect information about what is likely and what is not, but the tests may be used for this purpose. On the other side, the code is not decorated with ugly, non portable directives.See https://gcc.gnu.org/onlinedocs/gcc-4.3.6/gcc/Optimize-Options.html#Optimize-Options for more information about these two options.