Combining of constants in an optimizing compiler
I have a header file containing a lot of small inline functions. Most of them happen to have constant data. Since these functions are performance critical, the way they handle constants becomes important. AFAIK there are two ways to refer to constants:
1) Define them in a separate source file that is later linked with the application.
2) Define the constants in-place.
I would choose the latter way because it's more maintainable. However, it might be slower if the compiler doesn't optimize the thousands of equal constants that are created by inlining.
The question:
Will the compiler combine these equal constants? In particular, which of the following methods will be utilized?
1) Combining of equal constants across the compilation unit.
2) Combining of equal constants across the linking module (whole program or library)
3) Combining the constants with any static constant data that happens to have the same bit pattern and fulfills the alignment requirements across the compilation unit or whole program.
I use a modern compiler (GCC 4.5).
I'm not an expert in assembler, thus I couldn't answer this question myself using several simple tests :)
EDIT:
The constants are quite big (most of them at least 16 bytes), so the compiler can't make them immediate values.
EDIT2:
EXAMPLE of the code
This one uses the constant in-place:
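float_4 sign(float_4 a)
{
    // I use a macro for this line.
    // Declared as unsigned int so the initializers are kept as bit patterns
    // instead of being converted to float values.
    const __attribute__((aligned(16))) unsigned int mask_bits[4] = {
        0x80000000, 0x80000000, 0x80000000, 0x80000000};
    const int128 mask = load(mask_bits);
    return b_and(a, mask);
}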
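The two alternatives described in the question can be put side by side in a portable sketch. The names below (float_4 as a plain struct, apply_mask, sign_mask_shared, sign_shared, sign_in_place) are assumptions made for illustration, not the asker's real SIMD types or helpers, and the sketch assumes a 32-bit IEEE-754 float.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the asker's SIMD type: four packed floats.
struct float_4 { float v[4]; };

// Variant 1: the constant is declared here and defined once elsewhere
// (normally in a separate .cc file; defined at the bottom of this sketch).
extern const std::uint32_t sign_mask_shared[4];

// Helper: bitwise AND of each lane with a 32-bit mask.
inline float_4 apply_mask(float_4 a, const std::uint32_t (&mask)[4]) {
    float_4 r;
    for (int i = 0; i < 4; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &a.v[i], sizeof bits);  // read the float's bit pattern
        bits &= mask[i];
        std::memcpy(&r.v[i], &bits, sizeof bits);  // write the masked pattern back
    }
    return r;
}

// Variant 1 in use: every caller refers to the single shared definition.
inline float_4 sign_shared(float_4 a) { return apply_mask(a, sign_mask_shared); }

// Variant 2: the constant is defined in place, inside the inline function.
inline float_4 sign_in_place(float_4 a) {
    static const std::uint32_t mask[4] = {
        0x80000000u, 0x80000000u, 0x80000000u, 0x80000000u};
    return apply_mask(a, mask);
}

// Would live in its own source file in a real project.
const std::uint32_t sign_mask_shared[4] = {
    0x80000000u, 0x80000000u, 0x80000000u, 0x80000000u};
```

In C++, a static local variable inside an inline function with external linkage refers to one object across the whole program, so the in-place form written this way should not multiply the data per call site. Whether separate constants with the same bit pattern are additionally merged is up to the compiler and linker.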
Comments (3)
According to the GCC documentation, the following option does what you want:
If you define constants in your header file like this:
That is, when not only the declaration of the constant but also its definition is visible to the compiler while it compiles a translation unit (a .cc source file), the compiler replaces the constant with its value in the generated code, even with no optimizations enabled.
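The listing this answer refers to did not survive in the text. As a minimal sketch of the situation it describes, assume TEN is defined in the header while TWELVE is only declared there. The names TEN, TWELVE, foo and bar come from the answer; their exact definitions here are guesses.

```cpp
// --- would be in the header ---
const int TEN = 10;       // definition: the value is visible to every translation unit
extern const int TWELVE;  // declaration only: the value lives in another file

// The compiler knows the value of TEN here and can emit it as an immediate.
int foo(int x) { return x + TEN; }

// When the definition of TWELVE sits in another translation unit, the compiler
// sees only this declaration and has to load the value from memory.
int bar(int x) { return x + TWELVE; }

// --- would be in a separate .cc file ---
const int TWELVE = 12;
```

Compiling the header part together with a caller using g++ -S and reading the generated assembly is one way to check which form a given compiler version actually produces.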
Notice how in foo(int) it does the addition as addl $10, %eax, i.e. the TEN constant is replaced with its value. In bar(int), on the other hand, it first does movl TWELVE(%rip), %eax to load the value of TWELVE from memory into the eax register (the address will be resolved by the linker) and then does the addition addl -4(%rbp), %eax.
An optimized version looks like this:
I don't think that there are general answers to your questions. I give one for C only; the rules for C++ are different.
This depends a lot on the types of your constants. An important class is "integer constant expressions". These can be determined at compile time and, in particular, can be used as the values of "integer enumeration constants". Use them whenever you can.
For such constants the best thing should usually happen: they are realized as assembler immediates. They don't even have a storage location and are written directly into the assembly, so your questions don't even apply to them.
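As a small illustration of this advice, an enumeration constant can stand in for a const variable wherever an integer constant expression is needed. The names BLOCK_SIZE, SCALE, scratch, scale and block_bytes are made up for this sketch, which is written as C++ although the enum form reads the same in C.

```cpp
// Enumeration constants have no storage location; they are plain values.
enum { BLOCK_SIZE = 16, SCALE = 3 };

// Usable wherever an integer constant expression is required, e.g. an array bound.
static float scratch[BLOCK_SIZE];

// The compiler sees the value 3 directly; no load from memory is involved.
inline int scale(int x) { return x * SCALE; }

// The size of the array is itself a compile-time constant derived from BLOCK_SIZE.
inline unsigned long block_bytes() { return sizeof scratch; }
```

Because an enumeration constant is not an object, its address cannot be taken, so the compiler never has to reserve memory for it.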
For other data types the question is more delicate. Try to ensure that the address of your "const qualified variables" is never taken. This can be done with the register keyword, and may have the same effect as above.
For composed types (struct, union, arrays) there is no general answer or method. I have already seen that gcc is able to optimize small arrays (10 elements or so) completely.
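To make that last remark concrete, here is a hedged sketch of the kind of small constant array an optimizing build may fold away entirely. The names weights and weight_sum are invented for illustration, and whether the array actually disappears depends on the compiler version and flags.

```cpp
// A small constant table with internal linkage; its address never escapes.
static const int weights[4] = {1, 2, 3, 4};

// With optimization enabled, a compiler may unroll this loop and fold the
// whole function into a single constant, leaving no array in the binary.
inline int weight_sum() {
    int sum = 0;
    for (int i = 0; i < 4; ++i) {
        sum += weights[i];
    }
    return sum;
}
```

Comparing the assembly from g++ -O0 -S and g++ -O2 -S shows whether the loads from the array survive in a particular build.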