Combining of constants in optimizing compilers

Posted on 2024-10-17 11:21:04

I have a header file containing a lot of small inline functions. Most of them happen to have constant data. Since these functions are performance critical, the way they handle constants becomes important. AFAIK there are two ways to refer to constants:

1) Define them in a separate source file that is later linked with the application.

2) Define the constants in-place.

I would choose the latter way because it's more maintainable. However, it might be slower if the compiler doesn't merge the thousands of equal constants that are created by inlining.

The question:

Will the compiler combine these equal constants? In particular, which of the following methods will be utilized?

1) Combining of equal constants across the compilation unit.
2) Combining of equal constants across the linking module (whole program or library)
3) Combining the constants with any static constant data that happens to have the same bit pattern and fulfills the alignment requirements across the compilation unit or whole program.

I use a modern compiler (GCC 4.5).

I'm not an expert in assembler, so I couldn't answer this question myself with a few simple tests :)

EDIT:

The constants are quite big (most of them at least 16 bytes), so the compiler can't make them immediate values.

EDIT2:

EXAMPLE of the code

This one uses the constant in-place:

float_4 sign(float_4 a)
{
    // Integer storage, so each element holds the raw sign-bit pattern.
    const __attribute__((aligned(16))) unsigned int mask_bits[4] = { // I use a macro for this line
        0x80000000, 0x80000000, 0x80000000, 0x80000000};
    const int128 mask = load(mask_bits);
    return b_and(a, mask);
}
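
For readers who want to try this without the asker's own float_4, int128, load and b_and helpers, here is a minimal sketch that assumes the GCC/Clang vector extension instead. The names float4, uint4, sign_bits, negate and check_sign_helpers are made up for illustration; the point is only that two inline functions repeat the same 16-byte constant in place.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-ins for the question's float_4 / int128 types,
// built on the GCC/Clang vector_size extension (an assumption, not the asker's code).
typedef float float4 __attribute__((vector_size(16)));
typedef std::uint32_t uint4 __attribute__((vector_size(16)));

// Keeps only the sign bit of each lane; the mask is defined in place.
inline float4 sign_bits(float4 a) {
    const uint4 mask = {0x80000000u, 0x80000000u, 0x80000000u, 0x80000000u};
    return (float4)((uint4)a & mask);  // same-size vector cast reinterprets the bits
}

// A second function that repeats the identical 16-byte constant in place.
inline float4 negate(float4 a) {
    const uint4 mask = {0x80000000u, 0x80000000u, 0x80000000u, 0x80000000u};
    return (float4)((uint4)a ^ mask);  // flipping the sign bit negates each lane
}

// Small self-check so both functions are actually instantiated.
inline void check_sign_helpers() {
    const float4 v = {1.0f, -2.0f, 3.0f, -4.0f};
    const float4 s = sign_bits(v);
    const float4 n = negate(v);
    assert(!std::signbit(s[0]) && std::signbit(s[1]));
    assert(n[0] == -1.0f && n[1] == 2.0f);
}
```

Once something calls check_sign_helpers(), compiling with g++ -O2 -S and reading the read-only data section should show whether both functions refer to one copy of the mask or to two. What exactly appears there depends on the compiler version and target, so treat it as something to inspect rather than a guaranteed result.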

风蛊 2024-10-24 11:21:04

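According to the GCC documentation, the following option does what you want:

-fmerge-constants
Attempt to merge identical constants (string constants and floating-point constants) across compilation units.
This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.
Enabled at levels -O, -O2, -O3, -Os.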
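
One way to observe this kind of merging from inside a program is to compare the addresses of two identical string literals. The sketch below does that within a single file; the function names are hypothetical, and the language leaves it unspecified whether the two literals share storage, so the result is an observation about a particular build and not a guarantee.

```cpp
#include <cassert>
#include <cstring>

// Two functions that each mention an identical string literal.
const char* greeting_a() { return "identical constant"; }
const char* greeting_b() { return "identical constant"; }

// True when the toolchain folded both literals into one object.
// Unspecified by the language: it may differ between compilers and flags.
bool literals_share_storage() { return greeting_a() == greeting_b(); }

// The contents always match, whether or not the storage is shared.
inline void check_literal_contents() {
    assert(std::strcmp(greeting_a(), greeting_b()) == 0);
}
```

Building once with -O2 and once with -O2 -fno-merge-constants, then printing literals_share_storage(), is a quick way to see whether the option makes a difference on a given toolchain.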

电影里的梦 2024-10-24 11:21:04

If you define constants in your header file like this:

int const TEN = 10;
// or
enum { ELEVEN = 11 };

That is, if not only the declaration of the constant but also its definition is visible to the compiler when it compiles a translation unit (a .cc source file), then the compiler certainly replaces it with a constant value in the generated code, even with no optimizations enabled.

[max@truth test]$ cat test.cc
int const TEN = 10; // definition available
extern int const TWELVE; // only declaration

int foo(int x) { return x + TEN; }
int bar(int x) { return x + TWELVE; }

[max@truth test]$ g++ -S -o - test.cc | c++filt | egrep -v " *\."
foo(int):
    pushq   %rbp
    movq    %rsp, %rbp
    movl    %edi, -4(%rbp)
    movl    -4(%rbp), %eax
    addl    $10, %eax
    leave
    ret
bar(int):
    pushq   %rbp
    movq    %rsp, %rbp
    movl    %edi, -4(%rbp)
    movl    TWELVE(%rip), %eax
    addl    -4(%rbp), %eax
    leave
    ret
TEN:

Notice how in foo(int) it does the addition as addl $10, %eax, i.e. the constant TEN is replaced with its value. In bar(int), on the other hand, it first does movl TWELVE(%rip), %eax to load the value of TWELVE from memory into the eax register (the address will be resolved by the linker) and then does the addition with addl -4(%rbp), %eax.

An optimized version looks like this:

[max@truth test]$ g++ -O3 -S -o - test.cc | c++filt | egrep -v " *\."
foo(int):
    leal    10(%rdi), %eax
    ret
bar(int):
    movl    TWELVE(%rip), %eax
    addl    %edi, %eax
    ret

东京女 2024-10-24 11:21:04

I don't think that there are general answers to your questions. I give one for C only; the rules for C++ are different.

This depends a lot on the types of your constants. An important class is "integer constant expressions". These can be determined at compile time and in particular can be used as values of "integer enumeration constants". Use them whenever you can:

enum { myFavoriteDimension = 55/2 };

For such constants the best thing should usually happen: they are realized as assembler immediates. They don't even have a storage location; they are written directly into the assembly, so your questions don't even apply.

For other data types the question is more delicate. Try to enforce that no address of your "const qualified variables" is taken. This can be done with the register keyword.

register double const something = 5.7;

may have the same effect as above.

For composed types (struct, union, arrays) there is no general answer or method. I have already seen that gcc is able to optimize small arrays (10 elements or so) completely.
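
As a rough illustration of the two cases above, here is a sketch written in C++ to match the other examples in this thread, even though this answer speaks about C. The enum constant is the one from the answer; scale, table_sum, check_constants and the table values are made up for illustration.

```cpp
#include <cassert>

// Integer constant expression: usable as an array bound, needs no storage.
enum { myFavoriteDimension = 55 / 2 };  // integer division, so 27

// Hypothetical helper: the multiplier can be encoded directly in the instruction.
inline int scale(int x) { return x * myFavoriteDimension; }

// Small const array (illustrative values) whose address never escapes,
// the kind of data an optimizer may fold away entirely.
inline int table_sum() {
    static const int table[4] = {1, 2, 3, 4};
    int sum = 0;
    for (int i = 0; i < 4; ++i) sum += table[i];
    return sum;
}

// Self-check that also uses the enum constant as an array bound.
inline void check_constants() {
    int buffer[myFavoriteDimension] = {0};
    assert(sizeof(buffer) / sizeof(buffer[0]) == 27);
    assert(scale(2) == 54);
    assert(table_sum() == 10);
}
```

Whether table_sum() really collapses to a constant depends on the compiler and optimization level; looking at the -O2 -S output is the way to find out for a given setup.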
