Weird LTO behavior with -ffast-math

Posted on 2025-02-11 03:00:17

Summary

Recently I ran into a weird issue involving LTO and -ffast-math: my calls to pow (from <cmath>) give inconsistent results depending on whether -flto is used.

Environment:

$ g++ --version
g++ (GCC) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ll /lib64/libc.so.6
lrwxrwxrwx 1 root root 12 Sep  3  2019 /lib64/libc.so.6 -> libc-2.17.so

$ ll /lib64/libm.so.6
lrwxrwxrwx 1 root root 12 Sep  3  2019 /lib64/libm.so.6 -> libm-2.17.so

$ cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core) 

Minimal Example

Code

  • fixed.hxx
#include <cstdint>
double Power10f(const int16_t power);
  • fixed.cxx
#include "fixed.hxx"
#include <cmath>

double Power10f(const int16_t power)
{
    return pow(10.0, (double) power);
}
  • test.cxx
#include <iostream>
#include <cmath>
#include <cstdlib>   // atoi
#include <iomanip>
#include <cstdint>
#include "fixed.hxx"

int main(int argc, char** argv)
{
    if (argc >= 3) {
        int64_t value = (int64_t)atoi(argv[1]);
        int16_t power = (int16_t)atoi(argv[2]);
        double x = Power10f(power);
        std::cout.precision(17);
        std::cout << std::scientific << x << std::endl;
        std::cout << std::scientific << (double)value * x << std::endl;
        return 0;   
    }
    return 1;
}

Compile & Run

Compiling it with -ffast-math gives different results depending on whether -flto is used:

  • With -flto, the code ends up calling __pow_finite and gives an "accurate" result:
$ g++ -O3 -DNDEBUG -ffast-math -std=c++17 -flto  -o fixed.cxx.o -c fixed.cxx
$ g++ -O3 -DNDEBUG   -o fdtest fixed.cxx.o test.cxx
$ ./fdtest 81 20
1.00000000000000000e+20
8.10000000000000000e+21
$ objdump -DC fdtest > fdtest.dump
$ cat fdtest.dump
...
0000000000400930 <Power10f(short)>:
  400930:       0f bf ff                movswl %di,%edi
  400933:       66 0f ef c9             pxor   %xmm1,%xmm1
  400937:       f2 0f 10 05 99 00 00    movsd  0x99(%rip),%xmm0        # 4009d8 <_IO_stdin_used+0x8>
  40093e:       00 
  40093f:       f2 0f 2a cf             cvtsi2sd %edi,%xmm1
  400943:       e9 d8 fd ff ff          jmpq   400720 <__pow_finite@plt>
  400948:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40094f:       00
...
  • Without -flto, the code ends up calling __exp_finite (an optimization enabled by -ffast-math, if I guess right) and gives an "inaccurate" result:
$ g++ -O3 -DNDEBUG -ffast-math -std=c++17  -o fixed.cxx.o -c fixed.cxx
$ g++ -O3 -DNDEBUG   -o fdtest fixed.cxx.o test.cxx
$ ./fdtest 81 20
1.00000000000000786e+20
8.10000000000006396e+21
$ objdump -DC fdtest > fdtest.dump
$ cat fdtest.dump
...
0000000000400930 <Power10f(short)>:
  400930:       0f bf ff                movswl %di,%edi
  400933:       66 0f ef c0             pxor   %xmm0,%xmm0
  400937:       f2 0f 2a c7             cvtsi2sd %edi,%xmm0
  40093b:       f2 0f 59 05 95 00 00    mulsd  0x95(%rip),%xmm0        # 4009d8 <_IO_stdin_used+0x8>
  400942:       00 
  400943:       e9 88 fd ff ff          jmpq   4006d0 <__exp_finite@plt>
  400948:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40094f:       00
...

Question

Is the behavior above expected, or is there something wrong with my code that causes it?

Update

The same result can also be observed on other platforms (e.g. ArchLinux with g++ 12.1 and glibc 2.35).

Comments (2)

ζ澈沫 2025-02-18 03:00:17

man gcc:

To use the link-time optimizer, -flto and optimization options should be specified at compile time and during the final link. It is recommended that you compile all the files participating in the same link with the same options and also specify those options at link time. For example:

              gcc -c -O2 -flto foo.c
              gcc -c -O2 -flto bar.c
              gcc -o myprog -flto -O2 foo.o bar.o
擦肩而过的背影 2025-02-18 03:00:17

-ffast-math gives the compiler permission to be inconsistent for whatever reasons it wants. Modifying even notionally unrelated code in the function could easily lead to pow returning different results thanks to different optimization strategies being chosen. And -flto changes quite a bit about how/when optimization is done, so there's a lot of room for that to happen.

If you care about numerical precision, or numeric consistency, or numerics in general, do not use -ffast-math. The transformations it performs are generally available to you as a programmer, and if you do them yourself, you can rely on their consistency.
