Weird LTO behavior with -ffast-math

Posted on 2025-02-11 03:00:17

Summary

I recently ran into a strange interaction between LTO and -ffast-math: my calls to pow (from <cmath>) give different results depending on whether -flto is used.

Environment:

$ g++ --version
g++ (GCC) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ ll /lib64/libc.so.6
lrwxrwxrwx 1 root root 12 Sep  3  2019 /lib64/libc.so.6 -> libc-2.17.so

$ ll /lib64/libm.so.6
lrwxrwxrwx 1 root root 12 Sep  3  2019 /lib64/libm.so.6 -> libm-2.17.so

$ cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core)

Minimal Example

Code

fixed.hxx

#include <cstdint>
double Power10f(const int16_t power);

fixed.cxx

#include "fixed.hxx"
#include <cmath>

double Power10f(const int16_t power)
{
    return pow(10.0, (double) power);
}

test.cxx

#include <iostream>
#include <cmath>
#include <iomanip>
#include <cstdint>
#include <cstdlib>   // atoi
#include "fixed.hxx"

int main(int argc, char** argv)
{
    if (argc >= 3) {
        int64_t value = (int64_t)atoi(argv[1]);
        int16_t power = (int16_t)atoi(argv[2]);
        double x = Power10f(power);
        std::cout.precision(17);
        std::cout << std::scientific << x << std::endl;
        std::cout << std::scientific << (double)value * x << std::endl;
        return 0;   
    }
    return 1;
}

Compile & Run

Compiling with -ffast-math gives different results with and without -flto.

With -flto, the code ends up calling __pow_finite and gives an "accurate" result:

$ g++ -O3 -DNDEBUG -ffast-math -std=c++17 -flto  -o fixed.cxx.o -c fixed.cxx
$ g++ -O3 -DNDEBUG   -o fdtest fixed.cxx.o test.cxx
$ ./fdtest 81 20
1.00000000000000000e+20
8.10000000000000000e+21
$ objdump -DC fdtest > fdtest.dump
$ cat fdtest.dump
...
0000000000400930 <Power10f(short)>:
  400930:       0f bf ff                movswl %di,%edi
  400933:       66 0f ef c9             pxor   %xmm1,%xmm1
  400937:       f2 0f 10 05 99 00 00    movsd  0x99(%rip),%xmm0        # 4009d8 <_IO_stdin_used+0x8>
  40093e:       00 
  40093f:       f2 0f 2a cf             cvtsi2sd %edi,%xmm1
  400943:       e9 d8 fd ff ff          jmpq   400720 <__pow_finite@plt>
  400948:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40094f:       00
...

Without -flto, the code ends up calling __exp_finite (an optimization enabled by -ffast-math, if I guess right) and gives an "inaccurate" result:

$ g++ -O3 -DNDEBUG -ffast-math -std=c++17  -o fixed.cxx.o -c fixed.cxx
$ g++ -O3 -DNDEBUG   -o fdtest fixed.cxx.o test.cxx
$ ./fdtest 81 20
1.00000000000000786e+20
8.10000000000006396e+21
$ objdump -DC fdtest > fdtest.dump
$ cat fdtest.dump
...
0000000000400930 <Power10f(short)>:
  400930:       0f bf ff                movswl %di,%edi
  400933:       66 0f ef c0             pxor   %xmm0,%xmm0
  400937:       f2 0f 2a c7             cvtsi2sd %edi,%xmm0
  40093b:       f2 0f 59 05 95 00 00    mulsd  0x95(%rip),%xmm0        # 4009d8 <_IO_stdin_used+0x8>
  400942:       00 
  400943:       e9 88 fd ff ff          jmpq   4006d0 <__exp_finite@plt>
  400948:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  40094f:       00
...
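
The second disassembly multiplies the converted argument by a constant and then jumps to __exp_finite, which is consistent with pow(10, x) being rewritten as exp(x * ln 10). The following is a minimal sketch of that rewrite done by hand, assuming the constant is ln 10 (the dump does not show its value). The function names are hypothetical and are not part of the question's code. Because exp turns an absolute rounding error in its argument into a relative error of the same size in the result, and the spacing of doubles near 20 * ln 10 ≈ 46 is about 7e-15, a deviation of the order seen above (7.86e-15 relative) is plausible for this form.

```cpp
#include <cmath>
#include <cstdint>

// Direct library call, as Power10f is written in the question.
double power10_direct(std::int16_t power)
{
    return std::pow(10.0, static_cast<double>(power));
}

// Hand-written form of the rewrite suggested by the disassembly:
// scale the exponent by a constant (assumed here to be ln 10), then call exp.
double power10_via_exp(std::int16_t power)
{
    const double ln10 = std::log(10.0);
    return std::exp(static_cast<double>(power) * ln10);
}

// Relative difference between the two forms for a given exponent.
double relative_difference(std::int16_t power)
{
    const double direct = power10_direct(power);
    return std::fabs(power10_via_exp(power) - direct) / direct;
}
```

Calling relative_difference(20) from a small main and printing it with 17 digits of precision shows how far the two forms drift apart on a given libm. Build this sketch without -ffast-math, since with that flag the compiler may apply the same rewrite to power10_direct as well.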

Question

Is this expected behavior, or is something wrong with my code that causes it?

Update

The same result can also be observed on other platforms (e.g. Arch Linux with g++ 12.1 and glibc 2.35).

ζ澈沫 2025-02-18 03:00:17

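From man gcc:

To use the link-time optimizer, -flto and optimization options should be specified at compile time and during the final link. It is recommended that you compile all the files participating in the same link with the same options and also specify those options at link time. For example:

    gcc -c -O2 -flto foo.c
    gcc -c -O2 -flto bar.c
    gcc -o myprog -flto -O2 foo.o bar.o

In the question, the final link command (g++ -O3 -DNDEBUG -o fdtest fixed.cxx.o test.cxx) passes neither -flto nor -ffast-math, so the options differ between the compile step and the link step in both builds.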
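
One way to check whether every source file really sees the same floating-point options is to have each translation unit report the macros GCC predefines for them. The sketch below assumes GCC's __FAST_MATH__ and __FINITE_MATH_ONLY__ macros. The helper name math_flags_summary is hypothetical. It reflects only the flags given when that file was compiled, so it cannot show which options the link step used for LTO code generation.

```cpp
#include <string>

// Hypothetical helper: summarises the floating-point flags this
// translation unit was compiled with, based on GCC's predefined macros.
std::string math_flags_summary()
{
    std::string summary;
#ifdef __FAST_MATH__
    summary += "fast-math: on";          // defined by GCC under -ffast-math
#else
    summary += "fast-math: off";
#endif
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    summary += ", finite-math-only: on"; // set to 1 under -ffinite-math-only
#else
    summary += ", finite-math-only: off";
#endif
    return summary;
}
```

Placing a copy of such a helper (under a distinct name) in both fixed.cxx and test.cxx and printing the results would make the mismatch in the question visible: fixed.cxx is compiled with -ffast-math, while test.cxx is not.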
