Auto-vectorization with double and -ffast-math

Posted 2024-09-01 17:11:42 · 123 words · 13 views · 0 comments


Why is it mandatory to use -ffast-math with g++ to achieve the vectorization of loops using doubles? I don't like -ffast-math because I don't want to lose precision.


Answers (4)

场罚期间 2024-09-08 17:11:42


You don’t necessarily lose precision with -ffast-math. It mainly affects the handling of NaN, Inf, etc. and the order in which operations are performed, although reordering can change how results are rounded.

If you have a specific piece of code where you do not want GCC to reorder or simplify computations, you can use an empty asm statement that tells GCC a variable is used, and possibly modified, at that point.

For instance, the following code performs a rounding operation on f. However, with -ffast-math, GCC is likely to optimise away the f += g and f -= g pair:

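static double moo(double f, double g)
{
    g *= 4503599627370496.0; // 2 ** 52
    f += g; // e.g. with g == 1.0 and 0 <= f < 2 ** 52, this rounds f to an integer
    f -= g; // exact: removes the 2 ** 52 offset again
    return f;
}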

On x86_64, you can use this asm statement to stop GCC from performing that optimisation (the "+x" constraint keeps f in an SSE register):

static double moo(double f, double g)
{
    g *= 4503599627370496.0; // 2 ** 52
    f += g;
    __asm__("" : "+x" (f)); // empty asm: GCC must assume f is read and changed here
    f -= g;
    return f;
}

You will need to adapt this for each architecture, unfortunately. On PowerPC, use +f instead of +x.
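The same trick can be packaged so that it builds on more than one architecture. The following is a minimal C++ sketch, assuming GCC (or another compiler that accepts GNU inline asm); the name `round_to_integer` and the `volatile` fallback for other targets are illustrative additions, not part of the original answer. The fallback forces `f` through memory, which also discards any x87 extended precision.

```cpp
// Hypothetical helper: round 0 <= f < 2**52 to an integer using the
// "add and subtract 2**52" trick, with an optimisation barrier in between
// so that -ffast-math cannot fold (f + big) - big back into f.
inline double round_to_integer(double f)
{
    const double big = 4503599627370496.0; // 2 ** 52
    f += big; // spacing of doubles in [2**52, 2**53) is 1.0, so this rounds
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__("" : "+x" (f)); // f lives in an SSE register
#elif defined(__GNUC__) && (defined(__powerpc__) || defined(__powerpc64__))
    __asm__("" : "+f" (f)); // f lives in a floating-point register
#else
    volatile double barrier = f; // assumption: portable fallback via memory
    f = barrier;
#endif
    f -= big; // exact
    return f;
}
```

Under the default rounding mode, ties go to even, so 2.5 gives 2.0 and 3.5 gives 4.0.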

青衫负雪 2024-09-08 17:11:42


Very likely because vectorization may give you different results, or may cause you to miss floating-point signals/exceptions.

If you're compiling for 32-bit x86, gcc and g++ default to using the x87 for floating-point math, while on 64-bit they default to SSE. The x87 works in 80-bit extended precision internally, so it can and will produce different values for the same computation. It's therefore unlikely that g++ will consider vectorizing unless it can guarantee that you get the same results, or unless you use -ffast-math or some of the flags it turns on.

Basically, it comes down to this: the floating-point environment for vectorized code may not be the same as the one for non-vectorized code, sometimes in ways that matter. If the differences don't matter to you, try something like

-fno-math-errno -fno-trapping-math -fno-signaling-nans -fno-rounding-math

Before using them, look up those options and make sure that they won't affect your program's correctness. -ffinite-math-only may also help.
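To make the "signals/exceptions" point concrete: a program can observe floating-point exception flags through `<cfenv>`, and the default `-ftrapping-math` makes GCC assume such effects may matter, which is one reason `-fno-trapping-math` appears in the list above. The sketch below, with the hypothetical function `raises_invalid`, is only an illustration: GCC does not implement `#pragma STDC FENV_ACCESS`, so flag-testing code like this is best-effort under heavy optimisation.

```cpp
#include <cfenv>
#include <cmath>
#include <vector>

// Hypothetical helper: returns true if computing sqrt over xs raised the
// "invalid operation" flag (sqrt of a negative number does).
bool raises_invalid(const std::vector<double>& xs)
{
    std::feclearexcept(FE_ALL_EXCEPT);
    double acc = 0.0;
    for (double x : xs)
        acc += std::sqrt(x); // sqrt(-1.0) yields NaN and raises FE_INVALID
    volatile double sink = acc; // keep the loop from being discarded
    (void)sink;
    return std::fetestexcept(FE_INVALID) != 0;
}
```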

最单纯的乌龟 2024-09-08 17:11:42


Because -ffast-math enables operand reordering, which allows a lot of code to be vectorized.

For example, to calculate this

sum = a[0] + a[1] + a[2] + a[3] + a[4] + a[5] + … a[99]

without -ffast-math the compiler is required to do the additions sequentially, because floating-point addition is not associative (it is commutative, but that does not permit regrouping).
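A short C++ check of the non-associativity claim. `regrouping_changes_sum` is a hypothetical helper, assumed to be compiled without -ffast-math so that both groupings are evaluated exactly as written.

```cpp
// Hypothetical helper: does regrouping a + b + c change the result?
bool regrouping_changes_sum(double a, double b, double c)
{
    double left  = (a + b) + c; // the order the source code specifies
    double right = a + (b + c); // an order a reassociating compiler might pick
    return left != right;
}
```

With a = 1e16 and b = c = 1.0 it returns true: adjacent doubles near 1e16 are 2 apart, so 1e16 + 1.0 rounds back to 1e16 (ties to even) twice, while 1.0 + 1.0 = 2.0 is then added exactly.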

That's the same reason why compilers can't optimize a*a*a*a*a*a to (a*a*a)*(a*a*a) without -ffast-math

So the sum can't be vectorized unless the target has very efficient horizontal vector adds.

However, if -ffast-math is enabled, the expression can be calculated like this (see A7. Auto-Vectorization):

sum0 = a[0] + a[4] + a[ 8] + … a[96]
sum1 = a[1] + a[5] + a[ 9] + … a[97]
sum2 = a[2] + a[6] + a[10] + … a[98]
sum3 = a[3] + a[7] + a[11] + … a[99]
sum’ = sum0 + sum1 + sum2 + sum3

Now the compiler can vectorize it easily by adding each column in parallel and doing a horizontal add at the end.
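A scalar C++ sketch of the two evaluation orders above, for illustration only: `sum_sequential` follows the source order, while `sum_four_lanes` keeps the four interleaved partial sums sum0..sum3 and combines them at the end, which is the shape a 4-wide vectorized reduction takes. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Strict left-to-right sum: the order GCC must preserve by default.
double sum_sequential(const std::vector<double>& a)
{
    double sum = 0.0;
    for (double x : a)
        sum += x;
    return sum;
}

// Four interleaved partial sums (sum0..sum3), combined at the end.
double sum_four_lanes(const std::vector<double>& a)
{
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    std::size_t i = 0;
    for (; i + 4 <= a.size(); i += 4)
        for (std::size_t k = 0; k < 4; ++k)
            lane[k] += a[i + k]; // one "column" per lane
    // leftover elements when a.size() is not a multiple of 4
    for (std::size_t k = 0; i < a.size(); ++i, ++k)
        lane[k] += a[i];
    return ((lane[0] + lane[1]) + lane[2]) + lane[3]; // the final horizontal add
}
```

For the input {1e16, 1, 1, 1, 0, 1, 1, 1}, `sum_sequential` returns 1e16 (each + 1 rounds away), while `sum_four_lanes` returns 1e16 + 6, so sum’ != sum here.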

Does sum’ == sum? Only if (a[0]+a[4]+…) + (a[1]+a[5]+…) + (a[2]+a[6]+…) + (a[3]+a[7]+…) == a[0] + a[1] + a[2] + … This holds under associativity, which floats don’t adhere to, all of the time. Specifying /fp:fast lets the compiler transform your code to run faster – up to 4 times faster, for this simple calculation.

Do You Prefer Fast or Precise? - A7. Auto-Vectorization

In gcc, this reordering can be enabled with the -fassociative-math flag.


惟欲睡 2024-09-08 17:11:42

To enable auto-vectorization with gcc, -ffast-math is not actually necessary. See https://gcc.gnu.org/projects/tree-ssa/vectorization.html#using

    To enable vectorization of floating point reductions use -ffast-math or -fassociative-math.

Using -fassociative-math should be sufficient, although the GCC manual notes that it only takes effect when -fno-signed-zeros and -fno-trapping-math are also in effect.

This has been the case since 2007, see https://gcc.gnu.org/projects/tree-ssa/vectorization.html#oldnews

    1. -fassociative-math can be used instead of -ffast-math to enable vectorization of reductions of floats (2007-09-04).
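To illustrate the distinction, here is a minimal C++ sketch with two hypothetical functions. The element-wise loop in `axpy` has independent iterations, so GCC can typically vectorize it at -O3 with no fast-math flags at all. The reduction in `dot` only gets vectorized once reassociation is allowed. Compiling with `-fopt-info-vec` makes GCC report which loops it vectorized.

```cpp
#include <cstddef>

// Element-wise loop: each y[i] is computed with the same operations as in
// the scalar loop, so vectorizing it does not change any result.
void axpy(double a, const double* x, double* y, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

// Reduction: vectorizing it regroups the additions into per-lane partial
// sums, so GCC needs permission to reassociate (-ffast-math, or
// -fassociative-math together with -fno-signed-zeros -fno-trapping-math).
double dot(const double* x, const double* y, std::size_t n)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```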