Can I use rounding to ensure determinism of atomic floating-point operations?


I am developing a C application which needs floating-point determinism. I would also like the floating-point operations to be fairly fast. This includes standard transcendental functions not specified by IEEE 754, such as sine and logarithm. The software floating-point implementations I have considered are relatively slow compared to hardware floating point, so I am considering simply rounding away one or two of the least significant bits from each answer. The loss of precision is an adequate compromise for my application, but will this suffice to ensure deterministic results across platforms? All floating-point values will be doubles.

I realize order of operations is another potential source for variance in floating-point results. I have a way to address that already.

It would be terrific if there were software implementations of the major floating-point hardware implementations in use today, so I could test a hypothesis like this directly.
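
For concreteness, a sketch of the kind of rounding I have in mind (just an illustration, and strictly a truncation of the last k significand bits rather than true rounding; it assumes IEEE 754 binary64 doubles):

#include <stdint.h>
#include <string.h>

/* Illustrative helper: clear the k least significant bits of the
 * significand.  This truncates rather than rounds to nearest, but it
 * shows the kind of "rounding away" I mean. */
static double drop_low_bits(double x, unsigned k)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);       /* view the double's bit pattern   */
    bits &= ~((UINT64_C(1) << k) - 1u);   /* zero the k low significand bits */
    memcpy(&x, &bits, sizeof bits);
    return x;
}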


Comments (1)

病毒体 2025-01-11 16:36:36


As I understand it, you have a software implementation of a transcendental function like sin(x), expressed in terms of IEEE standard operations such as floating point add and multiply, and you want to ensure that you get the same answer on all machines (or, at least, all the machines that you care about).

First, understand: this will not be portable to all machines. E.g. IBM mainframe hex floating point is not IEEE, and will not give the same answers. To get exactly that, you would need to have a software implementation of the IEEE standard operations like FP add and multiply.

I'm guessing that you only care about machines that implement IEEE standard floating point. And I am also guessing that you are not worried about NaNs, since NaNs were not completely standardized by IEEE 754-1985, and two opposite implementations arose: HP and MIPS, versus almost everyone else. [1]

With those restrictions, how can you get variability in your calculations?

(1) If the code is being parallelized. Make sure that is not happening. (It's unlikely, but some machines might.) Parallelization is a major source of result variation in FP. At least one company I know of, which cares about reproducibility and parallelism, refuses to use FP and only uses integers.

(2) Ensure that the machine is set up appropriately.

E.g. most machines calculate in 32- or 64-bit precision (the original C standard was 64-bit "double" everywhere), but Intel x86/x87 can calculate in 80 bits in registers, and round to 64 or 32 when spilling. [1] shows how to change the x86/x87 precision control from 80 bit to 64 bit using inline assembly. Note that this code is assembly level and not portable - but most other machines already do computation in 32- or 64-bit precision, and you don't need to worry about the x87 80-bit mode.
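
The linked code is not reproduced here; a minimal sketch of that kind of precision-control change, assuming x86 with GCC-style inline assembly (my illustration, not the code behind the link), looks like this:

#include <stdint.h>

/* Set the x87 precision-control field (bits 8-9 of the FPU control word)
 * to 10b, i.e. a 53-bit significand matching IEEE double. */
static void set_x87_double_precision(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m" (cw));   /* read the control word */
    cw = (uint16_t)((cw & ~0x0300u) | 0x0200u);   /* PC = 10b (53-bit)     */
    __asm__ volatile ("fldcw %0" : : "m" (cw));   /* write it back         */
}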

(By the way, on x86 you can only avoid all of these issues by using SSE FP; the old legacy Intel x87 FP can never give exactly the same answers (although if you set precision control (PC) to 64 bit rather than 80 bit, you will get the same results except when there was an intermediate overflow, since the exponent width is not affected, just the mantissa).)

E.g. ensure that you are using the same underflow mode on all machines. I.e. ensure denorms are enabled, or, oppositely, that all machines are in flush-to-zero mode. Here it is a Hobson's choice: flush-to-zero modes are not standardized, but some machines, e.g. GPUs, simply have not had denormalized numbers. I.e. many machines have IEEE standard number FORMATS, but not actual IEEE standard arithmetic (with denorms). My druthers would be to require IEEE denorms, but if I were absolutely paranoid I would go with flush to zero, and force that flushing myself in the software.
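
A sketch of forcing that flush in software (the helper is my own illustration; it simply treats anything smaller in magnitude than the smallest normal double as zero):

#include <float.h>
#include <math.h>

/* Flush subnormal results to zero (preserving the sign) so the outcome
 * does not depend on whether the hardware supports denormals. */
static double flush_denormal(double x)
{
    if (x != 0.0 && fabs(x) < DBL_MIN)
        return copysign(0.0, x);
    return x;
}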

(3) Ensure that you are using the same language options. Older C programs do all calculations in "double" (64-bit), but it is now permissible to calculate in single precision. Whatever you choose, you want to do it the same way on all machines.
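
One way to check how a given compiler/target evaluates intermediate results is C99's FLT_EVAL_METHOD macro; this guard is my own suggestion, not something from the original answer:

#include <float.h>

/* FLT_EVAL_METHOD == 0 means every operation is evaluated in the range
 * and precision of its own type; 2 means long double intermediates
 * (classic x87 excess precision).  Refuse to build in the latter cases. */
#if defined(FLT_EVAL_METHOD) && (FLT_EVAL_METHOD != 0)
#error "this target evaluates floating point with excess precision"
#endif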

(4) Some smaller items wrt your code:

Avoid big expressions that a compiler is likely to rearrange (if it doesn't implement strict FP switches properly)

Possibly write all of your code in simple form, like

double a = ...;   /* inputs */
double b = ...;
double c = ...;
double d = a * b; /* one operation per statement, in a fixed order */
double e = a * c;
double f = d + e;

Rather than

f = (a*b) + (a*c);

which might be optimized to

f = a*(b+c);

I'll leave the discussion of compiler options for last, because it is longer.

If you do all of these things, then your calculations should be absolutely repeatable. IEEE floating point is exact - it always gives the same answers. It is the compiler rearranging the calculations on their way to the IEEE FP operations that introduces variability.

There should be no need for you to round off low order bits. But doing so also will not hurt, and may mask some issues. Remember: you may need to mask off at least one bit for every add....

(5) Compiler optimizations can rearrange the code in different ways on different machines. As one commenter said, use whatever your compiler's switches for strict FP are.

You might have to disable all optimization for the file containing your sin code.

You might have to use volatiles (see the sketch below).
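
A minimal sketch of that idea (my illustration, with made-up inputs a, b, c): each intermediate is forced through a volatile temporary, so the compiler cannot re-associate or contract the expression.

/* The volatile temporaries are spilled to memory, pinning the
 * evaluation order of (a*b) + (a*c). */
double sum_of_products(double a, double b, double c)
{
    volatile double t1 = a * b;
    volatile double t2 = a * c;
    return t1 + t2;
}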

Hopefully there are compiler switches that are more specific. E.g. for gcc:

-ffp-contract=off --- disables fused multiply-add, since not all of your target machines may have it.

-fexcess-precision=standard --- disables stuff like Intel x86/x87 excess precision in internal registers

-std=c99 --- specifies a fairly strict C language standard. Unfortunately not completely implemented, as of my googling today.

Make sure you do not have optimizations enabled like -funsafe-math-optimizations and -fassociative-math.
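
Putting those together, an invocation along these lines is roughly what I would start from (assuming GCC on x86-64; the file name is a placeholder, and the SSE switches implement the earlier suggestion to avoid the x87):

gcc -std=c99 -O2 \
    -ffp-contract=off \
    -fexcess-precision=standard \
    -fno-unsafe-math-optimizations -fno-associative-math \
    -msse2 -mfpmath=sse \
    -c sin_impl.c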
