Do 64-bit floating point numbers behave the same on all modern PCs?

Published 2024-08-19 13:16:07

I would like to know whether I can assume that the same operations on the same 64-bit floating point numbers give exactly the same results on any modern PC and in the most common programming languages (C++, Java, C#, etc.). We can assume that we are operating on numbers and that the result is also a number (no NaNs, INFs, and so on).

I know there are two very similar standards of computation using floating point numbers (IEEE 854-1987 and IEEE 754-2008). However, I don't know how it is in practice.

Comments (7)

千秋岁 2024-08-26 13:16:07

Modern processors that implement 64-bit floating-point typically implement something that is close to the IEEE 754-1985 standard, recently superseded by the 754-2008 standard.

The 754 standard specifies what result you should get from certain basic operations, notably addition, subtraction, multiplication, division, square root, and negation. In most cases, the numeric result is specified precisely: The result must be the representable number that is closest to the exact mathematical result in the direction specified by the rounding mode (to nearest, toward infinity, toward zero, or toward negative infinity). In "to nearest" mode, the standard also specifies how ties are broken.

Because of this, operations that do not involve exception conditions such as overflow will get the same results on different processors that conform to the standard.

However, there are several issues that interfere with getting identical results on different processors. One of them is that the compiler is often free to implement sequences of floating-point operations in a variety of ways. For example, if you write "a = b*c + d" in C, where all variables are declared double, the compiler is free to compute "b*c" in either double-precision arithmetic or something with more range or precision. If, for example, the processor has registers capable of holding extended-precision floating-point numbers and doing arithmetic with extended precision does not take any more CPU time than doing arithmetic with double precision, a compiler is likely to generate code using extended precision. On such a processor, you might not get the same results as you would on another processor. Even if the compiler does this regularly, it might not in some circumstances, because the registers are full during a complicated sequence, so it stores the intermediate results in memory temporarily. When it does that, it might write just the 64-bit double rather than the extended-precision number. So a routine containing floating-point arithmetic might give different results just because it was compiled with different code, perhaps inlined in one place, and the compiler needed registers for something else.

Some processors have instructions to compute a multiply and an add in one instruction, so "b*c + d" might be computed with no intermediate rounding and get a more accurate result than on a processor that first computes b*c and then adds d.

Your compiler might have switches to control behavior like this.

There are some places where the 754-1985 standard does not require a unique result. For example, when determining whether underflow has occurred (a result is too small to be represented accurately), the standard allows an implementation to make the determination either before or after it rounds the significand (the fraction bits) to the target precision. So some implementations will tell you underflow has occurred when other implementations will not.

A common feature in processors is to have an "almost IEEE 754" mode that eliminates the difficulty of dealing with underflow by substituting zero instead of returning the very small number that the standard requires. Naturally, you will get different numbers when executing in such a mode than when executing in the more compliant mode. The non-compliant mode may be the default set by your compiler and/or operating system, for reasons of performance.

Note that an IEEE 754 implementation is typically not provided just by hardware but by a combination of hardware and software. The processor may do the bulk of the work but rely on the software to handle certain exceptions, set certain modes, and so on.

When you move beyond the basic arithmetic operations to things like sine and cosine, you are very dependent on the library you use. Transcendental functions are generally calculated with carefully engineered approximations. The implementations are developed independently by various engineers and get different results from each other. On one system, the sin function may give results accurate within an ULP (unit of least precision) for small arguments (less than pi or so) but larger errors for large arguments. On another system, the sin function might give results accurate within several ULP for all arguments. No current math library is known to produce correctly rounded results for all inputs. There is a project, crlibm (Correctly Rounded Libm), that has done some good work toward this goal, and they have developed implementations for significant parts of the math library that are correctly rounded and have good performance, but not all of the math library yet.

In summary, if you have a manageable set of calculations, understand your compiler implementation, and are very careful, you can rely on identical results on different processors. Otherwise, getting completely identical results is not something you can rely on.

春庭雪 2024-08-26 13:16:07

If you mean getting exactly the same result, then the answer is no.

You might even get different results for debug (non-optimized) builds vs. release builds (optimized) on the same machine in some cases, so don't even assume that the results might be always identical on different machines.

(This can happen, e.g., on a computer with an Intel processor, if the optimizer keeps a variable for an intermediate result in a register that is stored in memory in the unoptimized build. Since Intel FPU registers are 80-bit and double variables are 64-bit, the intermediate result is stored with greater precision in the optimized build, causing different values in later results.)

In practice, however, you may often get the same results, but you shouldn't rely on it.

久夏青 2024-08-26 13:16:07

Modern FPUs all implement IEEE754 floats in single and double formats, and some in extended format. A certain set of operations are supported (pretty much anything in math.h), with some special instructions floating around out there.

绿光 2024-08-26 13:16:07

Assuming you are talking about applying multiple operations, I do not think you will get exact numbers. CPU architecture, the compiler used, and optimization settings will all change the results of your computations.

Even if you mean the exact same order of operations (at the assembly level), I think you will still get variations. For example, Intel chips use extended precision (80 bits) internally, which may not be the case for other CPUs. (I do not think extended precision is mandated.)

天涯沦落人 2024-08-26 13:16:07

The same C# program can produce different numerical results on the same PC, compiled once in debug mode without optimization and a second time in release mode with optimization enabled. That's my personal experience. We did not account for this when we first set up an automatic regression test suite for one of our programs, and we were completely surprised that many of our tests failed without any apparent reason.

北音执念 2024-08-26 13:16:07

For C# on x86, 80-bit FP registers are used.

The C# standard says that the processor must operate at the same precision as, or greater than, the type itself (i.e. 64-bit in the case of a 'double'). Promotions are allowed, except for storage. That means that locals and parameters could be at greater than 64-bit precision.

In other words, assigning a member variable to a local variable could (and in fact will under certain circumstances) be enough to give an inequality.

See also: Float/double precision in debug/release modes

染墨丶若流云 2024-08-26 13:16:07

For the 64-bit data type, the only format I know of in use is "double precision" / "binary64" from IEEE 754 (the 1985 and 2008 revisions don't differ much here for common cases).

Note: The radix-independent arithmetic defined in IEEE 854-1987 has been superseded by IEEE 754-2008 anyway.
