Do 64-bit floating point numbers behave the same on all modern PCs?

Published 2024-08-19 13:16:07

I would like to know whether I can assume that the same operations on the same 64-bit floating point numbers give exactly the same results on any modern PC and in the most common programming languages (C++, Java, C#, etc.). We can assume that we are operating on numbers and that the result is also a number (no NaNs, INFs, and so on).

I know there are two very similar standards of computation using floating point numbers (IEEE 854-1987 and IEEE 754-2008). However, I don't know how it is in practice.

Comments (7)

千秋岁 2024-08-26 13:16:07

Modern processors that implement 64-bit floating-point typically implement something that is close to the IEEE 754-1985 standard, recently superseded by the 754-2008 standard.

The 754 standard specifies what result you should get from certain basic operations, notably addition, subtraction, multiplication, division, square root, and negation. In most cases, the numeric result is specified precisely: The result must be the representable number that is closest to the exact mathematical result in the direction specified by the rounding mode (to nearest, toward infinity, toward zero, or toward negative infinity). In "to nearest" mode, the standard also specifies how ties are broken.

Because of this, operations that do not involve exception conditions such as overflow will get the same results on different processors that conform to the standard.

However, there are several issues that interfere with getting identical results on different processors. One of them is that the compiler is often free to implement sequences of floating-point operations in a variety of ways. For example, if you write "a = b*c + d" in C, where all variables are declared double, the compiler is free to compute "b*c" in either double-precision arithmetic or something with more range or precision. If, for example, the processor has registers capable of holding extended-precision floating-point numbers and doing arithmetic with extended precision does not take any more CPU time than doing arithmetic with double precision, a compiler is likely to generate code using extended precision. On such a processor, you might not get the same results as you would on another processor. Even if the compiler does this regularly, it might not in some circumstances, because the registers are full during a complicated sequence, so it stores the intermediate results in memory temporarily. When it does that, it might write just the 64-bit double rather than the extended-precision number. So a routine containing floating-point arithmetic might give different results just because it was compiled with different code, perhaps inlined in one place, and the compiler needed registers for something else.

Some processors have instructions to compute a multiply and an add in one instruction, so "b*c + d" might be computed with no intermediate rounding and get a more accurate result than on a processor that first computes b*c and then adds d.

Your compiler might have switches to control behavior like this.

There are some places where the 754-1985 standard does not require a unique result. For example, when determining whether underflow has occurred (a result is too small to be represented accurately), the standard allows an implementation to make the determination either before or after it rounds the significand (the fraction bits) to the target precision. So some implementations will tell you underflow has occurred when other implementations will not.

A common feature in processors is to have an "almost IEEE 754" mode that eliminates the difficulty of dealing with underflow by substituting zero instead of returning the very small number that the standard requires. Naturally, you will get different numbers when executing in such a mode than when executing in the more compliant mode. The non-compliant mode may be the default set by your compiler and/or operating system, for reasons of performance.

Note that an IEEE 754 implementation is typically not provided just by hardware but by a combination of hardware and software. The processor may do the bulk of the work but rely on the software to handle certain exceptions, set certain modes, and so on.

When you move beyond the basic arithmetic operations to things like sine and cosine, you are very dependent on the library you use. Transcendental functions are generally calculated with carefully engineered approximations. The implementations are developed independently by various engineers and get different results from each other. On one system, the sin function may give results accurate within an ULP (unit of least precision) for small arguments (less than pi or so) but larger errors for large arguments. On another system, the sin function might give results accurate within several ULP for all arguments. No current math library is known to produce correctly rounded results for all inputs. There is a project, crlibm (Correctly Rounded Libm), that has done some good work toward this goal, and they have developed implementations for significant parts of the math library that are correctly rounded and have good performance, but not all of the math library yet.

In summary, if you have a manageable set of calculations, understand your compiler implementation, and are very careful, you can rely on identical results on different processors. Otherwise, getting completely identical results is not something you can rely on.

春庭雪 2024-08-26 13:16:07

If you mean getting exactly the same result, then the answer is no.

You might even get different results for debug (non-optimized) builds vs. release builds (optimized) on the same machine in some cases, so don't even assume that the results might be always identical on different machines.

(This can happen, e.g., on a computer with an Intel processor, if the optimizer keeps a variable for an intermediate result in a register that is stored in memory in the unoptimized build. Since Intel FPU registers are 80-bit and double variables are 64-bit, the intermediate result is stored with greater precision in the optimized build, causing different values in later results.)

In practice, however, you may often get the same results, but you shouldn't rely on it.

久夏青 2024-08-26 13:16:07

Modern FPUs all implement IEEE754 floats in single and double formats, and some in extended format. A certain set of operations are supported (pretty much anything in math.h), with some special instructions floating around out there.

绿光 2024-08-26 13:16:07

Assuming you are talking about applying multiple operations, I do not think you will get exact numbers. CPU architecture, the compiler used, and optimization settings will all change the results of your computations.

Even if you mean the exact same order of operations (at the assembly level), I think you will still get variations. For example, Intel chips use extended precision (80 bits) internally, which may not be the case for other CPUs. (I do not think extended precision is mandated.)

天涯沦落人 2024-08-26 13:16:07

The same C# program can produce different numerical results on the same PC, compiled once in debug mode without optimization and a second time in release mode with optimization enabled. That's my personal experience. We did not account for this when we first set up an automatic regression test suite for one of our programs, and we were completely surprised that many of our tests failed without any apparent reason.

北音执念 2024-08-26 13:16:07

For C# on x86, 80-bit FP registers are used.

The C# standard says that the processor must operate at the same precision as, or greater than, the type itself (i.e. 64-bit in the case of a 'double'). Promotions are allowed, except for storage. That means that locals and parameters could be at greater than 64-bit precision.

In other words, assigning a member variable to a local variable could (and in fact will under certain circumstances) be enough to give an inequality.

See also: Float/double precision in debug/release modes

染墨丶若流云 2024-08-26 13:16:07

For the 64-bit data type, the only format I know of in use is "double precision" / "binary64" from IEEE 754 (the 1985 and 2008 revisions don't differ much here for common cases).

Note: The radix-independent arithmetic defined in IEEE 854-1987 has been superseded by IEEE 754-2008 anyway.
