Floating point values are inexact, which is why we should rarely use strict numerical equality in comparisons. For example, in Java this prints false (as seen on ideone.com):
System.out.println(.1 + .2 == .3);
// false
Usually the correct way to compare results of floating point calculations is to see if the absolute difference against some expected value is less than some tolerated epsilon.
System.out.println(Math.abs(.1 + .2 - .3) < .00000000000001);
// true
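As a minimal sketch of the tolerance-based comparison described above, the check can be wrapped in a small helper. The class name ApproxEquals and the method nearlyEqual are hypothetical names chosen for illustration, and the absolute tolerance 1e-14 is simply the value used in the question; a tolerance that suits one magnitude may be too strict or too loose for another.

```java
final class ApproxEquals {
    // Hypothetical helper: true when a and b differ by less than epsilon
    // (an absolute tolerance, as in the question's example).
    static boolean nearlyEqual(double a, double b, double epsilon) {
        return Math.abs(a - b) < epsilon;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        // Strict equality on a rounded result.
        System.out.println(sum == 0.3);
        // Tolerance-based comparison of the same values.
        System.out.println(nearlyEqual(sum, 0.3, 1e-14));
        // Spacing between adjacent doubles near 0.3, for a sense of scale.
        System.out.println(Math.ulp(0.3));
    }
}
```

Math.ulp is printed only to show how small the gap between neighbouring doubles is at this magnitude, which is why a fixed epsilon has to be chosen with the expected range of values in mind.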
The question is whether some operations can yield an exact result. We know that for any non-finite floating point value x (i.e. either NaN or an infinity), x - x is ALWAYS NaN.
But if x is finite, is any of this guaranteed?
x * -1 == -x
x - x == 0
(In particular I'm most interested in Java behavior, but discussions for other languages are also welcome.)
For what it's worth, I think (and I may be wrong here) the answer is YES! I think it boils down to whether, for any finite IEEE-754 floating point value, its additive inverse is always computable exactly. Since e.g. float and double have one dedicated bit just for the sign, this seems to be the case: finding the additive inverse only requires flipping the sign bit (i.e. the significand should be left intact).
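The sign-bit argument can be probed empirically. The sketch below is not a proof, only a spot check: it draws random 64-bit patterns, skips NaN and the infinities, and counts how often either identity or the "only the sign bit changes" claim fails. The class and method names (NegationCheck, negationFlipsOnlySignBit, identitiesHold, countFailures) and the seed are assumptions made for illustration.

```java
import java.util.Random;

final class NegationCheck {
    private static final long SIGN_BIT = 0x8000000000000000L;

    // True when -x has exactly the bit pattern of x with the sign bit flipped.
    static boolean negationFlipsOnlySignBit(double x) {
        long bits = Double.doubleToRawLongBits(x);
        long negatedBits = Double.doubleToRawLongBits(-x);
        return negatedBits == (bits ^ SIGN_BIT);
    }

    // True when both identities from the question hold for x.
    static boolean identitiesHold(double x) {
        return x * -1 == -x && x - x == 0;
    }

    // Samples random bit patterns, ignores non-finite values, counts failures.
    static int countFailures(int samples, long seed) {
        Random random = new Random(seed);
        int failures = 0;
        for (int i = 0; i < samples; i++) {
            double x = Double.longBitsToDouble(random.nextLong());
            if (Double.isNaN(x) || Double.isInfinite(x)) {
                continue;
            }
            if (!negationFlipsOnlySignBit(x) || !identitiesHold(x)) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(countFailures(1_000_000, 42L));
        System.out.println(identitiesHold(Double.MIN_VALUE));
        System.out.println(identitiesHold(Double.MAX_VALUE));
    }
}
```

Sampling raw bit patterns rather than calling Random.nextDouble() is a deliberate choice here: it exercises subnormals, both zeros and very large magnitudes, not just values in [0, 1).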
Comments (2)
Both equalities are guaranteed with IEEE 754 floating-point, because the results of both x - x and x * -1 are representable exactly as floating-point numbers of the same precision as x. In this case, regardless of the rounding mode, the exact values have to be returned by a compliant implementation.
EDIT: Comparing to the .1 + .2 example.
You can't add .1 and .2 in IEEE 754 because you can't represent them to pass to +. Addition, subtraction, multiplication, division and square root return the unique floating-point value which, depending on the rounding mode, is immediately below, immediately above, nearest with a rule to handle ties, ..., the result of the operation on the same arguments in R (the real numbers). Consequently, when the result (in R) happens to be representable as a floating-point number, this number is automatically the result regardless of the rounding mode.
The fact that your compiler lets you write 0.1 as shorthand for a different, representable number without a warning is orthogonal to the definition of these operations. When you write - (0.1) for instance, the unary - is exact: it returns exactly the opposite of its argument. On the other hand, its argument is not 0.1, but the double that your compiler uses in its place.
In short, another part of the reason why the operation x * (-1) is exact is that -1 can be represented as a double.
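The distinction between an inexact literal and an exact operation can be made visible with java.math.BigDecimal, whose BigDecimal(double) constructor converts a double to its exact decimal expansion. The following is a sketch under that assumption; the names LiteralVersusOperation, exactValue and sumIsExact are hypothetical and exist only for this illustration.

```java
import java.math.BigDecimal;

final class LiteralVersusOperation {
    // Exact decimal expansion of the double stored for a value or literal.
    static BigDecimal exactValue(double d) {
        return new BigDecimal(d);
    }

    // True when a + b computed in double equals the exact real sum
    // of the two doubles a and b (i.e. no rounding was needed).
    static boolean sumIsExact(double a, double b) {
        BigDecimal exactSum = exactValue(a).add(exactValue(b));
        return exactValue(a + b).compareTo(exactSum) == 0;
    }

    public static void main(String[] args) {
        // The double the compiler substitutes for the literal 0.1.
        System.out.println(exactValue(0.1));
        // Negation returns exactly the opposite of that double.
        System.out.println(exactValue(-(0.1)).compareTo(exactValue(0.1).negate()) == 0);
        // The real sum of these two doubles is not representable, so + rounds.
        System.out.println(sumIsExact(0.1, 0.2));
        // The real sum 0.75 is representable, so + returns it exactly.
        System.out.println(sumIsExact(0.5, 0.25));
    }
}
```

Comparing with compareTo rather than equals avoids BigDecimal's scale-sensitive equality, which is irrelevant to the numeric question being asked.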
Although x - x may give you -0 rather than true 0 (under IEEE 754 this happens only in the round-toward-negative mode), -0 compares as equal to 0, so you will be safe with your assumption that any finite number minus itself will compare equal to zero.
See "Is there a floating point value of x, for which x-x == 0 is false?" for more details.
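A short Java sketch of the signed-zero point: the == operator treats the two zeros as equal, while Double.equals and Double.compare distinguish them, which matters if results are boxed or used as map keys. The class name SignedZero and the helper isNegativeZero are hypothetical names introduced here.

```java
final class SignedZero {
    // True when d is the negative zero, detected by its raw bit pattern.
    static boolean isNegativeZero(double d) {
        return Double.doubleToRawLongBits(d) == Double.doubleToRawLongBits(-0.0);
    }

    public static void main(String[] args) {
        double x = 3.7;
        // Numeric comparison of x - x against zero.
        System.out.println(x - x == 0);
        // The == operator does not distinguish the two zeros.
        System.out.println(-0.0 == 0.0);
        // Boxed equality and Double.compare do distinguish them.
        System.out.println(Double.valueOf(-0.0).equals(0.0));
        System.out.println(Double.compare(-0.0, 0.0));
        // Which zero each operation produces in Java.
        System.out.println(isNegativeZero(x - x));
        System.out.println(isNegativeZero(0.0 * -1));
    }
}
```

The last line shows that the other identity from the question, x * -1, does produce a negative zero when x is 0.0, and it still compares equal to -x under ==.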