Does floor() return something that's exactly representable?
In C89, floor() returns a double. Is the following guaranteed to work?
double d = floor(3.0 + 0.5);
int x = (int) d;
assert(x == 3);
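My concern is that the result of floor might not be exactly representable in IEEE 754, so d gets something like 2.99999 and x ends up being 2.
For the answer to this question to be yes, all integers within the range of an int have to be exactly representable as doubles, and floor must always return that exactly represented value.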
All integers can have exact floating point representation if your floating point type supports the required mantissa bits. Since double uses 53 bits for mantissa, it can store all 32-bit ints exactly. After all, you could just set the value as mantissa with zero exponent.
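A minimal C sketch of this argument, assuming FLT_RADIX == 2, an int with no padding bits, and a typical IEEE 754 double (DBL_MANT_DIG == 53). Both function names are hypothetical: the first checks the sufficient condition (significand at least as wide as int's value bits), the second spot-checks individual values by converting to double and back.

```c
#include <float.h>
#include <limits.h>

/* Sufficient condition: if the double significand has at least as many binary
   digits as int has value bits, every int is exactly representable as a double.
   Assumes FLT_RADIX == 2 and an int without padding bits. */
int every_int_is_exact_in_double(void)
{
    int int_value_bits = (int)(sizeof(int) * CHAR_BIT) - 1;  /* minus the sign bit */
    return FLT_RADIX == 2 && DBL_MANT_DIG >= int_value_bits;
}

/* Spot check: convert v to double and back. For an exactly representable value
   the cast back is well defined (integral value inside int's range) and gives v. */
int int_roundtrips_through_double(int v)
{
    double d = (double)v;
    return (int)d == v;
}
```

On a platform with a 32-bit int, the first function compares 31 value bits against 53 significand bits, and the spot check holds for INT_MIN, INT_MAX and everything in between.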
If the result of floor() isn't exactly representable, what do you expect the value of d to be? Surely if you've got the representation of a floating point number in a variable, then by definition it's exactly representable, isn't it? You've got the representation in d...
(In addition, Mehrdad's answer is correct for 32-bit ints. In a compiler with a 64-bit double and a 64-bit int, you've got more problems, of course...)
EDIT: Perhaps you meant "the theoretical result of floor(), i.e. the largest integer value less than or equal to the argument, may not be representable as an int". That's certainly true. Simple way of showing this for a system where int is 32 bits:
I can't remember offhand what C does when conversions from floating point to integer overflow... but it's going to happen here.
EDIT: There are other interesting situations to consider too. Here's some C# code and results; I'd imagine at least similar things would happen in C. In C#, double is defined to be 64 bits, and so is long. Results:
In other words, converting a long value (original) to double and back to long isn't always the same as original. This shouldn't come as any surprise: there are more long values than doubles (given the NaN values), and plenty of doubles aren't integers, so we can't expect every long to be exactly representable. However, all 32-bit integers are representable as doubles.
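A rough C analogue of that round trip, offered as a sketch rather than a reproduction of the C# code. It assumes C99's int64_t stands in for C#'s long (C89 has no guaranteed 64-bit integer type) and an IEEE 754 double with a 53-bit significand. 2^53 + 1 is the smallest positive integer that does not survive, and INT64_MAX rounds up to 2^63, which the hypothetical function rejects before the cast back, since converting an out-of-range double to int64_t is undefined behaviour.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if converting original to double and back
   yields original again. Assumes an IEEE 754 double (53-bit significand). */
int int64_roundtrips_through_double(int64_t original)
{
    /* Values needing more than 53 significant bits are rounded here. */
    double d = (double)original;

    /* INT64_MAX and its neighbours round up to 2^63, which int64_t cannot hold. */
    if (d >= 9223372036854775808.0)  /* 2^63 */
        return 0;

    /* d is an integral value within int64_t's range, so this cast is defined. */
    return (int64_t)d == original;
}
```

Under these assumptions it returns 1 for 0, INT32_MAX, 2^53 and INT64_MIN, and 0 for 2^53 + 1 and INT64_MAX.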
I think you're a bit confused about what you want to ask.
floor(3 + 0.5) is not a very good example, because 3, 0.5, and their sum are all exactly representable in any real-world floating point format. floor(0.1 + 0.9) would be a better example, and the real question here is not whether the result of floor is exactly representable, but whether inexactness of the numbers prior to calling floor will result in a return value different from what you would expect, had all numbers been exact. In this case, I believe the answer is yes, but it depends a lot on your particular numbers.
I invite others to criticize this approach if it's bad, but one possible workaround might be to multiply your number by (1.0+0x1p-52) or something similar prior to calling floor (perhaps using nextafter would be better). This could compensate for cases where an error in the last binary place of the number causes it to fall just below rather than exactly on an integer value, but it will not account for errors which have accumulated over a number of operations. If you need that level of numeric stability/exactness, you need to either do some deep analysis or use an arbitrary-precision or exact-math library which can handle your numbers correctly.