Does floor() return something that's exactly representable?

Posted 2024-07-11 13:31:46

In C89, floor() returns a double. Is the following guaranteed to work?

double d = floor(3.0 + 0.5);
int x = (int) d;
assert(x == 3);

My concern is that the result of floor might not be exactly representable in IEEE 754. So d gets something like 2.99999, and x ends up being 2.

For the answer to this question to be yes, all integers within the range of an int have to be exactly representable as doubles, and floor must always return that exactly represented value.

Comments (3)

淑女气质 2024-07-18 13:31:46

All integers in a given range have an exact floating point representation as long as your floating point type has enough mantissa (significand) bits. An IEEE 754 double has a 53-bit mantissa (52 stored bits plus an implicit leading one), so it can store every 32-bit int exactly. Conceptually, you just store the integer in the mantissa with a zero exponent.

甜心 2024-07-18 13:31:46

If the result of floor() isn't exactly representable, what do you expect the value of d to be? Surely if you've got the representation of a floating point number in a variable, then by definition it's exactly representable, isn't it? You've got the representation in d...

(In addition, Mehrdad's answer is correct for 32-bit ints. With a compiler where both double and int are 64 bits wide, you've got more problems, of course...)

EDIT: Perhaps you meant "the theoretical result of floor(), i.e. the largest integer value less than or equal to the argument, may not be representable as an int". That's certainly true. A simple way of showing this for a system where int is 32 bits:

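int max = 0x7fffffff;      /* INT_MAX when int is 32 bits */
double number = max;       /* exact: a double holds every 32-bit int */
number += 10.0;            /* 2147483657.0, now outside the int range */
double f = floor(number);  /* already integral, so f == number */
int oops = (int) f;        /* f does not fit in an int */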

I can't remember offhand what C does when conversions from floating point to integer overflow... but it's going to happen here.

EDIT: There are other interesting situations to consider too. Here's some C# code and results - I'd imagine at least similar things would happen in C. In C#, double is defined to be 64 bits and so is long.

using System;
class Test
{
    static void Main()
    {
        FloorSameInteger(long.MaxValue/2);
        FloorSameInteger(long.MaxValue-2);
    }

    static void FloorSameInteger(long original)
    {
        double convertedToDouble = original;
        double flooredToDouble = Math.Floor(convertedToDouble);
        long flooredToLong = (long) flooredToDouble;

        Console.WriteLine("Original value: {0}", original);
        Console.WriteLine("Converted to double: {0}", convertedToDouble);
        Console.WriteLine("Floored (as double): {0}", flooredToDouble);
        Console.WriteLine("Converted back to long: {0}", flooredToLong);
        Console.WriteLine();
    }
}

Results:

Original value: 4611686018427387903
Converted to double: 4.61168601842739E+18
Floored (as double): 4.61168601842739E+18
Converted back to long: 4611686018427387904

Original value: 9223372036854775805
Converted to double: 9.22337203685478E+18
Floored (as double): 9.22337203685478E+18
Converted back to long: -9223372036854775808

In other words:

(long) floor((double) original)

isn't always the same as original. This shouldn't come as any surprise - there are more long values than doubles (given the NaN values) and plenty of doubles aren't integers, so we can't expect every long to be exactly representable. However, all 32 bit integers are representable as doubles.

涙—继续流 2024-07-18 13:31:46

I think you're a bit confused about what you want to ask. floor(3 + 0.5) is not a very good example, because 3, 0.5, and their sum are all exactly representable in any real-world floating point format. floor(0.1 + 0.9) would be a better example, and the real question here is not whether the result of floor is exactly representable, but whether inexactness of the numbers prior to calling floor will result in a return value different from what you would expect, had all numbers been exact. In this case, I believe the answer is yes, but it depends a lot on your particular numbers.

I invite others to criticize this approach if it's bad, but one possible workaround might be to multiply your number by (1.0+0x1p-52) or something similar prior to calling floor (perhaps using nextafter would be better). This could compensate for cases where an error in the last binary place of the number causes it to fall just below rather than exactly on an integer value, but it will not account for errors which have accumulated over a number of operations. If you need that level of numeric stability/exactness, you need to either do some deep analysis or use an arbitrary-precision or exact-math library which can handle your numbers correctly.
