Floating point comparison problem

Posted on 2024-09-27 20:09:49

#include <stdio.h>

int main(void)
{
    float f = 0.98;
    if (f <= 0.98)
        printf("hi");
    else
        printf("hello");
    getchar();   /* standard stand-in for the non-standard getch() */
    return 0;
}

I am running into a problem here. Using different floating-point values for f gives me different results.
Why is this happening?

Comments (2)

傲影 2024-10-04 20:09:49

f has float precision, but the literal 0.98 is a double by default, so the expression f <= 0.98 is evaluated in double precision.

f is therefore converted to double for the comparison, but the value it holds was already rounded to float precision and can be slightly larger than the double 0.98.

Use

if(f <= 0.98f)

or use a double for f instead.


In detail, assuming float is IEEE single-precision and double is IEEE double-precision:

These floating-point types are stored in a base-2 representation. In base 2, 0.98 would need infinite precision, because it is a repeating fraction:

0.98 = 0.1111101011100001010001111010111000010100011110101110000101000...

A float can only store 24 significant bits, i.e.

       0.111110101110000101000111_101...
                                 ^ round off here
   =   0.111110101110000101001000

   =   16441672 / 2^24

   =   0.98000001907...

A double can store 53 significant bits, so

       0.11111010111000010100011110101110000101000111101011100_00101000...
                                                              ^ round off here
   =   0.11111010111000010100011110101110000101000111101011100

   =   8827055269646172 / 2^53

   =   0.97999999999999998224...

So 0.98 becomes slightly larger than the true value when stored as a float, and slightly smaller when stored as a double.

小嗲 2024-10-04 20:09:49

This happens because floating-point values are not always exact representations of the number you wrote. Decimal (base-10) numbers have to be stored on the computer as base-2 numbers, and precision is lost in that conversion.

Read more about this at http://en.wikipedia.org/wiki/Floating_point


An example (I ran into this problem back in my VB6 days):

To convert the number 1.1 to a single precision floating point number we need to convert it to binary. There are 32 bits that need to be created.

Bit 1 is the sign bit (negative [1] or positive [0])
Bits 2-9 are for the exponent value
Bits 10-32 are for the mantissa (a.k.a. significand, basically the coefficient in scientific notation)

So for 1.1 the single-precision value is stored as follows (this is the truncated value; the compiler may round the least significant bit behind the scenes, but I simply truncate it, which is slightly less accurate and doesn't change the outcome of this example):

s --exp--- -------mantissa--------
0 01111111 00011001100110011001100

Notice the repeating pattern 0011 in the mantissa. 1/10 in binary is like 1/3 in decimal: it goes on forever. To recover the value from the 32-bit single-precision pattern, we first convert the exponent and mantissa to decimal numbers so we can work with them.

sign = 0 = a positive number

exponent: 01111111 = 127

mantissa: 00011001100110011001100 = 838860

The mantissa needs to be converted to a decimal fraction, because there is an implied leading 1 before the binary point (i.e. 1.00011001100110011001100). The 1 is implied because the mantissa represents a normalized value used in scientific notation: 1.0001100110011.... * 2^(x-127).

To get the decimal fraction out of 838860 we simply divide by 2^23 (equivalently, multiply by 2^-23), as there are 23 bits in the mantissa. This gives us 0.099999904632568359375. Adding the implied 1 gives 1.099999904632568359375. The exponent is 127, but the formula calls for 2^(x-127).

So here is the math:

(1 + 0.099999904632568359375) * 2^(127-127)

1.099999904632568359375 * 1 = 1.099999904632568359375

As you can see, 1.1 is not really stored as 1.1 in a single-precision floating-point value.
