Tags: c, floating-point, floating-point-conversion

Floating-point comparison problem

Posted 2024-09-27 20:09:49


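#include <stdio.h>

int main(void)
{
    float f = 0.98;    /* the double constant 0.98, rounded to float */
    if (f <= 0.98)     /* f is converted to double for this comparison */
        printf("hi\n");
    else
        printf("hello\n");
    getchar();         /* standard replacement for the non-standard getch() */
    return 0;
}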

I am running into a problem here: with different floating-point values of f, I get different results.
Why is this happening?


傲影 2024-10-04 20:09:49


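f has float precision, but the constant 0.98 has type double by default, so the expression f <= 0.98 is evaluated in double precision.

f is therefore converted to double for the comparison. That conversion is exact, but f already holds 0.98 rounded to float precision, which can be slightly larger than 0.98 rounded to double precision, so the test can come out false.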

Use

if(f <= 0.98f)

or use a double for f instead.

In detail, assuming float is IEEE 754 single precision and double is IEEE 754 double precision:

These floating-point types store numbers in base 2. In base 2, 0.98 would need infinite precision, because its expansion is a repeating binary fraction (after the first digit, the 20-bit block 11110101110000101000 repeats forever):

0.98 = 0.1111101011100001010001111010111000010100011110101110000101000...

A float stores only 24 significant bits, so 0.98 is rounded to nearest at the 24th bit (up, in this case):

       0.111110101110000101000111_101...
                                 ^ round off here
   =   0.111110101110000101001000

   =   16441672 / 2^24

   =   0.98000001907...

A double stores 53 significant bits, so the value is rounded at the 53rd bit (down, this time):

       0.11111010111000010100011110101110000101000111101011100_00101000...
                                                              ^ round off here
   =   0.11111010111000010100011110101110000101000111101011100

   =   8827055269646172 / 2^53

   =   0.97999999999999998224...

So 0.98 becomes slightly larger when stored as a float and slightly smaller when stored as a double, which is why f <= 0.98 is false and the program prints "hello".


小嗲 2024-10-04 20:09:49


It's because floating-point values are not always exact representations of the number. Base-ten numbers have to be stored on the computer as base-2 numbers, and precision is lost in this conversion whenever the value has no finite binary expansion (0.5 and 0.75 convert exactly; 0.1 and 0.98 do not).

Read more about this at http://en.wikipedia.org/wiki/Floating_point

An example (from when I ran into this problem in my VB6 days):

To store the number 1.1 as a single-precision floating-point number, we need to convert it to binary and fill in 32 bits:

Bit 1 is the sign bit (negative [1] or positive [0])
Bits 2-9 are for the exponent value (stored with a bias of 127)
Bits 10-32 are for the mantissa (a.k.a. significand, basically the coefficient in scientific notation)

So for 1.1, the single-precision value is stored as follows. (This is the truncated value. With the default IEEE round-to-nearest mode the last mantissa bit is actually rounded up, giving ...1101, but truncating is simpler and does not change the conclusion of this example.)

s --exp--- -------mantissa--------
0 01111111 00011001100110011001100

Notice the repeating pattern 0011 in the mantissa: 1/10 in binary is like 1/3 in decimal and goes on forever. To recover the value from the 32-bit single-precision representation, we first convert the exponent and mantissa to decimal numbers.

sign = 0 = a positive number

exponent: 01111111 = 127

mantissa: 00011001100110011001100 = 838860

The mantissa has to be converted to a fractional value, because there is an implied leading 1 before the binary point (i.e. 1.00011001100110011001100). The 1 is implied because the mantissa stores a normalized value in binary scientific notation: 1.0001100110011... * 2^(x-127), where x is the stored exponent.

To get the fractional value out of 838860, divide by 2^23 (8388608), since the mantissa has 23 bits. This gives 0.099999904632568359375. Adding the implied 1 gives 1.099999904632568359375. The stored exponent is 127, but because the exponent is stored with a bias of 127, the formula uses 2^(x-127) = 2^0 = 1.

So here is the math:

(1 + 0.099999904632568359375) * 2^(127-127)

1.099999904632568359375 * 1 = 1.099999904632568359375

As you can see, 1.1 is not stored exactly as 1.1 in a single-precision float.
