Integer and floating-point precision

Posted 2024-08-15 12:00:11, 115 words, 3 views, 0 comments

This is more of a numerical analysis rather than programming question, but I suppose some of you will be able to answer it.

In the sum of two floats, is there any precision lost? Why?

In the sum of a float and an integer, is there any precision lost? Why?

Thanks.

Comments (9)

抚笙 2024-08-22 12:00:11

In the sum of two floats, is there any precision lost?

If the two floats differ in magnitude and both use the full precision range (about 7 decimal digits), then yes, you will see some loss in the last places.

Why?

This is because floats are stored in the form (sign) × (mantissa) × 2^(exponent). If two values have different exponents and you add them, the smaller value is shifted to match the larger exponent, so it keeps fewer digits in the mantissa:

PS> [float]([float]0.0000001 + [float]1)
1
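
The same effect can be sketched in C++. This is a minimal illustration, assuming IEEE 754 single precision with round-to-nearest; the helper names `add_rounded` and `is_absorbed` are made up for this example. Under those assumptions, 1e-7 is slightly more than half the spacing between floats near 1, so the sum should round to the next float above 1 rather than to exactly 1 (a shell that prints only about 7 significant digits would still show it as 1), while 1e-8 is absorbed completely.

```cpp
#include <cassert>
#include <limits>

// Adds two floats and forces the result to be rounded to float precision.
// The volatile store keeps the compiler from holding extra precision.
float add_rounded(float a, float b) {
    volatile float sum = a + b;
    return sum;
}

// Returns true when adding `small` leaves `big` unchanged,
// i.e. the smaller operand was rounded away entirely.
bool is_absorbed(float big, float small) {
    return add_rounded(big, small) == big;
}

// Hypothetical self-check, assuming IEEE 754 single precision.
void check_absorption() {
    const float eps = std::numeric_limits<float>::epsilon();  // 2^-23
    assert(is_absorbed(1.0f, 1e-8f));                 // below half the spacing
    assert(add_rounded(1.0f, 1e-7f) == 1.0f + eps);   // rounds to the next float
}
```

Either way the exact sum 1.0000001 is not what gets stored, which is the loss described above.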

In the sum of a float and an integer, is there any precision lost?

Yes. A normal 32-bit integer can exactly represent values that do not fit exactly into a float. A float can still store approximately the same number, but no longer exactly. Of course, this only applies to numbers that are large enough, i.e. longer than 24 bits.

Why?

Because float has 24 bits of precision and (32-bit) integers have 32. A float will still retain the magnitude and most of the significant digits, but the last places may differ:

PS> [float]2100000050 + [float]100
2100000100
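
A C++ sketch of the 24-bit limit, assuming IEEE 754 single precision and round-to-nearest. The function names `fits_in_float` and `add_as_float` are hypothetical, chosen only for this illustration. Note that here the addition itself is also rounded to float, which a shell that computes the sum in double would not do.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `n` survives a round trip through float unchanged.
bool fits_in_float(std::int32_t n) {
    volatile float f = static_cast<float>(n);  // may round to 24 significant bits
    return static_cast<std::int64_t>(f) == n;
}

// Converts `n` to float, adds `x` in float precision,
// and returns the stored result as an integer for easy inspection.
std::int64_t add_as_float(std::int32_t n, float x) {
    volatile float sum = static_cast<float>(n) + x;
    return static_cast<std::int64_t>(sum);
}

// Hypothetical self-check, assuming IEEE 754 single precision.
void check_int_conversion() {
    assert(fits_in_float(16777216));    // 2^24 is still exact
    assert(!fits_in_float(16777217));   // 2^24 + 1 needs a 25th bit
    assert(!fits_in_float(2100000050)); // floats near 2.1e9 are 128 apart
}
```

With these assumptions, `add_as_float(2100000050, 100.0f)` should give 2100000128, whereas the exact sum is 2100000150.
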
小糖芽 2024-08-22 12:00:11

The precision depends on the magnitude of the original numbers. In floating point, the computer represents the number 312 internally in a form like scientific notation (real hardware uses base 2, but base 10 is easier to follow here):

3.12000000000 * 10 ^ 2

The number of digits in the left-hand factor (the mantissa) is fixed. The exponent also has an upper and lower bound. Together these allow the format to represent very large or very small numbers.

If you add two numbers of the same magnitude, the result keeps the same precision, because the decimal point doesn't have to move:

312.0 + 643.0 <==>

3.12000000000 * 10 ^ 2 +
6.43000000000 * 10 ^ 2
-----------------------
9.55000000000 * 10 ^ 2

If you tried to add a very big and a very small number, you would lose precision because the result must be squeezed into the above format. Consider 312 + 1230000000000000. First you have to scale the smaller number to line up with the bigger one, then add:

1.23000000000 * 10 ^ 15 +
0.00000000000 * 10 ^ 15   (312 is 0.000000000000312 * 10 ^ 15; its digits fall off the end)
-----------------------
1.23000000000 * 10 ^ 15 <-- precision lost here!
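
The alignment step can be made concrete by measuring the gap between neighbouring floats at a given magnitude. The following is a rough C++ sketch, assuming IEEE 754 single precision; `spacing_above` is a made-up helper built on the standard `std::nextafter`.

```cpp
#include <cassert>
#include <cmath>

// Gap between `x` and the next representable float above it.
// Anything much smaller than this gap is lost when added to `x`.
float spacing_above(float x) {
    return std::nextafter(x, INFINITY) - x;
}

// Hypothetical self-check, assuming IEEE 754 single precision.
void check_spacing() {
    assert(spacing_above(1.0f) == std::ldexp(1.0f, -23));  // about 1.19e-7
    assert(spacing_above(16777216.0f) == 2.0f);            // odd integers are skipped
    assert(spacing_above(1.23e15f) == 134217728.0f);       // 2^27
}
```

Near 1.23 × 10^15 the neighbouring floats are 2^27 apart under these assumptions, so adding 312 cannot move the value at all.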

Floating point can handle very large or very small numbers, but a single value cannot hold both a very large magnitude and very fine detail at the same time.

As for ints and doubles being added, the int gets turned into a double immediately, then the above applies.

雨巷深深 2024-08-22 12:00:11

When adding two floating point numbers, there is generally some error. D. Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" describes the effect and the reasons in detail, and also how to calculate an upper bound on the error, and how to reason about the precision of more complex calculations.

When adding a float to an integer, the integer is first converted to a float by C++, so two floats are being added and error is introduced for the same reasons as above.

微暖i 2024-08-22 12:00:11

The precision available for a float is limited, so of course there is always the risk that any given operation drops precision.

The answer for both your questions is "yes".

For instance, you will have problems if you add a very large float to a very small one, or if you add an integer to a float where the integer uses more bits than the float has available for its mantissa.

素食主义者 2024-08-22 12:00:11

The short answer: a computer represents a float with a limited number of bits, usually split into a mantissa and an exponent, so only some of the bits hold the significant digits, and the rest record the position of the decimal point.

If you were to add (say) 10^23 and 7, the result could not be represented accurately. A similar argument applies when adding a float and an integer: the integer will be promoted to a float.

女皇必胜 2024-08-22 12:00:11

In the sum of two floats, is there any precision lost?
In the sum of a float and an integer, is there any precision lost? Why?

Not always. If the sum is representable with the precision you ask for, you won't get any precision loss.

Example: 0.5 + 0.75 => no precision loss
x * 0.5 => no precision loss (unless x is too small)
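
One way to check whether a particular addition was exact is the classic TwoSum trick (due to Knuth), which recovers the rounding error of a floating-point sum. This is a sketch, not something the answer itself proposes; it assumes IEEE 754 single precision, round-to-nearest, and no fast-math style compiler options. The name `two_sum_error` is made up for this example.

```cpp
#include <cassert>

// Returns the rounding error of the float addition a + b,
// i.e. (exact a + b) minus (the float result), using Knuth's TwoSum.
// A result of 0 means the addition lost nothing.
float two_sum_error(float a, float b) {
    volatile float s = a + b;            // rounded sum
    volatile float b_virtual = s - a;    // the part of b that made it into s
    volatile float a_virtual = s - b_virtual;
    volatile float b_error = b - b_virtual;
    volatile float a_error = a - a_virtual;
    return a_error + b_error;
}

// Hypothetical self-check, assuming IEEE 754 single precision.
void check_two_sum() {
    assert(two_sum_error(0.5f, 0.75f) == 0.0f);   // 1.25 is exactly representable
    assert(two_sum_error(1.0f, 1e-8f) == 1e-8f);  // the small operand is lost whole
    assert(two_sum_error(0.1f, 0.2f) != 0.0f);    // the sum had to be rounded
}
```

The volatile intermediates are only there to discourage the compiler from keeping extra precision or simplifying the expressions away.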

In the general case, the floats being added lie in slightly different ranges, so there is a precision loss, and its exact size depends on the rounding mode.
I.e., if you're adding numbers with totally different ranges, expect precision problems.

Denormals are there to give extra precision in extreme cases (values very close to zero), at the expense of CPU time.

Depending on how your compiler handles floating-point computation, results can vary.

With strict IEEE semantics, adding two 32-bit floats should not give better accuracy than 32 bits.
In practice it may require more instructions to ensure that, so you shouldn't rely on accurate and repeatable results with floating point.

作妖 2024-08-22 12:00:11

In both cases yes:

assert( 1E+36f + 1.0f == 1E+36f );
assert( 1E+36f + 1 == 1E+36f );
狼亦尘 2024-08-22 12:00:11

The case float + int is the same as float + float, because a standard conversion is applied to the int. In the case of float + float, this is implementation dependent, because an implementation may choose to do the addition at double precision. There may be some loss when you store the result, of course.
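
To illustrate why the evaluation precision matters, the following C++ sketch compares a three-term sum rounded to float after every step with the same sum carried in double and rounded once when stored. It assumes IEEE 754 arithmetic with round-to-nearest-even; the function names are hypothetical.

```cpp
#include <cassert>

// Sums three floats, rounding to float after each addition.
float sum_in_float(float a, float b, float c) {
    volatile float t = a + b;  // first rounding
    volatile float r = t + c;  // second rounding
    return r;
}

// Sums three floats in double, rounding to float only when storing.
float sum_in_double(float a, float b, float c) {
    double t = static_cast<double>(a) + b + c;  // intermediates kept in double
    return static_cast<float>(t);               // single rounding on store
}

// Hypothetical self-check, assuming IEEE 754 round-to-nearest-even.
void check_intermediate_precision() {
    // 2^24 + 1 is a tie in float and rounds back to 2^24, twice.
    assert(sum_in_float(16777216.0f, 1.0f, 1.0f) == 16777216.0f);
    // In double the exact total 2^24 + 2 survives and fits in a float.
    assert(sum_in_double(16777216.0f, 1.0f, 1.0f) == 16777218.0f);
}
```

The two functions return different floats for the same inputs, which is the kind of implementation dependence described above.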

女中豪杰 2024-08-22 12:00:11

In both cases, the answer is "yes". When adding an int to a float, the integer is converted to floating point representation before the addition takes place anyway.

To understand why, I suggest you read this gem: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
