Integer and floating-point precision

Posted on 2024-08-15 12:00:11

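This is more of a numerical analysis question than a programming one, but I suppose some of you will be able to answer it.

When two floats are added, is any precision lost? Why?

When a float and an integer are added, is any precision lost? Why?

Thanks.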

抚笙 2024-08-22 12:00:11

When two floats are added, is any precision lost?

If the two floats differ in magnitude and both use the full precision range (about 7 decimal digits), then yes, you will see some loss in the last places.

Why?

This is because floats are stored in the form (sign) (mantissa) × 2^(exponent). If two values have different exponents and you add them, the smaller value is shifted to match the larger exponent and keeps fewer digits in the mantissa:

PS> [float]([float]0.00000001 + [float]1)
1

When a float and an integer are added, is any precision lost?

Yes. A normal 32-bit integer can represent values exactly that do not fit exactly into a float. A float can still store approximately the same number, but no longer exactly. Of course, this only applies to numbers that are large enough, i.e. those that need more than 24 bits.

Why?

Because float has 24 bits of precision and a (32-bit) integer has 32. A float can still retain the magnitude and most of the significant digits, but the last places may differ:

PS> [float]2100000050 + [float]100
2100000100

小糖芽 2024-08-22 12:00:11

The precision depends on the magnitude of the original numbers. In floating point, the computer represents the number 312 internally in scientific notation (in binary on real hardware; decimal is used here for readability):

3.12000000000 * 10 ^ 2

The number of digits in the mantissa (the left-hand factor) is fixed. The exponent also has an upper and a lower bound. Together they allow very large or very small numbers to be represented.

If you add two numbers of the same magnitude, the result should keep the same precision, because the decimal point doesn't have to move:

312.0 + 643.0 <==>

3.12000000000 * 10 ^ 2 +
6.43000000000 * 10 ^ 2
-----------------------
9.55000000000 * 10 ^ 2

If you add a very big and a very small number, you lose precision because the result must be squeezed into the above format. Consider 312 + 12300000000000. First you have to scale the smaller number to line up with the bigger one, which turns 312 into 0.0000000000312 * 10 ^ 13 and drops the digits that do not fit, then add:

1.23000000000 * 10 ^ 13 +
0.00000000003 * 10 ^ 13
-----------------------
1.23000000003 * 10 ^ 13 <-- precision lost here!

Floating point can handle very large or very small numbers, but it cannot keep both within a single value.

As for adding ints and doubles, the int is converted to a double first, and then the above applies.

雨巷深深 2024-08-22 12:00:11

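When two floating-point numbers are added, there is generally some error. D. Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic" describes the effect and its causes in detail, along with how to compute an upper bound on the error and how to reason about the precision of more complex calculations.

When a float is added to an integer, C++ first converts the integer to a float, so two floats are being added and error is introduced for the same reasons as above.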

微暖i 2024-08-22 12:00:11

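The precision available to a float is limited, so there is always a risk that any given operation loses precision.

The answer to both of your questions is "yes".

If you add a very large float to a very small one, for instance, you will run into problems.

The same happens if you add an integer to a float when the integer uses more bits than the float has available for its mantissa.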

素食主义者 2024-08-22 12:00:11

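The short answer: a computer represents a float with a limited number of bits, usually as a mantissa and an exponent, so only a few bytes hold the significant digits and the rest encode the position of the decimal point.

If you tried to add (say) 10^23 and 7, the result could not be represented accurately. A similar argument applies when adding a float and an integer: the integer is promoted to a float.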

女皇必胜 2024-08-22 12:00:11

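When two floats are added, is any precision lost?
When a float and an integer are added, is any precision lost? Why?

Not always. If the sum is representable at the precision you ask for, you get no precision loss.

Example: 0.5 + 0.75 => no precision loss
x * 0.5 => no precision loss (unless x is too small)

In the general case, you add floats in slightly different ranges, so there is a precision loss, and how large it is depends on the rounding mode.
That is, if you add numbers with completely different ranges, expect precision problems.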

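Denormals are there to give extra precision in extreme cases, at the expense of CPU performance.

Depending on how your compiler handles floating-point computation, results can vary.

With strict IEEE semantics, adding two 32-bit floats should not give better accuracy than 32 bits.
In practice, enforcing that may take extra instructions, so you should not rely on exact and repeatable results from floating point.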

作妖 2024-08-22 12:00:11

In both cases yes:

assert( 1E+36f + 1.0f == 1E+36f );
assert( 1E+36f + 1 == 1E+36f );

狼亦尘 2024-08-22 12:00:11

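The float + int case is the same as float + float, because a standard conversion is applied to the int. In the float + float case, the result is implementation-dependent, because an implementation may choose to perform the addition in double precision. There may of course be some loss when you store the result.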

女中豪杰 2024-08-22 12:00:11

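In both cases, the answer is "yes". When an int is added to a float, the integer is converted to its floating-point representation before the addition takes place anyway.

To understand why, I suggest you read this gem: What Every Computer Scientist Should Know About Floating-Point Arithmetic.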
