Converting a large int to double loses precision on some computers

Posted 2025-01-21 20:13:07

I'm currently learning inter-type data conversion in C++. I have been taught that

For a really large int, we can (for some computers) suffer a loss of
precision when converting to double.

But no reason was provided for the statement.

Could someone please provide an explanation and an example? Thanks

Comments (2)

饮湿 2025-01-28 20:13:07

Let's say that the floating point number uses N bits of storage.

Now, let us assume that this float can precisely represent every integer that an N-bit integer type can represent. Since the N-bit integer needs all N of its bits to encode all of its values, the float would need all N of its bits for them as well.

A floating-point number should also be able to represent fractional numbers. However, since all of its bit patterns are already used up by the integers, none are left to represent any fraction. This is a contradiction, so the assumption that a float can precisely represent all the integers of an equally sized integer type must be false.
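The argument above is a counting (pigeonhole) argument. Written out as an inequality, assuming only that at least one of the 2^N bit patterns of an N-bit float encodes something other than an integer (for example 0.5, a NaN or an infinity):

```latex
\bigl|\{\text{integers exactly representable by an } N\text{-bit float}\}\bigr|
  \;\le\; 2^N - \bigl|\{\text{patterns encoding non-integers, NaN, } \pm\infty\}\bigr|
  \;<\; 2^N
  \;=\; \bigl|\{\text{values of an } N\text{-bit integer type}\}\bigr|
```

So at least one value of the N-bit integer type has no exact N-bit floating-point encoding. (IEEE-754 makes the gap even larger, since +0 and -0 are two patterns for the same integer.)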

Since there must be non-representable integers in the range of an N-bit integer, converting such an integer to an N-bit floating-point number can lose precision whenever the converted value happens to be one of the non-representable ones.


Now, since a floating-point type can represent a subset of the rational numbers, some of those representable values are indeed integers. In particular, the IEEE-754 binary double-precision format can exactly represent every integer whose magnitude is at most 2^53. This property follows directly from the length of the significand (mantissa): 52 stored bits plus one implicit leading bit.

Therefore, converting a 32-bit integer to double cannot lose precision on a system that conforms to IEEE-754.


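More technically, the x87 floating-point unit of the x86 architecture uses an 80-bit extended-precision format, whose 64-bit significand can represent every 64-bit integer exactly. Some compilers (for example GCC and Clang targeting x86 Linux) expose this format through the long double type; others, such as MSVC, make long double the same 64-bit format as double.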

苯莒 2025-01-28 20:13:07

This may happen if int is 64 bits and double is 64 bits as well. A floating-point number is composed of a sign, a significand (also called the mantissa, which holds the digits) and an exponent. Since the significand of such a double has fewer bits than the int (53 significant bits versus the int's 63 value bits), the double can represent fewer significant digits, and precision can be lost.
