Which is the first integer that an IEEE 754 float is incapable of representing exactly?
For clarity, if I'm using a language that implements IEEE 754 floats and I declare:
float f0 = 0.f;
float f1 = 1.f;
...and then print them back out, I'll get 0.0000 and 1.0000 - exactly.
But IEEE 754 isn't capable of representing all the numbers along the real line. Close to zero, the 'gaps' are small; as you get further away, the gaps get larger.
So, my question is: for an IEEE 754 float, which is the first (closest to zero) integer which cannot be exactly represented? I'm only really concerned with 32-bit floats for now, although I'll be interested to hear the answer for 64-bit if someone gives it!
I thought this would be as simple as calculating 2^bits_of_mantissa and adding 1, where bits_of_mantissa is how many bits the standard exposes. I did this for 32-bit floats on my machine (MSVC++, Win64), and it seemed fine, though.
Comments (2)
2^(mantissa bits + 1) + 1
The +1 in the exponent (mantissa bits + 1) is because, if the mantissa contains abcdef..., the number it represents is actually 1.abcdef... × 2^e, providing an extra implicit bit of precision.
Therefore, the first integer that cannot be accurately represented and will be rounded is:
For float (23 mantissa bits), 16,777,217 (2^24 + 1).
For double (52 mantissa bits), 9,007,199,254,740,993 (2^53 + 1).
Here's an example in CPython 3.10, which uses 64-bit floats:
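A minimal sketch of such a check, assuming CPython's float is an IEEE 754 binary64 value (53-bit significand). The names limit, first_gap and rounded are illustrative and do not come from the answer:

```python
# Assumption: Python's float is IEEE 754 binary64, as it is on CPython.
limit = 2 ** 53            # 9007199254740992, exactly representable
first_gap = limit + 1      # 9007199254740993, the candidate first unrepresentable integer

# Converting the exact int to float rounds to the nearest representable
# value (ties go to the value with an even significand).
rounded = float(first_gap)

print(int(rounded))                   # 9007199254740992
print(rounded == float(limit))        # True: 2**53 + 1 collapsed onto 2**53
print(float(limit + 2) == limit + 2)  # True: 2**53 + 2 is representable again
```

Above 2^53 the spacing between adjacent doubles is 2, so only even integers survive the conversion until the next power of two.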
The largest value representable by an n-bit integer is 2^n - 1. As noted above, a float has 24 bits of precision in the significand, which would seem to imply that 2^24 wouldn't fit.
However.
Powers of 2 within the range of the exponent are exactly representable as 1.0 × 2^n, so 2^24 can fit, and consequently the first unrepresentable integer for float is 2^24 + 1. As noted above. Again.
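The 32-bit claim can be checked without a C compiler by round-tripping values through the standard library's struct module, which packs a Python float into IEEE 754 binary32. This is only a sketch: the helper to_float32 and the names limit, window_ok and after are hypothetical, and it samples a window below 2^24 rather than testing every integer.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

SIGNIFICAND_BITS = 24          # 23 stored bits + 1 implicit leading bit
limit = 2 ** SIGNIFICAND_BITS  # 16777216

# Integers in a window just below 2**24, and 2**24 itself, survive unchanged.
window_ok = all(to_float32(float(k)) == k for k in range(limit - 1000, limit + 1))

# 2**24 + 1 lies halfway between two binary32 values and rounds to the even one.
after = to_float32(float(limit + 1))

print(window_ok)  # True
print(after)      # 16777216.0
```

The same helper shows that 2^24 + 2 is representable again, since to_float32(16777218.0) returns 16777218.0.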