How are double-precision floating-point numbers stored and calculated?
I'm really curious about how double-precision floating-point numbers are stored.
These are the things I have figured out so far.
- They require 64 bits in memory
- They consist of three parts
- Sign bit (1 bit long)
- Exponent (11 bits long)
- Fraction (53 bits; the first bit is assumed always to be 1, so only 52 are stored, except when all 52 bits are 0, in which case the leading bit is assumed to be 0)
However, I do not understand what the exponent and the exponent bias are, or all those formulas on the Wikipedia page.
Can anyone explain what all those things are, how they work, and how they are eventually turned into the real number, step by step?
Comments (3)
Check out the formula a little further down the page:
Except for the above exceptions, the entire double-precision number is described by:
(-1)^sign * 2^(exponent - bias) * 1.mantissa
The formula means that for non-NaN, non-INF, non-zero and non-denormal numbers (which I'll ignore) you take the bits in the mantissa and add an implicit 1 bit at the top. This makes the mantissa 53 bits, in the range 1.0 ... 1.111111...11 (binary). To get the actual value, you multiply the mantissa by 2 to the power of the exponent minus the bias (1023) and negate the result if the sign bit is set. The number 1.0 has an unbiased exponent of zero (i.e. 1.0 = 1.0 * 2^0), so its biased exponent is 1023 (the bias is just added to the exponent). So, 1.0 is sign = 0, exponent = 1023, mantissa = 0 (remember the hidden mantissa bit).
Putting it all together in hexadecimal, the value is 0x3FF0000000000000 == 1.0.
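To see the three fields for yourself, here is a minimal Python sketch that pulls a double apart with the standard `struct` module and then rebuilds the value with the formula above. The helper names `decode_double` and `rebuild_normal` are made up for illustration, and the rebuild step assumes a normal number (not zero, denormal, INF or NaN).

```python
import struct

BIAS = 1023  # exponent bias for double precision


def decode_double(x):
    """Return the (sign, biased exponent, 52-bit fraction) fields of a float."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                  # top bit
    exponent = (bits >> 52) & 0x7FF    # next 11 bits
    fraction = bits & ((1 << 52) - 1)  # low 52 bits
    return sign, exponent, fraction


def rebuild_normal(sign, exponent, fraction):
    """Apply (-1)^sign * 2^(exponent - bias) * 1.mantissa (normal numbers only)."""
    significand = 1 + fraction / 2**52  # put the hidden 1 bit back on top
    return (-1) ** sign * significand * 2.0 ** (exponent - BIAS)


print(decode_double(1.0))                    # sign, biased exponent, fraction
print(rebuild_normal(*decode_double(-0.75)))  # round-trips back to the input
```

Calling `decode_double` on a few values of your own (2.0, 0.5, -1.0) is a quick way to watch the biased exponent move around 1023 while the fraction stays at zero.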
The exponent is the number e such that fraction * 2^e is equal to the number that I want to represent. An example (in single precision, because it is more comfortable for me to write =)):
If I had to represent -0.75, I would do:
- the binary representation will be -11 * 2^-2 = -1.1 * 2^-1
- the sign bit is 1
- the exponent is -1, stored with the single-precision bias of 127 as 126 -> 01111110
so we have
-0.75 = 1 01111110 10000000000000000000000
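The bit pattern above can be checked with a short Python sketch. It packs the value as a 32-bit float with the standard `struct` module and splits the bits into the three fields; `float32_fields` is just an illustrative helper name.

```python
import struct


def float32_fields(x):
    """Return the sign, exponent and fraction bit strings of x as a 32-bit float."""
    # Pack as big-endian single precision, then read the same 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    text = f"{bits:032b}"               # 32 binary digits, zero padded
    return text[0], text[1:9], text[9:]  # 1 + 8 + 23 bits


sign, exponent, fraction = float32_fields(-0.75)
print(sign, exponent, fraction)  # the three fields separated by spaces
print(int(exponent, 2) - 127)    # unbiased exponent
```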
For the sum, you have to align the exponents and then you can sum the fractional parts.
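As a rough illustration of "align the exponents, then add", here is a Python sketch for doubles. It is not how the hardware does it: a real FPU shifts the operand with the smaller exponent to the right and rounds, while this sketch (with the made-up helpers `split` and `add_aligned`) shifts the other operand left using Python's unbounded integers and rounds only once at the end. It assumes finite inputs whose sum does not overflow.

```python
import math


def split(x):
    """Return (significand, exponent) with x == significand * 2**exponent."""
    m, e = math.frexp(x)           # x == m * 2**e with 0.5 <= |m| < 1, or m == 0
    return int(m * 2**53), e - 53  # m * 2**53 is a whole number for any double


def add_aligned(a, b):
    """Add two doubles by bringing both significands to a common exponent."""
    sa, ea = split(a)
    sb, eb = split(b)
    e = min(ea, eb)  # common (smaller) exponent
    # Shift each integer significand so both are expressed in units of 2**e.
    total = (sa << (ea - e)) + (sb << (eb - e))
    return math.ldexp(total, e)  # converting total to a float rounds it once


print(add_aligned(0.75, 2.5))
```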
For multiplication you have to