128-bit floating point binary representation error
Let's say we have a 128-bit floating point number, for example x = 2.6 (1.3 * 2^1 in IEEE-754).
I put it in a union like this:
union flt {
    long double flt;
    int64_t byte8[OCTALC];
} d;
d.flt = x;
Then I run this to get its hexadecimal representation in memory:
void print_bytes(void *ptr, int size)
{
    unsigned char *p = ptr;
    int i;
    for (i = 0; i < size; i++) {
        printf("%02hhX ", p[i]);
    }
    printf("\n");
}
// somewhere in the code
print_bytes(&d.byte8[0], 16);
And I get something like:
66 66 66 66 66 66 66 A6 00 40 00 00 00 00 00 00
I expected one of the leading (leftmost) bits to be 1, because the exponent of 2.6 is 1, but in fact the set bits are on the right (as if the value were being treated as big-endian). If I flip the sign, the output changes to:
66 66 66 66 66 66 66 A6 00 C0 00 00 00 00 00 00
So it seems like the sign bit is righter than I thought. And if you count the bytes, it seems only 10 bytes are used; the remaining 6 look truncated or something.
I'm trying to find out why this happens. Any help?
Comments (2)
You have a number of misconceptions.
First of all, you don't have a 128-bit floating point number.
long double is probably a float in the x86 extended precision format on an x86-64. This is an 80-bit (10-byte) value, which is padded to 16 bytes. (I suspect this is for alignment purposes.)
And of course, it's going to be in little-endian byte order (since this is an x86/x86-64). This doesn't refer to the order of the bits in each byte; it refers to the order of the bytes in the whole.
And finally, the exponent is biased. An exponent of 1 isn't stored as 1. It's stored as 1+0x3FFF = 0x4000, which is the 00 40 at bytes 8 and 9 of your dump. The bias allows for negative exponents.
So we get the following:
Demo on Compiler Explorer
If we remove the padding and reverse the bytes to better match the image in the Wikipedia page, we get
40 00 A6 66 66 66 66 66 66 66
This translates to
+0x1.4CCCCCCCCCCCCCCC * 2^(0x4000 - 0x3FFF)
(0xA66...6 = 0b1010 0110 0110...0110 ⇒ 0b1.0100 1100 1100...110[0] = 0x1.4CC...C)
or
+0x1.4CCCCCCCCCCCCCCC * 2^1
In decimal, that is 2.6 to within about 8.7e-20.
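The decoding described in this answer can be sketched in C roughly as follows. This is only an illustration: the struct and function names (x87_fields, decode_x87, x87_hex_msb_first) are made up for this example, and the code assumes the input is the little-endian x87 80-bit layout (8 significand bytes, then 2 bytes holding the sign and the biased exponent), with any padding bytes ignored.

```c
#include <stdint.h>
#include <stdio.h>

/* Fields of an x87 80-bit extended-precision value (hypothetical names). */
struct x87_fields {
    int      sign;        /* 0 = positive, 1 = negative */
    int      exponent;    /* unbiased: stored field minus 0x3FFF */
    uint64_t significand; /* 64 bits, explicit integer bit on top */
};

/* Decode the first 10 bytes of a little-endian x87 extended value.
   Bytes 0..7 hold the significand, bytes 8..9 the sign and biased exponent.
   Any padding after byte 9 is not read. */
static struct x87_fields decode_x87(const unsigned char *b)
{
    struct x87_fields f;
    uint64_t sig = 0;

    /* Little-endian: byte 7 is the most significant significand byte. */
    for (int i = 7; i >= 0; i--)
        sig = (sig << 8) | b[i];

    unsigned se = (unsigned)b[8] | ((unsigned)b[9] << 8);
    f.sign = (int)((se >> 15) & 1u);          /* top bit of the 16-bit word */
    f.exponent = (int)(se & 0x7FFFu) - 0x3FFF; /* remove the bias */
    f.significand = sig;
    return f;
}

/* Write the 10 value bytes most-significant first as 20 hex digits,
   which is the order used in most format diagrams. */
static void x87_hex_msb_first(const unsigned char *b, char out[21])
{
    for (int i = 0; i < 10; i++)
        snprintf(out + 2 * i, 3, "%02X", (unsigned)b[9 - i]);
}
```

Fed the sixteen bytes from the question, decode_x87 gives sign 0, exponent 1 and significand 0xA666666666666666; the dump ending in 00 C0 differs only in the sign.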
You've been bamboozled by some very strange aspects of the way extended-precision floating-point is typically implemented in C on Intel architectures. So don't feel too bad. :-)
What you're seeing is that although sizeof(long double) may be 16 (== 128 bits), deep down inside what you're really getting is the 80-bit Intel extended format. It's being padded out with 6 bytes, which in your case happen to be 0. So, yes, "the sign bit is righter than you thought".
I see the same thing on my machine, and it's something I've always wondered about. It seems like a real waste, doesn't it? I used to think it was for some kind of compatibility with machines which actually do have 128-bit long doubles. But that can't be it, because this 0-padded 16-byte format is not binary-compatible with true IEEE 128-bit floating point, among other things because the padding is on the wrong end.
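One way to check which format sits behind long double on a given machine is to look at LDBL_MANT_DIG from <float.h> rather than at sizeof. The sketch below is an assumption-laden illustration, not a definitive test: the function names are invented here, and the mapping from significand width to format only covers the common binary ABIs.

```c
#include <float.h>
#include <stddef.h>

/* Name the long double format from its significand width.
   Assumes FLT_RADIX == 2; the mapping is a heuristic, not a guarantee. */
static const char *long_double_format(void)
{
    switch (LDBL_MANT_DIG) {
    case 53:  return "IEEE binary64 (same as double)";
    case 64:  return "x87 80-bit extended";
    case 106: return "double-double";
    case 113: return "IEEE binary128";
    default:  return "unknown";
    }
}

/* Number of bytes that actually carry value bits. Whatever is left of
   sizeof(long double) is padding. Returns 0 for an unknown format. */
static size_t long_double_value_bytes(void)
{
    switch (LDBL_MANT_DIG) {
    case 53:  return 8;
    case 64:  return 10;
    case 106: return 16;
    case 113: return 16;
    default:  return 0;
    }
}
```

On x86-64 Linux with GCC or Clang this typically reports the x87 format with 10 value bytes out of 16, which matches the 6 zero bytes in the dump; with MSVC, long double is the same as double.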