_mm_mul_ps fails to multiply 10001 by 10001 correctly, but multiplies 10000 by 10000 fine
I have a very simple program that multiplies four numbers. It works
fine when each of them is 10000, but not when I change them to 10001: the result is off by one.
I compiled the program with gcc -msse2 main_sse.c -o sse
on both an AMD Opteron and an Intel Xeon, and I get the same result on both machines.
I would appreciate any help. I couldn't find anything online on this topic.
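#include <stdlib.h>
#include <stdio.h>
#include <xmmintrin.h>
int main(){
    float x[4], y[4], temp[4]; int i; __m128 X, Y, result;
    /* Case 1: 10000 * 10000 */
    for(i=0; i < 4; i++) { x[i] = 10000; y[i] = 10000; }
    /* Unaligned load/store: plain float arrays are not guaranteed to be 16-byte aligned. */
    X = _mm_loadu_ps(&x[0]); Y = _mm_loadu_ps(&y[0]);
    result = _mm_mul_ps(X,Y); _mm_storeu_ps(&temp[0], result);
    for(i=0; i < 4; i++) printf("%.0f ", temp[i]);
    printf("\n");
    /* Case 2: 10001 * 10001, where the stored result is off by one */
    for(i=0; i < 4; i++) { x[i] = 10001; y[i] = 10001; }
    X = _mm_loadu_ps(&x[0]); Y = _mm_loadu_ps(&y[0]);
    result = _mm_mul_ps(X,Y); _mm_storeu_ps(&temp[0], result);
    for(i=0; i < 4; i++) printf("%.0f ", temp[i]);
    printf("\n");
    return 0;
}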
Comments (1)
You are running into the precision limit of IEEE 754 32-bit floating-point numbers.
The significand has only 23 stored fraction bits plus an implied '1' at the beginning, for 24 bits of precision in total.
So every integer up to 2^24 = 16777216 can be represented exactly, but not every integer above it.
You would need 27 bits to exactly represent the product 10001*10001 = 100020001.
Between 2^24 and 2^25, only even integers are representable, so you get the nearest even integer.
Between 2^25 and 2^26, you only get the nearest multiple of 4.
And so on: between 2^26 and 2^27, where this product lies, you only get the nearest multiple of 8, which is 100020000.