Fermi CUDA double precision vs. C
There is a small error between CPU and GPU double-precision results when using a Fermi GPU.
For example, for a small test set I get the following absolute error: (Number 1 (CPU) - Number 2 (GPU)) = 3E-018.
In binary form the difference is, as expected, very small:
NUMBER 1 in binary:
xxxxxxxxxxxxx11100000001001
vs
NUMBER 2 in binary:
xxxxxxxxxxxx111100000001010
Although the two values differ only in their last binary digits, I am keen to eliminate any difference, because the errors add up over the course of my code.
Any tips from those familiar with Fermi? If this is unavoidable, can I get C/C++ to mimic Fermi's rounding behaviour?
Comments (1)
You should take a look at this post.
Floating-point arithmetic is not associative, so if a compiler chooses to do operations in a different order then you'll get a different result. Two versions of the same compiler can produce differences! Different compilers are even more likely to produce differences, and if you're doing work in parallel on the GPU (you are, right?) then you're inherently doing operations in a different order...
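To make the ordering point concrete, here is a minimal sketch in plain C. The helper names `sum_serial` and `sum_pairwise` are hypothetical, made up for this illustration; the pairwise version only imitates the shape of a parallel reduction and is not actual Fermi code. It assumes IEEE 754 double arithmetic with no extended intermediate precision (for example, SSE2 on x86-64).

```c
#include <assert.h>
#include <stddef.h>

/* Left-to-right sum: the order a simple serial CPU loop uses. */
double sum_serial(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Pairwise (tree) sum: similar in spirit to a parallel reduction,
   where partial sums are formed first and combined afterwards.
   Hypothetical helper for illustration only. */
double sum_pairwise(const double *x, size_t n)
{
    assert(n > 0);
    if (n == 1)
        return x[0];
    size_t half = n / 2;
    return sum_pairwise(x, half) + sum_pairwise(x + half, n - half);
}
```

Called on the four values `{1e16, 1.0, -1e16, 1.0}`, the two functions add exactly the same numbers yet return different results, because each intermediate sum is rounded and the grouping decides which small terms get absorbed. Neither result is "wrong" in the IEEE sense; they are two correctly rounded evaluations of different expressions.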
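Fermi hardware is IEEE 754-2008 compliant, which means that in addition to standard IEEE 754 rounding it also has the fused multiply-add (FMA) instruction, which avoids losing precision between the multiplication and the addition. CPU code that performs the multiply and the add as two separate operations rounds twice, so its result can differ from the GPU's in the last bit.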