CUDA floating-point precision

Posted 2024-10-01 05:59:18

Can someone comment on this?

I want to compute a vector dot product. My float vectors are [2080:2131] and [2112:2163]; each contains 52 elements.

float a[52], b[52];   // a = {2080, 2081, ..., 2131}, b = {2112, 2113, ..., 2163}
for (int i = 0; i < 52; i++)
{
    a[i] = 2080 + i;
    b[i] = 2112 + i;
}

float sum = 0.0f;
for (int i = 0; i < 52; i++)
{
    sum += a[i]*b[i];
}

For the full length (52 elements), my kernel produced a sum of 234038032, while MATLAB gave 234038038. For sums over the first 1 to 9 elements, my kernel's result agrees with MATLAB's. For the 10-element sum it is off by 1, and the discrepancy gradually grows from there. The results are reproducible. I checked all the elements and found no problem.

Answer by 对你再特殊, 2024-10-08 05:59:18

Since the vectors are float, you are experiencing rounding errors. MATLAB stores everything in much higher precision (double by default), so it does not run into these rounding errors as early.

You may want to read David Goldberg's "What Every Computer Scientist Should Know About Floating-Point Arithmetic"; it is invaluable reading.

Here is a simple demo in plain C++ (i.e. nothing CUDA-specific):

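#include <iostream>

int main(void)
{
  // Single-precision copies of the two vectors
  float a[52];
  float b[52];
  // Double-precision copies of the same values
  double c[52];
  double d[52];

  for (int i = 0 ; i < 52 ; i++)
  {
    a[i] = (float)(2080 + i);
    b[i] = (float)(2112 + i);
    c[i] = (double)(2080 + i);
    d[i] = (double)(2112 + i);
  }

  // Accumulate the same dot product in float and in double
  float fsum = 0.0f;
  double dsum = 0.0;
  for (int i = 0 ; i < 52 ; i++)
  {
    fsum += a[i]*b[i];
    dsum += c[i]*d[i];
  }

  // Print enough digits to show every integer digit of both sums
  std::cout.precision(20);
  std::cout << fsum << " " << dsum << std::endl;
}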

Run this and you get:

234038032 234038038

So what can you do about this? There are several directions you could go in...

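Use higher precision: this will affect performance (double-precision throughput is much lower than single-precision on many GPUs), and not all devices support double precision (CUDA requires compute capability 1.3 or higher). It also just postpones the problem rather than fixing it, so I would not recommend it!
Do a tree-based reduction: you could combine the techniques in the vectorAdd and reduction CUDA SDK samples.
Use Thrust (for example, thrust::inner_product): very straightforward.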