CUDA: Unaligned memory accesses not supported: what am I missing?

Posted on 2024-11-02 14:45:15

There are a few questions similar to this one, but this case is a bit weird: NVCC 3.1 doesn't like the following, but 3.2 and 4.0RC do:

float xtmp[MAT1];

for (i=0; i<MAT1; i++){
    xtmp[i]=x[p[i]]; //value that should be here
}

Here p is passed to the function as a pointer (int *p), and it comes from:

int p_pivot[MAT1],q_pivot[MAT1];

To add a bit of context, before the p's get to the 'top' function, they are populated as follows (I'm cutting out as much irrelevant code as I can for clarity):

...
for (i=0;i<MAT1;i++){
    ...
    p_pivot[i]=q_pivot[i]=i;
    ...
}
...

Beyond that, the only operations on the pivot arrays are three-step swaps using an integer temporary.

After all that, p_pivot is passed to the 'top' function as (&p_pivot[0]).
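The pivot handling described above can be sketched in plain host-side C. This is only an illustration under assumptions: the value of MAT1 and the helper names init_pivots and swap_pivot are hypothetical and do not appear in the original code.

```c
#define MAT1 4  /* hypothetical size; the real value is not shown in the post */

/* Fill both pivot arrays with the identity permutation (hypothetical helper). */
void init_pivots(int *p_pivot, int *q_pivot, int n)
{
    for (int i = 0; i < n; i++) {
        p_pivot[i] = q_pivot[i] = i;
    }
}

/* Three-step swap of two pivot entries using an integer temporary
   (hypothetical helper). */
void swap_pivot(int *pivot, int a, int b)
{
    int tmp = pivot[a];
    pivot[a] = pivot[b];
    pivot[b] = tmp;
}
```

Because only whole-entry swaps are applied to an identity permutation, every entry should stay within 0..MAT1-1, so the index values themselves would not be expected to push x[p[i]] out of bounds. Passing &p_pivot[0] is equivalent to passing p_pivot, since the array decays to a pointer to its first element.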

For anyone looking for more detail, the code is here, and the only change that should be needed to switch from 3.2/4.0 to an earlier version is to replace cudaDeviceSynchronize(); with cudaThreadSynchronize();. This is my dirty, dirty experimental code, so please don't judge me! :D

As noted, all of the above works fine with later versions of NVCC, and I'm working on getting those installed on the machine in question, but I'd be interested to know what I'm missing.

It must be the array-lookup indexing that's causing the issue, but I don't understand why.

Comments (1)

淡笑忘祈一世凡恋 2024-11-09 14:45:15

That looks like a compiler bug to me. This works with nvcc 3.1 on 64-bit platforms:

float xtmp[MAT1];
//Swap rows (x=Px)
for (i=0; i<MAT1; i++){
    int idx = p[i];
    xtmp[i]=x[idx]; //value that should be here
}

My guess is that something in the implicit int to size_t conversion is breaking. It doesn't fail with any of the newer versions of CUDA I have tried.
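For reference, the shape of the workaround can be shown as a small host-side C function. This is a sketch only: the function name apply_row_permutation and the range check are assumptions added for illustration, and running it on the host does not reproduce the nvcc 3.1 behaviour. It only shows the index being loaded into a plain int before it is used as a subscript.

```c
#include <assert.h>

/* Gather x through the permutation p: out[i] = x[p[i]].
   Hypothetical helper; in the original this loop sits inside device code. */
void apply_row_permutation(const float *x, const int *p, float *out, int n)
{
    for (int i = 0; i < n; i++) {
        int idx = p[i];               /* load the index into a plain int first */
        assert(idx >= 0 && idx < n);  /* added check: a valid permutation stays in range */
        out[i] = x[idx];              /* then index x with the temporary */
    }
}
```

Semantically this is identical to xtmp[i]=x[p[i]], which is consistent with the suggestion that the difference lies in the code the older compiler generated rather than in the source.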
