Strange behavior of the OpenCL atomic add operation

Posted 2024-12-13 01:52:42

For a project, I had to dive into OpenCL: things are going fairly well, except that now I need atomic operations. I'm running the OpenCL code on an Nvidia GPU with the latest drivers. Querying CL_DEVICE_VERSION with clGetDeviceInfo() returns "OpenCL 1.0 CUDA", so I guess I have to refer to the OpenCL 1.0 specification.
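Since the choice between the OpenCL 1.0 atom_* functions and the OpenCL 1.1 atomic_* functions depends on the reported version, a small helper can pull the major and minor numbers out of the CL_DEVICE_VERSION string. This is a sketch that assumes the format the OpenCL specification prescribes for that query ("OpenCL<space><major.minor><space><vendor-specific information>"); parse_cl_device_version and has_core_atomic_builtins are hypothetical names, not part of the OpenCL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: extracts <major>.<minor> from a CL_DEVICE_VERSION
 * string such as "OpenCL 1.0 CUDA". Returns false if the string does not
 * start with "OpenCL <major>.<minor>". */
bool parse_cl_device_version(const char *version, int *major, int *minor)
{
    assert(version != NULL && major != NULL && minor != NULL);
    int maj = 0, min = 0;
    /* sscanf returns how many %d conversions succeeded. */
    if (sscanf(version, "OpenCL %d.%d", &maj, &min) != 2)
        return false;
    *major = maj;
    *minor = min;
    return true;
}

/* Hypothetical helper: atomic_add and the other atomic_* built-ins are
 * core only from OpenCL 1.1 onwards; a 1.0 device needs the atom_*
 * extension functions instead. */
bool has_core_atomic_builtins(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 1);
}
```

For the "OpenCL 1.0 CUDA" string quoted above, the parser yields major 1 and minor 0, so has_core_atomic_builtins returns false.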

I started using an atom_add operation in my kernel on a __global int* vnumber buffer: atom_add(&vnumber[0], 1);. This gave me clearly wrong results. As an additional check, I moved the add instruction to the beginning of the kernel, so that it is executed by every thread. When the kernel is launched with 512 x 512 threads, vnumber[0] contains 524288, which is exactly 2 x 512 x 512, twice the value I should get. Oddly, when I change the add operation to atom_add(&vnumber[0], 2);, the returned value is 65536, again twice what I should get.
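The expected counter value is the number of work-items times the increment. The minimal host-side check below reproduces that arithmetic, assuming a zero-initialised counter and a 2-D NDRange in which every work-item executes the atomic add exactly once; expected_counter and overcount_factor are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: value a zero-initialised counter should hold after
 * each work-item of a gx x gy NDRange has added `increment` exactly once. */
int64_t expected_counter(int64_t gx, int64_t gy, int64_t increment)
{
    assert(gx >= 0 && gy >= 0);
    return gx * gy * increment;
}

/* Hypothetical helper: returns k when observed == k * expected for an
 * integer k >= 1, or 0 when observed is not a whole multiple. */
int64_t overcount_factor(int64_t observed, int64_t expected)
{
    if (expected <= 0 || observed <= 0 || observed % expected != 0)
        return 0;
    return observed / expected;
}
```

With a 512 x 512 launch and an increment of 1, expected_counter returns 262144, and overcount_factor(524288, 262144) returns 2, which is the doubling described above.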

Has anyone experienced something similar? Am I missing something very basic? I have checked the data types and they seem correct (I'm using an int* buffer, allocated with sizeof(cl_int)).

Comments (1)

巷子口的你 2024-12-20 01:52:43

You are using atom_add, which is an OpenCL 1.0 extension for local memory. Yet you are passing it global memory. Instead, try OpenCL 1.1's atomic_add, which works with global memory.
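For reference, the OpenCL 1.0 specification defines atom_add for __global int pointers in the cl_khr_global_int32_base_atomics extension (and for __local int pointers in cl_khr_local_int32_base_atomics), and a program has to enable the extension with a #pragma before using it; OpenCL 1.1 made atomic_add a core built-in. Because the device in the question reports OpenCL 1.0, a host program could pick the kernel source by version. This is a sketch under those assumptions: the kernel name count_items, the two constant names and select_counter_kernel are hypothetical, the returned string still has to be passed to clCreateProgramWithSource and built as usual, and whether the device actually lists the extension in CL_DEVICE_EXTENSIONS is not checked here.

```c
#include <assert.h>
#include <stdbool.h>

/* OpenCL 1.0: atom_add on a __global int* is provided by the
 * cl_khr_global_int32_base_atomics extension, enabled with a pragma. */
static const char COUNTER_KERNEL_CL10[] =
    "#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable\n"
    "__kernel void count_items(__global int *vnumber)\n"
    "{\n"
    "    atom_add(&vnumber[0], 1);\n"
    "}\n";

/* OpenCL 1.1+: atomic_add is a core built-in, no pragma needed. */
static const char COUNTER_KERNEL_CL11[] =
    "__kernel void count_items(__global int *vnumber)\n"
    "{\n"
    "    atomic_add(&vnumber[0], 1);\n"
    "}\n";

/* Hypothetical helper: picks the kernel source matching the device's
 * OpenCL version; the result is meant for clCreateProgramWithSource. */
const char *select_counter_kernel(int major, int minor)
{
    assert(major >= 1 && minor >= 0);
    bool core_atomics = major > 1 || (major == 1 && minor >= 1);
    return core_atomics ? COUNTER_KERNEL_CL11 : COUNTER_KERNEL_CL10;
}
```

Keeping both variants as plain strings means the same host binary can run on a 1.0 device and on newer ones without editing the kernel file.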