Busy spinning in CUDA

Posted on 2024-12-09 19:47:28

How can I implement a busy spin mechanism of the form

while(variable == 0);

where variable is updated to 1 by some other CUDA thread after some event has occurred.

I tried writing it exactly as above, but the loop seems to be ignored: the calling thread runs straight past it without waiting at all. I'm absolutely sure that the value is 0 at that point.
Also, if I write:

while(variable == 0) __threadfence();

in order to avoid having the variable cached, the thread blocks indefinitely even though the variable eventually gets set to 1.
This is all very strange behavior to me, since replicating this code on the CPU produces the correct behavior.
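
One plausible reason for the first symptom (an assumption, nothing in the post confirms it) is that variable is not declared volatile, so the compiler may load it once or drop the empty loop altogether. A minimal sketch of a cross-block spin wait that forces a fresh read on every iteration might look like this. The names spin_kernel and run_spin_demo are hypothetical, and the sketch assumes both blocks are resident on the GPU at the same time, which should hold for two single-thread blocks but is not guaranteed in general.

```cuda
#include <cuda_runtime.h>

// Hypothetical demo kernel: block 0 spins until block 1 raises the flag.
// `flag` is volatile so every loop iteration reloads it from memory
// instead of reusing a value cached in a register.
__global__ void spin_kernel(volatile int *flag, int *result)
{
    if (blockIdx.x == 0) {
        while (*flag == 0) { }   // busy wait on the volatile flag
        __threadfence();         // order later reads after the observed flag
        *result = *flag;         // record the value that released the wait
    } else {
        // The "event" would happen here; its writes are made visible
        // device-wide before the flag is raised.
        __threadfence();
        *flag = 1;
    }
}

// Hypothetical host helper: launches two blocks of one thread each and
// returns the value stored by the waiting block, or -1 on a CUDA error.
int run_spin_demo()
{
    int *d_flag = nullptr;
    int *d_result = nullptr;
    if (cudaMalloc(&d_flag, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_result, sizeof(int)) != cudaSuccess) {
        cudaFree(d_flag);
        return -1;
    }
    cudaMemset(d_flag, 0, sizeof(int));
    cudaMemset(d_result, 0, sizeof(int));

    spin_kernel<<<2, 1>>>(d_flag, d_result);

    int h_result = 0;
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(&h_result, d_result, sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_flag);
    cudaFree(d_result);
    return (err == cudaSuccess) ? h_result : -1;
}
```

Calling run_spin_demo() from main and checking for a return value of 1 is enough to exercise it. The fence alone does not prevent the compiler from keeping variable in a register, which is why the sketch relies on volatile for the read.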

Edit: Oddly, this seems to work correctly if I have blocks of 1 thread each, but not if I have several threads within one block. So threads from one block can see writes done by threads from other blocks, but not writes done by threads from the same block. Strange...
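
A possible explanation for the same-block case (again an assumption) is warp scheduling: on GPUs without independent thread scheduling (before Volta), threads of one warp execute in lockstep, so a thread spinning in the loop can keep the writer in the same warp from ever reaching its store. Within a block, a barrier is the usual substitute for spinning. The sketch below uses __syncthreads() for that; block_signal_kernel and run_block_signal_demo are hypothetical names, and a single block of at most 1024 threads is assumed.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical demo kernel: thread 0 plays the role of the writer, and
// every thread in the block waits at a barrier instead of spinning.
__global__ void block_signal_kernel(int *seen)
{
    __shared__ int variable;

    if (threadIdx.x == 0) {
        // The "event" would happen here, then the flag is set.
        variable = 1;
    }

    // All threads of the block wait here; the barrier also makes the
    // shared-memory write above visible to every thread in the block.
    __syncthreads();

    seen[threadIdx.x] = variable;
}

// Hypothetical host helper: runs one block of `threads` threads and
// returns how many of them observed variable == 1, or -1 on a CUDA error.
int run_block_signal_demo(int threads)
{
    int *d_seen = nullptr;
    size_t bytes = static_cast<size_t>(threads) * sizeof(int);
    if (cudaMalloc(&d_seen, bytes) != cudaSuccess) return -1;
    cudaMemset(d_seen, 0, bytes);

    block_signal_kernel<<<1, threads>>>(d_seen);

    std::vector<int> h_seen(threads, 0);
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(h_seen.data(), d_seen, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_seen);
    if (err != cudaSuccess) return -1;

    int count = 0;
    for (int v : h_seen) count += (v == 1);
    return count;
}
```

With 64 threads (two warps), run_block_signal_demo(64) should report that all 64 threads saw the update. The barrier only works if every thread of the block reaches it, so it cannot be placed inside a branch that only some threads take.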
