CUDA 原子参数竞争条件
如果存在如下结构的设备代码
Item* prev_entry = array[entry->prev];
prev_entry->next = entry->next;
并且将其重写为原子操作
atomicExch(&(array[entry->prev]->next),entry->next);
是array
的内存访问与 next
的访问一起以原子方式完成?可能有其他线程修改 entry->prev
(因为它们可能是另一个 Item 的下一个值),并且如果数组访问是非原子地完成的,则 entry->prev
code> 可能会在访问数组和对地址 next
执行原子操作之间发生变化,从而导致不正确的结果。
更笼统地概括这个问题,原子操作参数中的所有操作都是原子执行的吗?
If there is device code structured as follows
Item* prev_entry = array[entry->prev];
prev_entry->next = entry->next;
And it were rewritten as an atomic operation
atomicExch(&(array[entry->prev]->next), entry->next);
is the memory access of array
done atomically along with the access of next
? There may be other threads that modify entry->prev
(as they may be another Item's next value) and if the array access is done non-atomically then entry->prev
may change between accessing the array and the execution of the atomic operation on the address next
resulting in an incorrect result.
To frame the question more generally, are all operations within an atomic operation's arguments executed atomically?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
不,不是。如果你研究一下相应的SASS代码,你会发现这里的
entry->next
相关的读操作是一个普通的读操作,没有受到任何保护。该读取操作将原子“更新值”放入寄存器中。另一个寄存器保存要更新的地址。原子操作对这些寄存器起作用(如果相关,则将其结果返回到另一个寄存器中)。
这是一个例子:
首先,我们注意到原子操作纯粹基于寄存器来工作:
寄存器RZ是“目的地”,它是RZ(始终为零的寄存器,充当“丢弃”寄存器),因为我们不要求函数的返回值。寄存器对R2、R3包括要自动“更新”的位置的64位地址(在这种情况下,替换其值),并且替换值包含在R4中。向后看,我们看到 R4 被加载在这里:
它与原子完全分开。这是“普通”负载。您可以向后推,发现包含加载地址的寄存器对 R4、R5 填充了常量内存中的内核参数,使用
SR_TID.X
寄存器进行偏移(对应于threadIdx .x
)这是有道理的。同样,包含原子更新位置地址的 R2、R3 寄存器对直接从内核参数加载,没有偏移量,这也是有意义的。
No it is not. If you study the corresponding SASS code, you will discover that the read operation associated with
entry->next
here:is an ordinary read operation, not protected in any way. That read operation puts the atomic "update value" in a register. Another register holds the address to update. The atomic operation works on those registers (returning its result in another register, if relevant).
Here is an example:
First of all, we note that the atomic operation works purely based on registers:
The register RZ is the "destination", it is RZ (the always-zero register, acting as a "discard" register) because we are not asking for the return value of the function. The register pair R2,R3 comprises the 64 bit address of the location to atomically "update" (in this case, replace its value), and the replacement value is contained in R4. Working backward we see that R4 was loaded here:
which is completely separate from the atomic. That is an "ordinary" load. You can push backward to find out that the register pair R4,R5 which contains the load address is populated with the kernel argument from constant memory, offset using the
SR_TID.X
register (corresponding tothreadIdx.x
) which makes sense.Likewise the R2,R3 register pair, containing the address of the atomic update location, is loaded directly from kernel arguments, with no offset, which also makes sense.