CUDA 原子参数竞争条件

发布于 2025-01-13 09:17:14 字数 509 浏览 4 评论 0原文

如果存在如下结构的设备代码

Item* prev_entry = array[entry->prev];
prev_entry->next = entry->next;

并且将其重写为原子操作

atomicExch(&(array[entry->prev]->next),entry->next);

是array 的内存访问与 next 的访问一起以原子方式完成？可能有其他线程修改 entry->prev （因为它们可能是另一个 Item 的下一个值），并且如果数组访问是非原子地完成的，则 entry->prev code> 可能会在访问数组和对地址 next 执行原子操作之间发生变化，从而导致不正确的结果。

更笼统地概括这个问题，原子操作参数中的所有操作都是原子执行的吗？

原文

If there is device code structured as follows

Item* prev_entry = array[entry->prev];
prev_entry->next = entry->next;

And it were rewritten as an atomic operation

atomicExch(&(array[entry->prev]->next), entry->next);

is the memory access of array done atomically along with the access of next? There may be other threads that modify entry->prev (as they may be another Item's next value) and if the array access is done non-atomically then entry->prev may change between accessing the array and the execution of the atomic operation on the address next resulting in an incorrect result.

To frame the question more generally, are all operations within an atomic operation's arguments executed atomically?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

樱＆纷飞 2025-01-20 09:17:14

数组的内存访问是否与下一个的访问一起以原子方式完成？

不，不是。如果你研究一下相应的SASS代码，你会发现这里的entry->next相关的读操作

atomicExch(&(array[entry->prev]->next), entry->next);

是一个普通的读操作，没有受到任何保护。该读取操作将原子“更新值”放入寄存器中。另一个寄存器保存要更新的地址。原子操作对这些寄存器起作用（如果相关，则将其结果返回到另一个寄存器中）。

这是一个例子：

$ cat t1983.cu
__global__ void k(int *al, int *d){

  atomicExch(al, d[threadIdx.x]);
}


$ nvcc -c t1983.cu
$ cuobjdump -sass ./t1983.o

Fatbin elf code:
================
arch = sm_52
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_52
                Function : _Z1kPiS_
        .headerflags    @"EF_CUDA_SM52 EF_CUDA_PTX_SM(EF_CUDA_SM52)"
                                                                           /* 0x001c7c00fe0007f6 */
        /*0008*/                   MOV R1, c[0x0][0x20] ;                  /* 0x4c98078000870001 */
        /*0010*/         {         MOV R2, c[0x0][0x140] ;                 /* 0x4c98078005070002 */
        /*0018*/                   S2R R4, SR_TID.X         }
                                                                           /* 0xf0c8000002170004 */
                                                                           /* 0x001fc800fec20ff1 */
        /*0028*/                   SHR.U32 R0, R4.reuse, 0x1e ;            /* 0x3828000001e70400 */
        /*0030*/                   ISCADD R4.CC, R4, c[0x0][0x148], 0x2 ;  /* 0x4c18810005270404 */
        /*0038*/                   IADD.X R5, R0, c[0x0][0x14c] ;          /* 0x4c10080005370005 */
                                                                           /* 0x041fc400fe8007b1 */
        /*0048*/                   LDG.E R4, [R4] ;                        /* 0xeed4200000070404 */
        /*0050*/                   MOV R3, c[0x0][0x144] ;                 /* 0x4c98078005170003 */
        /*0058*/                   ATOM.E.EXCH RZ, [R2], R4 ;              /* 0xed810000004702ff */
                                                                           /* 0x001ffc00ffe007ed */
        /*0068*/                   NOP ;                                   /* 0x50b0000000070f00 */
        /*0070*/                   EXIT ;                                  /* 0xe30000000007000f */
        /*0078*/                   BRA 0x78 ;                              /* 0xe2400fffff87000f */
                ..........

首先，我们注意到原子操作纯粹基于寄存器来工作：

   ATOM.E.EXCH RZ, [R2], R4 ;

寄存器RZ是“目的地”，它是RZ（始终为零的寄存器，充当“丢弃”寄存器），因为我们不要求函数的返回值。寄存器对R2、R3包括要自动“更新”的位置的64位地址（在这种情况下，替换其值），并且替换值包含在R4中。向后看，我们看到 R4 被加载在这里：

     LDG.E R4, [R4] ;

它与原子完全分开。这是“普通”负载。您可以向后推，发现包含加载地址的寄存器对 R4、R5 填充了常量内存中的内核参数，使用 SR_TID.X 寄存器进行偏移（对应于 threadIdx .x）这是有道理的。

同样，包含原子更新位置地址的 R2、R3 寄存器对直接从内核参数加载，没有偏移量，这也是有意义的。

is the memory access of array done atomically along with the access of next?

No it is not. If you study the corresponding SASS code, you will discover that the read operation associated with entry->next here:

atomicExch(&(array[entry->prev]->next), entry->next);

is an ordinary read operation, not protected in any way. That read operation puts the atomic "update value" in a register. Another register holds the address to update. The atomic operation works on those registers (returning its result in another register, if relevant).

Here is an example:

$ cat t1983.cu
__global__ void k(int *al, int *d){

  atomicExch(al, d[threadIdx.x]);
}


$ nvcc -c t1983.cu
$ cuobjdump -sass ./t1983.o

Fatbin elf code:
================
arch = sm_52
code version = [1,7]
producer = <unknown>
host = linux
compile_size = 64bit

        code for sm_52
                Function : _Z1kPiS_
        .headerflags    @"EF_CUDA_SM52 EF_CUDA_PTX_SM(EF_CUDA_SM52)"
                                                                           /* 0x001c7c00fe0007f6 */
        /*0008*/                   MOV R1, c[0x0][0x20] ;                  /* 0x4c98078000870001 */
        /*0010*/         {         MOV R2, c[0x0][0x140] ;                 /* 0x4c98078005070002 */
        /*0018*/                   S2R R4, SR_TID.X         }
                                                                           /* 0xf0c8000002170004 */
                                                                           /* 0x001fc800fec20ff1 */
        /*0028*/                   SHR.U32 R0, R4.reuse, 0x1e ;            /* 0x3828000001e70400 */
        /*0030*/                   ISCADD R4.CC, R4, c[0x0][0x148], 0x2 ;  /* 0x4c18810005270404 */
        /*0038*/                   IADD.X R5, R0, c[0x0][0x14c] ;          /* 0x4c10080005370005 */
                                                                           /* 0x041fc400fe8007b1 */
        /*0048*/                   LDG.E R4, [R4] ;                        /* 0xeed4200000070404 */
        /*0050*/                   MOV R3, c[0x0][0x144] ;                 /* 0x4c98078005170003 */
        /*0058*/                   ATOM.E.EXCH RZ, [R2], R4 ;              /* 0xed810000004702ff */
                                                                           /* 0x001ffc00ffe007ed */
        /*0068*/                   NOP ;                                   /* 0x50b0000000070f00 */
        /*0070*/                   EXIT ;                                  /* 0xe30000000007000f */
        /*0078*/                   BRA 0x78 ;                              /* 0xe2400fffff87000f */
                ..........

First of all, we note that the atomic operation works purely based on registers:

   ATOM.E.EXCH RZ, [R2], R4 ;

The register RZ is the "destination", it is RZ (the always-zero register, acting as a "discard" register) because we are not asking for the return value of the function. The register pair R2,R3 comprises the 64 bit address of the location to atomically "update" (in this case, replace its value), and the replacement value is contained in R4. Working backward we see that R4 was loaded here:

     LDG.E R4, [R4] ;

which is completely separate from the atomic. That is an "ordinary" load. You can push backward to find out that the register pair R4,R5 which contains the load address is populated with the kernel argument from constant memory, offset using the SR_TID.X register (corresponding to threadIdx.x) which makes sense.

Likewise the R2,R3 register pair, containing the address of the atomic update location, is loaded directly from kernel arguments, with no offset, which also makes sense.

回复收藏 0 原文

~没有更多了~