Extending CUDA.@atomic to custom structs
I was wondering whether it is possible to extend the CUDA.@atomic operation to a custom type.
Here is an example of what I am trying to do:
using CUDA
struct Dual
    x::Float32
    y::Float32
end

Base.:+(a::Dual, b::Dual) = Dual(a.x + b.x, a.y + b.y)
cu0 = CuArray([Dual(1, 2), Dual(2,3)])
cu1 = CuArray([Dual(1, 2), Dual(2,3)])
indexes = CuArray([1, 1])
function my_kernel(dst, src, idx)
    index = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    @inbounds if index <= length(idx)
        CUDA.@atomic dst[idx[index]] = dst[idx[index]] + src[index]
    end
    return nothing
end
@cuda threads = 100 my_kernel(cu0, cu1, indexes)
The problem with this code is that the CUDA.@atomic call only supports basic types like
Int, Float, or Real.
I need it to work with my own struct.
It would be great if someone had an idea of how to make this possible.
1 Answer
The underlying PTX instruction set for CUDA provides a subset of atomic store, exchange, add/subtract, increment/decrement, min/max, and compare-and-set operations for global and shared memory locations (not all architectures support all operations for all POD types, and there is evidence that not all operations are implemented in hardware on all architectures).
What all these instructions have in common is that they execute only one operation atomically. I am completely unfamiliar with Julia, but if

CUDA.@atomic dst[idx[index]] = dst[idx[index]] + src[index]

means "atomically add src[].x and src[].y to dst[].x and dst[].y", then that isn't possible, because it implies two additions on separate memory locations in one atomic operation. If the members of your structure could be packed into a compatible type (a 32-bit or 64-bit unsigned integer, for example), you could perform an atomic store, exchange, or compare-and-set in CUDA. But not arithmetic. If you consult this section of the programming guide, you can see an example of a brute-force double-precision add implemented using compare-and-set in a tight loop. If your structure can be packed into something that can be manipulated with compare-and-set, then it might be possible to roll your own atomic add for a custom type (limited to a maximum of 64 bits).
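As a rough illustration of that compare-and-set approach, here is a minimal CPU-side sketch in portable C++. It assumes a Dual of two 32-bit floats so the whole struct packs into one 64-bit word; the names pack, unpack, and atomic_add_dual are made up for this sketch, and std::atomic stands in for the atomicCAS on unsigned long long you would use inside an actual CUDA kernel.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical layout: two 32-bit floats, so the whole struct fits in
// one 64-bit word that a single compare-and-set can cover.
struct Dual {
    float x;
    float y;
};

// Bit-cast the struct to and from a 64-bit integer (memcpy avoids
// strict-aliasing problems).
uint64_t pack(Dual d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return bits;
}

Dual unpack(uint64_t bits) {
    Dual d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Compare-and-set loop: read the current packed value, compute the sum,
// and retry until no other thread has modified the word in between. In a
// CUDA kernel the std::atomic operations would be replaced by atomicCAS
// on an unsigned long long, as in the programming guide's brute-force
// double-precision atomicAdd example.
void atomic_add_dual(std::atomic<uint64_t>& dst, Dual src) {
    uint64_t old_bits = dst.load();
    uint64_t new_bits;
    do {
        Dual cur = unpack(old_bits);
        new_bits = pack(Dual{cur.x + src.x, cur.y + src.y});
        // On failure, compare_exchange_weak reloads old_bits for the retry.
    } while (!dst.compare_exchange_weak(old_bits, new_bits));
}
```

Note that both members are updated by a single 64-bit compare-and-set, which is what makes the combined add appear atomic; the loop simply retries whenever another thread wins the race.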
How you might approach that in Julia is definitely an exercise left to the reader.