Accessing struct members in CUDA?
For simplicity, say I have this struct in someheader.h:
typedef struct
{
    float x;
} someStruct;
In CUDA, how does a device function access a member of the struct if the struct is shared with a C++ application?
For example:
__global__ void stuff( someStruct *g ) {
    g[0].x = 0.4f;
}
Is that the correct way to do it? It doesn't seem to be working.
You will need to transfer the array of structs to the GPU.
For example, to access an array of floats inside a kernel, you need to allocate device memory, copy the host data into it with cudaMemcpy, and pass the device pointer to the kernel.
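The steps for a float array can be sketched roughly as follows. This is a minimal illustration and not the snippet from the original answer; the kernel `scale` and the host helper `scale_on_gpu` are hypothetical names chosen for the example, and error handling is reduced to a boolean.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel: each thread doubles one element of the device array.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical host helper: copies `host` to the GPU, runs the kernel,
// and copies the result back. Returns false if a CUDA call fails.
bool scale_on_gpu(std::vector<float> &host) {
    if (host.empty()) return true;  // nothing to do
    const int n = static_cast<int>(host.size());
    const size_t bytes = n * sizeof(float);

    float *dev = nullptr;  // device pointer, only valid on the GPU
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scale<<<(n + 255) / 256, 256>>>(dev, n);
        // This blocking copy also waits for the kernel to finish.
        ok = cudaMemcpy(host.data(), dev, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

Calling `scale_on_gpu(v)` on a `std::vector<float>` should leave every element doubled, provided a CUDA device is available.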
This of course is basic knowledge. You could replace float with any data type and get the same behavior. Structs are nothing but user-defined data types.
This is different from passing in a single float (and hence a single struct) by value, because all arguments to a kernel are copied into some part of the GPU memory system at launch time (which part depends on the card), and the GPU reads them from there. So if the argument is a struct, the whole struct resides on the GPU when the kernel accesses it. But if you pass a pointer to structs that were created on the host, the GPU has the value of the pointer but not the actual data.
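To make the by-value case concrete, here is a small sketch in which the struct travels with the kernel arguments. The kernel `read_by_value` and the helper `roundtrip_by_value` are hypothetical names for illustration; note that the output location still has to be device memory.

```cuda
#include <cuda_runtime.h>

typedef struct {
    float x;
} someStruct;

// The struct is passed by value, so a copy is delivered with the kernel
// arguments. `out` must still point to device memory.
__global__ void read_by_value(someStruct s, float *out) {
    *out = s.x;
}

// Hypothetical helper: returns the value the kernel saw, or -1.0f on failure.
float roundtrip_by_value(float value) {
    someStruct s;
    s.x = value;

    float *dev_out = nullptr;
    float result = -1.0f;
    if (cudaMalloc(&dev_out, sizeof(float)) != cudaSuccess) return result;

    read_by_value<<<1, 1>>>(s, dev_out);  // s is copied, not referenced
    if (cudaMemcpy(&result, dev_out, sizeof(float),
                   cudaMemcpyDeviceToHost) != cudaSuccess) {
        result = -1.0f;
    }
    cudaFree(dev_out);
    return result;
}
```

Passing `&s` (a host address) to a kernel that takes `someStruct *` would compile as well, but the kernel would then dereference an address that, absent managed or unified memory, does not refer to device data.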
A struct type is itself an abstract entity and has no physical representation on either the host or the device side.
The memory layout of an object on the device side is exactly the same as on the host side (if that's what you are really asking), so you can safely copy big struct objects from host to device and vice versa.
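One way to convince yourself of the layout claim is to let device code report `sizeof` and `offsetof` and compare them with the host's values. The following is only a sketch under the assumption that both sides are compiled by nvcc from the same header; `report_layout` and `layout_matches` are hypothetical names.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

typedef struct {
    float x;
} someStruct;

// Records the struct's size and the offset of x as seen by device code.
__global__ void report_layout(size_t *out) {
    out[0] = sizeof(someStruct);
    out[1] = offsetof(someStruct, x);
}

// Hypothetical helper: true if the device-side size and offset match
// the host-side ones, false on mismatch or CUDA failure.
bool layout_matches() {
    size_t *dev = nullptr;
    size_t seen[2] = {~size_t(0), ~size_t(0)};  // sentinel values
    if (cudaMalloc(&dev, sizeof(seen)) != cudaSuccess) return false;

    report_layout<<<1, 1>>>(dev);
    bool ok = cudaMemcpy(seen, dev, sizeof(seen),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    return ok && seen[0] == sizeof(someStruct)
              && seen[1] == offsetof(someStruct, x);
}
```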
Accessing a member of an object is nothing more than computing the correct offset at compile time and adding it to the object pointer (this) at run time. CUDA is perfectly capable of doing that. An assignment such as someStruct.x = 2 translates into a store to the object's address plus 0, where 0 is the offset of member x inside your struct.
Update:
Host and device memory are completely separate (one is your RAM, the other is on your GPU). Nothing is shared; everything has to be sent back and forth (which can be quite time-consuming). Use the cudaMemcpy function to copy data.
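Applied to the kernel from the question, the copy-in, launch, copy-out pattern might look like the sketch below. The helper name `run_stuff` and the array length of 4 are assumptions made for the example; the struct and kernel are taken from the question, with the missing semicolon added.

```cuda
#include <cuda_runtime.h>

typedef struct {
    float x;
} someStruct;

// The kernel from the question: writes to a member of the first element.
__global__ void stuff(someStruct *g) {
    g[0].x = 0.4f;
}

// Hypothetical helper: returns g[0].x after the kernel has run,
// or -1.0f if any CUDA call fails.
float run_stuff() {
    someStruct host[4] = {};   // host-side array, zero-initialized
    someStruct *dev = nullptr; // device-side copy of the array
    if (cudaMalloc(&dev, sizeof(host)) != cudaSuccess) return -1.0f;

    bool ok = cudaMemcpy(dev, host, sizeof(host),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        stuff<<<1, 1>>>(dev);  // pass the device pointer, not `host`
        ok = cudaMemcpy(host, dev, sizeof(host),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok ? host[0].x : -1.0f;
}
```

The member access `g[0].x` in the kernel is unchanged from the question; what differs is that `g` now refers to memory obtained from `cudaMalloc` and the result is copied back before the host reads it.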