Passing a struct to a CUDA kernel
I'm new to CUDA C, and am trying to pass a typedef'd struct into a kernel. My method worked fine when I tried it with a struct containing only ints, but when I switch to floats I get meaningless numbers back as results. I assume this has to do with alignment, and I tried including __align__ along with my type declaration, but to no avail. Can someone give me an example of how this is done, or provide an alternative approach? I'm trying to set it up so that I can easily add or remove fields without changing anything other than the struct and the kernel. My code:
typedef struct __align__(8)
{
    float a, b;
} point;

__global__ void testKernel(point *p)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    p[i].a = 1.1;
    p[i].b = 2.2;
}

int main(void)
{
    // set number of points
    int numPoints    = 16,
        gpuBlockSize = 4,
        pointSize    = sizeof(point),
        numBytes     = numPoints * pointSize,
        gpuGridSize  = numPoints / gpuBlockSize;

    // allocate memory
    point *cpuPointArray = new point[numPoints],
          *gpuPointArray = new point[numPoints];
    cpuPointArray = (point*)malloc(numBytes);
    cudaMalloc((void**)&gpuPointArray, numBytes);

    // launch kernel
    testKernel<<<gpuGridSize,gpuBlockSize>>>(gpuPointArray);

    // retrieve the results
    cudaMemcpy(cpuPointArray, gpuPointArray, numBytes, cudaMemcpyDeviceToHost);
    printf("testKernel results:\n");
    for(int i = 0; i < numPoints; ++i)
    {
        printf("point.a: %d, point.b: %d\n",cpuPointArray[i].a,cpuPointArray[i].b);
    }

    // deallocate memory
    free(cpuPointArray);
    cudaFree(gpuPointArray);
    return 0;
}
Comments (2)
Since there doesn't appear to be any decent documentation on how to do this, I thought I'd post the final, revised code here. It turns out that the __align__ part was unnecessary as well; the actual problem was the use of %d in the printf when trying to print floats.
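The revised code mentioned in this comment is not included on this page. What follows is only a minimal sketch of what a corrected version might look like, not the poster's actual code. It assumes the fixes named in the comment (%f instead of %d, no __align__) and adds a few things that are not in the original post: a hypothetical CUDA_CHECK error-checking macro, a bounds check in the kernel, and a single malloc on the host in place of new[] followed by malloc. The %d specifier fails because a float passed to printf is promoted to double, while %d tells printf to read an int.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original post): abort on any CUDA error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s (%s:%d)\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Plain struct; per the comment above, __align__ is not needed for correctness.
typedef struct
{
    float a, b;
} point;

// n is an added parameter so threads past the end of the array do nothing.
__global__ void testKernel(point *p, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
    {
        p[i].a = 1.1f;
        p[i].b = 2.2f;
    }
}

int main(void)
{
    const int numPoints = 16;
    const int gpuBlockSize = 4;
    const size_t numBytes = numPoints * sizeof(point);
    // Round up so a numPoints that is not a multiple of the block size is still covered.
    const int gpuGridSize = (numPoints + gpuBlockSize - 1) / gpuBlockSize;

    // One host buffer and one device buffer.
    point *cpuPointArray = (point *)malloc(numBytes);
    if (cpuPointArray == NULL) return EXIT_FAILURE;
    point *gpuPointArray = NULL;
    CUDA_CHECK(cudaMalloc((void **)&gpuPointArray, numBytes));

    testKernel<<<gpuGridSize, gpuBlockSize>>>(gpuPointArray, numPoints);
    CUDA_CHECK(cudaGetLastError());  // reports launch-time errors

    // cudaMemcpy waits for the kernel to finish before copying.
    CUDA_CHECK(cudaMemcpy(cpuPointArray, gpuPointArray, numBytes,
                          cudaMemcpyDeviceToHost));

    printf("testKernel results:\n");
    for (int i = 0; i < numPoints; ++i)
    {
        // %f matches the double that each float argument is promoted to.
        printf("point.a: %f, point.b: %f\n",
               cpuPointArray[i].a, cpuPointArray[i].b);
    }

    free(cpuPointArray);
    CUDA_CHECK(cudaFree(gpuPointArray));
    return 0;
}
```

Because the buffer size is computed from sizeof(point), adding or removing fields should only require touching the struct and the kernel, which is what the question asked for.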
Have a look at how it's done in the vector_types.h header that comes in your CUDA include directory. That should already give you some pointers.
However, the main problem here is the %d in your printf calls. You're trying to print floats now, not integers. So those really should use %f instead.
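To make the vector_types.h pointer a little more concrete: that header declares built-in types such as float2 with an explicit 8-byte alignment. The sketch below compares a hand-written two-float struct with float2. The names PlainPoint and AlignedPoint are hypothetical and chosen only for illustration, and the program is an assumption-laden layout check, not code from either poster.

```cuda
#include <cstdio>
#include <cuda_runtime.h>  // also brings in vector_types.h (float2, float4, ...)

// Hypothetical structs for illustration only.
struct PlainPoint { float a, b; };                 // natural alignment of float
struct __align__(8) AlignedPoint { float a, b; };  // same request float2 makes

// Both variants occupy the same number of bytes as the built-in float2.
static_assert(sizeof(PlainPoint) == sizeof(float2), "size differs from float2");
static_assert(sizeof(AlignedPoint) == sizeof(float2), "size differs from float2");
// Only the explicitly aligned variant matches float2's alignment requirement.
static_assert(alignof(AlignedPoint) == alignof(float2), "alignment differs from float2");

int main(void)
{
    // %zu is the matching specifier for the size_t values from sizeof/alignof.
    printf("PlainPoint:   size %zu, align %zu\n",
           sizeof(PlainPoint), alignof(PlainPoint));
    printf("AlignedPoint: size %zu, align %zu\n",
           sizeof(AlignedPoint), alignof(AlignedPoint));
    printf("float2:       size %zu, align %zu\n",
           sizeof(float2), alignof(float2));
    return 0;
}
```

The alignment attribute changes how the struct is placed in memory and can let the compiler use wider loads and stores. It does not affect whether the values come back correctly, which is consistent with the finding that the garbage output came from the printf format specifier.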