CUDA texture memory: binding a subsection of global memory

Posted 2024-10-29 07:06:51

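I am having trouble binding texture memory to a subsection of global device memory.

I have a large global device array that I allocate and fill as follows: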

double* device_global;
cudaMalloc((void **)&device_global, sizeof(double)*N);
cudaMemcpy(device_global, host, sizeof(double)*N, cudaMemcpyHostToDevice);

I run multiple kernels in a for loop.

Each kernel needs only a small part of device_global (int offset = 100), which I bind to a texture like this:

cudaBindTexture(0, texRef, device_global, channelDesc, sizeof(double)*10);

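The problem is that I cannot use pointer arithmetic with the loop offset to bind only the part of device_global that belongs to the current iteration.

I would like to do something like this:

cudaBindTexture(0, texRef, device_global + offset * i, channelDesc, sizeof(double)*10);

Note that the call above does work if the offset is set to 0, but with a nonzero offset the pointer arithmetic somehow does not work.

Any help or other guidance would be greatly appreciated.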


美人如玉 2024-11-05 07:06:51

Passing 0 or NULL as the first argument of cudaBindTexture is bad practice. CUDA texture binding requires the bound pointer to be aligned. The required alignment is given by the cudaDeviceProp::textureAlignment device property.

cudaBindTexture can bind any device pointer to a texture. If the pointer is not aligned, the function writes to its first argument the offset, in bytes, from the nearest preceding aligned address. That byte offset then has to be divided by the texel size and added to the index used in texture fetches. If the pointer is not aligned and the first argument is NULL, the call fails.

The binding should be done like this:

size_t texture_offset = 0;
cudaBindTexture(&texture_offset, texRef, device_global + offset * i, channelDesc, sizeof(double)*10);
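
The alignment arithmetic described in this answer can be sketched in plain C++, without the CUDA runtime. This only models the behaviour and is not the runtime's implementation: `TextureSplit` and `split_for_texture` are hypothetical names, and the `alignment` parameter stands in for the value a real device reports in cudaDeviceProp::textureAlignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: where the texture really starts, and how far
// the requested data sits beyond that start.
struct TextureSplit {
    std::uintptr_t aligned_base;  // nearest preceding aligned address
    std::size_t byte_offset;      // requested address minus aligned_base
    std::size_t texel_offset;     // byte_offset expressed in texels
};

// Models the arithmetic only; it does not call the CUDA runtime.
// `alignment` stands in for cudaDeviceProp::textureAlignment and is
// assumed to be a power of two.
inline TextureSplit split_for_texture(std::uintptr_t address,
                                      std::size_t alignment,
                                      std::size_t texel_size) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    assert(texel_size != 0);

    const std::size_t byte_offset =
        static_cast<std::size_t>(address & (alignment - 1));
    // The data must start on a texel boundary for the index shift to be exact.
    assert(byte_offset % texel_size == 0);

    return TextureSplit{address - byte_offset, byte_offset,
                        byte_offset / texel_size};
}
```

As an illustration, assume a 512-byte alignment and an array base at address 4096. For the second loop iteration in the question (offset * i = 100 doubles, i.e. 800 bytes), split_for_texture(4096 + 800, 512, sizeof(double)) yields an aligned base of 4608, a byte offset of 288, and a texel offset of 36, so a kernel would have to add 36 to every texture index.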


倾其所爱 2024-11-05 07:06:51

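The offset into texture memory must be aligned. You cannot bind an arbitrary portion of memory, only one that is properly aligned, because of how the internal high-performance hardware works.

One solution would be to use pitched memory and, instead of having many very small textures, have several big ones, each starting at an aligned row of the matrix.

I am guessing here, but I think that with

sizeof(double)*10

as the size of the data in texture memory, setting up the memory itself takes more time than reading it.

How big is the total matrix?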
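
The idea of starting every piece on an aligned boundary can be sketched as a stride calculation, similar in spirit to the row pitch that cudaMallocPitch chooses for 2D allocations. The sketch below is plain C++ and makes its own assumptions: `padded_stride_elems` is a hypothetical helper, and the alignment is passed in by the caller rather than queried from the device.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: rounds a chunk of `chunk_elems` elements up to the
// next multiple of `alignment` bytes and returns the padded stride in
// elements. If the array base is aligned, element i * stride is aligned too.
// Assumes `alignment` is a nonzero multiple of `elem_size`.
inline std::size_t padded_stride_elems(std::size_t chunk_elems,
                                       std::size_t elem_size,
                                       std::size_t alignment) {
    assert(elem_size != 0 && alignment != 0);
    assert(alignment % elem_size == 0);

    const std::size_t chunk_bytes = chunk_elems * elem_size;
    // Integer round-up to the next multiple of `alignment`.
    const std::size_t padded_bytes =
        (chunk_bytes + alignment - 1) / alignment * alignment;
    return padded_bytes / elem_size;
}
```

With 100 doubles per iteration and an assumed 512-byte alignment, the stride becomes 128 doubles (1024 bytes). Binding device_global + 128 * i would then start on an aligned address for every i, at the cost of 28 unused doubles per chunk.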
