Problem fetching a CUDA texture

Posted on 2024-11-03 05:36:55

I am having trouble fetching a texture of floats. The texture is defined as follows:

texture<float, 2, cudaReadModeElementType> cornerTexture;

The binding and parameter settings are:

cornerTexture.addressMode[0]    = cudaAddressModeClamp;
cornerTexture.addressMode[1]    = cudaAddressModeClamp;
cornerTexture.filterMode        = cudaFilterModePoint;
cornerTexture.normalized        = false;
cudaChannelFormatDesc cornerDescription = cudaCreateChannelDesc<float>();


cudaBindTexture2D(0, &cornerTexture, cornerImage->imageData_device, &cornerDescription, cornerImage->width, cornerImage->height, cornerImage->widthStep);

width and height are the sizes of the two dimensions in elements; widthStep is the row pitch in bytes. In-kernel access occurs as follows:

thisValue = tex2D(cornerTexture, thisPixel.x, thisPixel.y);
printf("thisPixel.x: %i thisPixel.y: %i thisValue: %f\n", thisPixel.x, thisPixel.y, thisValue);

thisValue should always be a non-negative float. printf() is giving me strange, useless values that are different from what the linear memory actually stores. I have tried offsetting the access with a 0.5f on both coordinates, but it gives me the same wrong results.

Any ideas?

Update: There seems to be a hidden alignment requirement. From what I can deduce, the pitch passed to cudaBindTexture2D needs to be a multiple of 32 bytes. For example, the following gives incorrect results

cudaBindTexture2D(0, &debugTexture, deviceFloats, &debugDescription, 10, 32, 40);

when fetching the texture, but the following (the same array with its width and height switched) works well:

cudaBindTexture2D(0, &debugTexture, deviceFloats, &debugDescription, 32, 10, 128);

I'm not sure whether I'm missing something or there really is a constraint on the pitch.
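
The two calls differ only in shape and pitch: rows of 10 floats are 40 bytes apart, rows of 32 floats are 128 bytes apart. Below is a minimal host-side sketch of that arithmetic. The 32-byte alignment is only the value deduced in this update, not a documented constant, and isPitchAligned and roundUpPitch are hypothetical helper names introduced for illustration.

```cuda
#include <cstddef>
#include <cstdio>

// Hypothetical helper: true when the pitch is a whole multiple of the alignment.
constexpr bool isPitchAligned(std::size_t pitchBytes, std::size_t alignment) {
    return pitchBytes % alignment == 0;
}

// Hypothetical helper: smallest multiple of `alignment` that is >= rowBytes.
constexpr std::size_t roundUpPitch(std::size_t rowBytes, std::size_t alignment) {
    return (rowBytes + alignment - 1) / alignment * alignment;
}

int main() {
    // Assumption: 32 bytes is the alignment observed empirically in this post.
    const std::size_t assumedAlignment = 32;
    const std::size_t narrowPitch = 10 * sizeof(float);  // pitch of the failing call
    const std::size_t widePitch   = 32 * sizeof(float);  // pitch of the working call

    std::printf("narrow pitch %zu aligned: %d, padded pitch would be %zu\n",
                narrowPitch, isPitchAligned(narrowPitch, assumedAlignment),
                roundUpPitch(narrowPitch, assumedAlignment));
    std::printf("wide pitch %zu aligned: %d\n",
                widePitch, isPitchAligned(widePitch, assumedAlignment));
    return 0;
}
```

Padding each row by hand like this only helps if the assumed alignment matches what the hardware actually requires, which this update could not confirm.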

Update 2: I have filed a bug report with Nvidia. Those who are interested can view it in their developer zone, but I will post the reply back here.

Comments (3)

还给你自由 2024-11-10 05:36:55

There is definitely a constraint on the pitch, and unfortunately there is no device properties query to ask CUDA what it is.

But if you allocate the memory with cudaMallocPitch() and use the pitch passed back, that is guaranteed to work.
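
A minimal sketch of that suggestion, reusing the texture declaration and settings from the question. It assumes a toolkit that still ships the legacy texture-reference API (it was removed in CUDA 12). The check helper, the fetchTexel kernel, the test data, and the 10 x 32 shape are illustrative choices rather than part of the original code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Legacy texture reference, declared as in the question.
texture<float, 2, cudaReadModeElementType> cornerTexture;

// Hypothetical helper: abort with a message on any CUDA runtime error.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Fetches one texel so the host can compare it with the source data.
__global__ void fetchTexel(int x, int y, float *out) {
    *out = tex2D(cornerTexture, x, y);
}

int main() {
    const int width = 10, height = 32;  // the shape that failed with a pitch of 40
    std::vector<float> host(width * height);
    for (int i = 0; i < width * height; ++i) host[i] = static_cast<float>(i);

    float *devData = nullptr;
    size_t pitch = 0;  // row stride in bytes, chosen by the driver
    check(cudaMallocPitch((void **)&devData, &pitch, width * sizeof(float), height),
          "cudaMallocPitch");

    // Source rows are tightly packed; destination rows are `pitch` bytes apart.
    check(cudaMemcpy2D(devData, pitch, host.data(), width * sizeof(float),
                       width * sizeof(float), height, cudaMemcpyHostToDevice),
          "cudaMemcpy2D");

    cornerTexture.addressMode[0] = cudaAddressModeClamp;
    cornerTexture.addressMode[1] = cudaAddressModeClamp;
    cornerTexture.filterMode     = cudaFilterModePoint;
    cornerTexture.normalized     = false;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

    // Bind with the pitch returned by cudaMallocPitch, not width * sizeof(float).
    check(cudaBindTexture2D(0, &cornerTexture, devData, &desc, width, height, pitch),
          "cudaBindTexture2D");

    float *devOut = nullptr;
    check(cudaMalloc((void **)&devOut, sizeof(float)), "cudaMalloc");
    const int x = 3, y = 7;
    fetchTexel<<<1, 1>>>(x, y, devOut);
    check(cudaGetLastError(), "kernel launch");

    float fetched = 0.0f;
    check(cudaMemcpy(&fetched, devOut, sizeof(float), cudaMemcpyDeviceToHost),
          "cudaMemcpy");
    std::printf("pitch: %zu bytes, fetched: %f, expected: %f\n",
                pitch, fetched, host[y * width + x]);

    cudaUnbindTexture(&cornerTexture);
    cudaFree(devOut);
    cudaFree(devData);
    return 0;
}
```

The host buffer stays tightly packed; only the device allocation is padded, which is why the copy goes through cudaMemcpy2D with two different pitches.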

方觉久 2024-11-10 05:36:55

Nvidia reply to bug report:

"The problem here is that the memory bound to the 2D texture does not have the proper alignment restrictions. Both the base offset of the texture memory, and the pitch, have certain HW-dependent alignment restrictions. However, currently in the CUDA API, we only expose the base offset restriction as a device property, and not the pitch restriction.

The pitch restriction will be addressed in a future CUDA release. Meanwhile, it's recommended that apps use cudaMallocPitch() when allocating pitched memory, so that the driver takes care of satisfying all restrictions."
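
For reference, a short sketch of querying these restrictions through the runtime API. The base-offset restriction mentioned in the reply corresponds to the textureAlignment field of cudaDeviceProp. The texturePitchAlignment field is only present in toolkits newer than the one discussed in this thread, so this assumes a toolkit recent enough to have it, and device 0 is an arbitrary choice.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    // Assumption: device 0 is the GPU of interest.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Alignment required for the base address of memory bound to a texture.
    std::printf("textureAlignment:      %zu bytes\n", prop.textureAlignment);
    // Alignment required for the pitch of 2D textures bound to pitched memory.
    std::printf("texturePitchAlignment: %zu bytes\n", prop.texturePitchAlignment);
    return 0;
}
```

Allocating with cudaMallocPitch() remains the simpler route, since the driver then picks a pitch that satisfies the restriction without the application needing to know its value.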

情归归情 2024-11-10 05:36:55

Did you get the structure associated with the texture using the cudaGetTextureReference function?

From version 3.2 of the NVIDIA C Programming Guide (page 32, last paragraph):

The format specified when binding a texture to a texture reference must match the parameters specified when declaring the texture reference; otherwise, the results of texture fetches are undefined.
