CUDA: What is a scattered write?

Published 2024-08-08 07:52:46

Various CUDA demos in the CUDA SDK refer to "scattered write". What is a scattered write, why is it so useful, and what does it stand in contrast to?

2 Answers

自此以后,行同陌路 2024-08-15 07:52:46

I'm going to use CUDA's terminology here.

A scattered write means that each CUDA thread writes to an arbitrary address (i.e. the threads of a warp do not necessarily write to consecutive memory locations). It contrasts with frame-buffer writes, which are 2D-coherent and can be coalesced by the hardware. Until not so long ago, those were the only writes available on GPUs.

They are the opposite operation of a gather read, which reads data from scattered locations and gathers all of it before the warp of threads executes in SIMD fashion on the gathered data. However, gather reads have long been available on GPUs through arbitrary texture fetches.
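The scatter/gather contrast above can be sketched as a pair of kernels. This is a minimal illustration, not from the answer: the kernel names and the `perm` index array are made up, and `perm` is assumed to hold a permutation of `0..n-1`.

```cuda
#include <cuda_runtime.h>

// Scattered write: each thread stores to an address taken from an
// index array, so neighbouring threads in a warp may write to
// arbitrary, non-consecutive global-memory locations.
__global__ void scatter(const float *in, const int *perm, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[perm[i]] = in[i];   // write address is data-dependent
}

// Gather read: the mirror operation. Each thread reads from an
// arbitrary location but writes to a consecutive one, so the
// writes can be coalesced by the hardware.
__global__ void gather(const float *in, const int *perm, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[perm[i]];   // read address is data-dependent
}
```

Note that `scatter` followed by `gather` with the same `perm` reproduces the input, since one applies the permutation and the other inverts it.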

情深已缘浅 2024-08-15 07:52:46

Scattered write is great because it allows you to write to any memory address. Previous shader implementations were usually limited in which memory addresses a given shader program could write to.

"Whereas fragment programs in graphics APIs are limited to outputting 32 floats (RGBA * 8 render targets) at a pre-specified location, CUDA supports scattered writes - i.e. an unlimited number of stores to any address. This enables many new algorithms that were not possible using graphics APIs to perform efficiently using CUDA"

From the CUDA FAQ:

http://forums.nvidia.com/index.php?s=fd8a3833d78a50e273c5c731476eed0d&showtopic=84440&pid=478583&start=0&#entry478583

Basically it makes CUDA programs easier to write because they aren't as limited by where they can write results. Bear in mind that one of the keys to getting good performance on a GPU is exploiting memory locality. Overusing scattered writes by writing to global memory a lot will most likely impact your performance.
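A histogram is a classic algorithm that scattered writes enable, and it also shows the performance caveat above: every write lands at a data-dependent global-memory address. This is an illustrative sketch (the kernel name and 256-bin layout are assumptions, not from the answer).

```cuda
#include <cuda_runtime.h>

// Each thread increments the bin selected by its input byte, so the
// write address depends on the data and is scattered across global
// memory. atomicAdd is required because several threads may scatter
// into the same bin at once.
__global__ void histogram256(const unsigned char *data, int n,
                             unsigned int *bins /* 256 entries */)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(&bins[data[i]], 1u);  // scattered, possibly contended write
}
```

When many threads hit the same bin, the atomic contention on global memory dominates; a common mitigation is to accumulate a per-block histogram in shared memory first and merge the partial histograms into global memory at the end of the block.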
