What is the purpose of the __threadfence() intrinsic in CUDA?

Posted on 2024-10-20 17:41:00

I have gone through many forum posts and the NVIDIA documentation, but I couldn't understand what __threadfence() does and how to use it. Could someone explain what the purpose of that intrinsic is?

Comments (1)

暖阳 2024-10-27 17:41:01

Normally, there is no guarantee that when one block writes something to global memory, another block will "see" it. There is also no guarantee about the order in which writes to global memory become visible, except from the point of view of the block that issued them.

There are two exceptions:

  • atomic operations - those are always visible to other blocks
  • __threadfence()

Imagine that one block produces some data and then uses an atomic operation to set a flag saying the data is there. Without any ordering guarantee, another block may see the flag and still read incorrect or incomplete data.

This is where __threadfence() comes to the rescue: it enforces ordering. As seen from other blocks, all writes the calling thread made before the fence happen before all writes it makes after the fence.

Note that __threadfence() doesn't necessarily need to stall the current thread until its writes to global memory are visible to all other threads in the grid. If it were implemented in that naive way, it could hurt performance severely.

As an example, if you do something like:

  1. store your data
  2. __threadfence()
  3. atomically mark a flag

it is guaranteed that if the other block sees the flag, it will also see the data.
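
The three steps above can be sketched as a small "last block finishes the job" kernel, similar in spirit to the sum example in the CUDA Programming Guide. This is only a minimal sketch under some assumptions: the names sum_with_fence, sum_on_gpu, partial and blocks_done are made up for illustration, each block runs a single thread to keep the code short, and error checking is omitted. The atomic counter plays the role of the flag.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical counter for this sketch: how many blocks have published data.
__device__ unsigned int blocks_done = 0;

// Launched with one thread per block (an assumption made to keep this short).
__global__ void sum_with_fence(const int *in, int n,
                               volatile int *partial, int *total)
{
    // 1. Store your data: this block's partial sum goes to global memory.
    int s = 0;
    for (int i = blockIdx.x; i < n; i += gridDim.x)
        s += in[i];
    partial[blockIdx.x] = s;

    // 2. Fence: the store above is ordered before the atomic below,
    //    as observed by all other threads in the grid.
    __threadfence();

    // 3. Atomically mark the flag (here: take a ticket from a counter).
    unsigned int ticket = atomicAdd(&blocks_done, 1u);

    // The block holding the last ticket knows every other block has already
    // passed its fence, so all partial sums are safe to read.
    if (ticket == gridDim.x - 1) {
        __threadfence();  // conservative: keep the reads below after the atomic
        int t = 0;
        for (unsigned int b = 0; b < gridDim.x; ++b)
            t += partial[b];  // volatile, so the value is re-read from memory
        *total = t;
        blocks_done = 0;      // reset so the kernel can be launched again
    }
}

// Host helper (hypothetical): copies the input over, runs the kernel,
// and returns the total. Return codes are not checked in this sketch.
int sum_on_gpu(const std::vector<int> &host_in, int blocks)
{
    int n = static_cast<int>(host_in.size());
    int *in = nullptr, *partial = nullptr, *total = nullptr;
    cudaMalloc(&in, n * sizeof(int));
    cudaMalloc(&partial, blocks * sizeof(int));
    cudaMalloc(&total, sizeof(int));
    cudaMemcpy(in, host_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    sum_with_fence<<<blocks, 1>>>(in, n, partial, total);

    int result = 0;
    // A blocking copy on the default stream waits for the kernel to finish.
    cudaMemcpy(&result, total, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(in);
    cudaFree(partial);
    cudaFree(total);
    return result;
}

int main()
{
    std::vector<int> data(1000);
    for (int i = 0; i < 1000; ++i)
        data[i] = i + 1;  // 1 + 2 + ... + 1000
    printf("sum = %d\n", sum_on_gpu(data, 8));
    return 0;
}
```

Compiled with nvcc and run on a working CUDA device, this should print sum = 500500. Without the __threadfence() in step 2, the last block could in principle observe the counter update before some other block's partial sum had become visible, which is exactly the "flag seen, data incomplete" problem described above.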

Further reading: CUDA Programming Guide, section B.5, "Memory Fence Functions" (as of version 11.5)
