Work-item branch divergence in OpenCL: how does it work?

Asked on 2025-02-11 19:07:42


I'm studying OpenCL and I don't fully understand the concept of "work-item divergence", also called "divergent control flow".

As we can see in the picture below, there are warps or wavefronts (depending on the GPU model) that execute one instruction or another.

[Example image]

Now, my question is: will the whole warp/wavefront execute the if branch and later the else branch, or only one of them (only the if or only the else), as in the normal control flow of a program? A minimal kernel of the kind I mean is sketched below.

This question may be very basic, but I didn't find anything on the web, and from the other material I have I don't understand the point.
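Since the original picture is not reproduced here, this is a minimal hypothetical kernel (names like divergent_branch are invented for illustration) with the kind of data-dependent if/else the question is about:

```c
// Hypothetical example: each work-item takes the if or the else branch
// depending on its own input value, so work-items in the same
// warp/wavefront may want to follow different paths.
__kernel void divergent_branch(__global const float *in,
                               __global float *out)
{
    size_t gid = get_global_id(0);
    float x = in[gid];

    if (x > 0.0f) {        // some work-items in a wavefront may take this path...
        out[gid] = x * 2.0f;
    } else {               // ...while others in the same wavefront take this one
        out[gid] = -x;
    }
}
```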


1 answer

挖鼻大婶 answered on 2025-02-18 19:07:43


The key to understanding the GPU-style SIMD execution model is that all threads in a wavefront/SIMD group always execute the exact same instruction at the same time. If a thread doesn't need to run an instruction that at least one other thread must execute, there won't be any side effects (register values won't change, etc.), but it still costs as much in terms of performance as if it really did run it.

If the branching condition is either true or false for all threads in a wavefront/SIMD group, then all threads only run the one branch, and the other branch is skipped. So if the condition is the same for almost all threads in your workload, or if you can arrange for the condition to be the same for all threads in a group, then you don't pay the divergence cost. (Or it becomes negligible.)
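As an illustration of the second point, one way to arrange a uniform condition is to branch on something that is identical for every work-item in a work-group, for example the group index, instead of per-item data. This is a rough sketch with invented names (uniform_branch, threshold_group are not from the question), assuming the work-group size is a multiple of the wavefront size:

```c
// Sketch: the condition depends only on get_group_id(0), so every
// work-item in a work-group (and therefore in each wavefront it contains)
// takes the same branch, and no divergence cost is paid.
__kernel void uniform_branch(__global const float *in,
                             __global float *out,
                             uint threshold_group)
{
    size_t gid = get_global_id(0);

    if (get_group_id(0) < threshold_group) {
        out[gid] = in[gid] * 2.0f;   // the whole group takes this path
    } else {
        out[gid] = -in[gid];         // or the whole group takes this one
    }
}
```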

If there is frequent divergence within the group, the whole wavefront needs to execute both branches. When this happens, the threads which don't actually need to run the code will still step through the instructions required by the other threads, at exactly the same time as those other threads; it just has no effect. Unlike hardware CPU threads, a GPU thread can't run different code from other threads (in the same SIMD group); it can only run the same code on different data, or it has to wait until the other threads have finished the code it doesn't need to run.
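To make the "both branches run, but with no effect for the threads that don't need them" behaviour concrete: for a small if/else like the one in the question, the net result is roughly as if the kernel had been written with OpenCL's built-in select(), where every work-item computes both sides and the condition only picks which result is kept. This is a conceptual sketch of the effect, not a claim about what any particular compiler or GPU actually emits:

```c
// Conceptual sketch: every work-item evaluates both expressions, and the
// condition merely selects which of the two results is written back.
__kernel void predicated_branch(__global const float *in,
                                __global float *out)
{
    size_t gid = get_global_id(0);
    float x = in[gid];

    float then_result = x * 2.0f;   // "if" side, computed by all work-items
    float else_result = -x;         // "else" side, also computed by all

    int cond = (x > 0.0f);          // scalar comparison yields 0 or 1
    // select(a, b, c) returns b where c is non-zero, a where c is zero
    out[gid] = select(else_result, then_result, cond);
}
```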
