Should I create multiple OpenCL kernels to avoid conditional statements?

Posted on 2024-10-01 04:58:26

In OpenCL, I have a kernel that needs to operate on both complex-valued and real-valued data. I could put a conditional statement in the kernel that selects the right code for each case, or I could write two kernels and push the conditional out to my calling code.

Having two kernels is obviously worse for maintainability, but does the choice make a significant difference to performance?

Comments (3)

傾城如夢未必闌珊 2024-10-08 04:58:26

If it's just one conditional statement, in my experience the performance difference is negligible, at least on NVIDIA hardware.

Basically, as long as all (or most) work-items follow the same code path, you're fine. In your case the code path depends only on a kernel argument, which has the same value for every work-item, so all work-items follow the same path.
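To make the point about uniform branches concrete, here is a minimal sketch, not the asker's actual kernel. The kernel name `square`, the `is_complex` argument, and the interleaved real/imaginary layout are all assumptions made for illustration. The shim section at the top exists only so the same text also compiles as plain C on the host; an OpenCL compiler defines `__OPENCL_VERSION__` and skips it.

```c
#ifndef __OPENCL_VERSION__
/* Host-only shims (assumption of this sketch, not part of OpenCL). */
#include <stddef.h>
#define __kernel
#define __global
static size_t current_gid; /* stands in for the work-item index */
static size_t get_global_id(unsigned dim) { (void)dim; return current_gid; }
#endif

/* Squares each element in place. `is_complex` is a kernel argument, so it
 * has the same value for every work-item and the branch never diverges. */
__kernel void square(__global float *data, const int is_complex)
{
    const size_t i = get_global_id(0);

    if (is_complex) {
        /* (re + i*im)^2 = (re^2 - im^2) + i*(2*re*im), interleaved storage */
        const float re = data[2 * i];
        const float im = data[2 * i + 1];
        data[2 * i]     = re * re - im * im;
        data[2 * i + 1] = 2.0f * re * im;
    } else {
        data[i] = data[i] * data[i];
    }
}
```

Whether the branch is really free still depends on the device and compiler, so it is worth measuring on the target hardware.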

烟花易冷人易散 2024-10-08 04:58:26

It depends somewhat on where the conditional is. Code for readability first, and optimize for performance only after you have measured it AND found that it is a problem.

For example, having kernel_for_RGB_image and kernel_for_ABGR_image seems like a reasonable use of separate kernels, whereas writing different kernels just to effectively unroll some deep inner loop might be a bigger maintenance headache.
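If you do go with separate kernels, the conditional moves into the calling code. A minimal sketch of that host-side choice follows; the `pixel_layout` enum and the `kernel_name_for` helper are hypothetical names introduced here, and only the two kernel names come from the example above.

```c
#include <stddef.h>

/* Hypothetical layout tag used only by this sketch. */
enum pixel_layout { LAYOUT_RGB, LAYOUT_ABGR };

/* Returns the name of the kernel to request for a given layout,
 * or NULL if the layout is not one this sketch knows about. */
const char *kernel_name_for(enum pixel_layout layout)
{
    switch (layout) {
    case LAYOUT_RGB:  return "kernel_for_RGB_image";
    case LAYOUT_ABGR: return "kernel_for_ABGR_image";
    }
    return NULL;
}

/* Usage with the OpenCL host API (not compiled here; `program`, `layout`
 * and `err` are assumed to exist already):
 *
 *     cl_kernel k = clCreateKernel(program, kernel_name_for(layout), &err);
 */
```

The branch then runs once per launch on the host rather than inside every work-item.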

岛歌少女 2024-10-08 04:58:26

I think the best way is to actually try both variants and benchmark them. In some cases, having multiple conditional blocks compiled, even if only one of them is executed, can lead to worse performance. The reason is GPRs (general-purpose registers): the compiler allocates as many registers as the worst-case path needs, and higher register use per work-item can reduce how many work-items run concurrently.

One solution I can suggest: keep a single kernel function, but use a compile-time conditional:

__kernel void work()
{
#if VAR
    // code for the first variant
#else
    // code for the second variant
#endif
}

You then need to recompile the kernel with VAR set to true or false whenever the condition changes. To the compiler this is no different from having two kernels, but it may be easier to maintain if part of the code is shared between the variants.
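One way to set VAR at build time is through the options string passed to `clBuildProgram`, which accepts preprocessor definitions of the form `-D name=definition`. The helper below is a minimal sketch under that assumption; `make_build_options` is a hypothetical name, not part of the OpenCL API.

```c
#include <stdio.h>

/* Hypothetical helper: writes the option that defines VAR as 1 or 0 for the
 * OpenCL C preprocessor. Returns 0 on success, -1 if `buf` is too small. */
int make_build_options(char *buf, size_t len, int use_first_variant)
{
    int n = snprintf(buf, len, "-D VAR=%d", use_first_variant ? 1 : 0);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Usage with the OpenCL host API (not compiled here; `program` and `device`
 * are assumed to exist already):
 *
 *     char opts[32];
 *     if (make_build_options(opts, sizeof opts, 1) == 0)
 *         clBuildProgram(program, 1, &device, opts, NULL, NULL);
 */
```

Since each setting needs its own build, it may be worth building both variants once up front and keeping them, rather than rebuilding whenever the condition changes. Note also that if VAR is not defined at all, `#if VAR` evaluates to 0 and the `#else` branch is compiled.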
