OpenCL kernels and traditional loops

Posted on 2025-02-12 08:08:06

I'm studying OpenCL and I don't understand the relationship between a traditional loop in C/C++ code and kernel code.
Just to be clear, I mean a situation like this:

Example

So my question is: in a traditional loop I have the variable n as my boundary, while in kernel code I don't have it, but I have get_global_id(0), which indicates the memory range of my array. Does this mean that I start from 0 and iterate until get_global_id matches the maximum size of the array, n in this case? Or is it something different?

I ask because for this other example I don't know how to write the corresponding kernel code:

Example2

I hope my question is clear; my English is not very good, sorry.

Thanks in advance for the help. If anything is unclear, let me know!

Comments (1)

游魂 2025-02-19 08:08:07

An OpenCL kernel is coded like a single iteration of a for-loop, but all iterations run in parallel and in no guaranteed order.

Consider this vector addition example in C++, where for i=0..N-1, you add each element of the vectors one after the other:

for(int i=0; i<N; i++) { // loop index i
    C[i] = A[i]+B[i]; // compute one after the other
}

In OpenCL, the vector addition looks like the inside of this for-loop, but as a function with the kernel keyword and all vectors as parameters:

kernel void add_kernel(const global float* A, const global float* B, global float* C) {
    const int i = get_global_id(0);
    C[i] = A[i]+B[i]; // compute all loop indices i in parallel
}

You might be wondering: where is N? You give N to the kernel on the C++ side as its "global range", so OpenCL knows how many indices i to compute in parallel: one kernel instance runs for each i from 0 to N-1.
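To make the role of N more concrete, here is a minimal sketch in plain C++ that emulates a 1D launch sequentially. It is not the OpenCL host API. The names `add_kernel_body` and `emulate_add` are hypothetical and exist only for this illustration. The loop stands in for what the runtime does with the global range: it calls the kernel body once per index. On a real device those calls run in parallel and in no guaranteed order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel body: it handles exactly one index i.
// In a real OpenCL kernel, i would come from get_global_id(0).
void add_kernel_body(std::size_t i, const std::vector<float>& A,
                     const std::vector<float>& B, std::vector<float>& C) {
    C[i] = A[i] + B[i];
}

// Hypothetical sequential emulation of a 1D launch with global range N.
// A real device would execute these calls in parallel, in no guaranteed order.
std::vector<float> emulate_add(const std::vector<float>& A,
                               const std::vector<float>& B) {
    assert(A.size() == B.size());
    const std::size_t N = A.size(); // the "global range" chosen on the host side
    std::vector<float> C(N);
    for (std::size_t i = 0; i < N; i++) {
        add_kernel_body(i, A, B, C); // one "work-item" per index
    }
    return C;
}
```

So the kernel itself never iterates: the bound N lives on the host side, and each kernel instance only sees its own index.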

Because every iteration runs in parallel in the OpenCL kernel, there must not be any data dependencies from one iteration to the next; otherwise you have to use a double buffer (only read from one buffer and only write to the other). In your second example, A[i] = B[i-1]+B[i]+B[i+1], you do exactly that: you only read from B and only write to A. The implementation with periodic boundaries can be done branchless, see here.
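As a rough illustration of the second example, the sketch below emulates the stencil in plain C++ and deliberately visits the indices in reverse order. Because each call reads only from B and writes only to A[i], the result does not depend on the order. Wrapping the neighbour indices with the modulo operator is an assumption here: it is one common branchless way to get periodic boundaries, not necessarily the approach the linked answer uses. The names `stencil_body` and `emulate_stencil_reversed` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one work-item: reads only from B, writes only A[i].
// Periodic boundaries via modulo (an assumption): index -1 wraps to N-1, N wraps to 0.
void stencil_body(std::size_t i, std::size_t N,
                  const std::vector<float>& B, std::vector<float>& A) {
    const std::size_t left  = (i + N - 1) % N; // i-1 with wrap-around, no branch
    const std::size_t right = (i + 1) % N;     // i+1 with wrap-around, no branch
    A[i] = B[left] + B[i] + B[right];
}

// Hypothetical emulation that runs the work-items in reverse order to show
// that the result is independent of execution order.
std::vector<float> emulate_stencil_reversed(const std::vector<float>& B) {
    const std::size_t N = B.size();
    assert(N > 0);
    std::vector<float> A(N);
    for (std::size_t k = N; k-- > 0; ) {
        stencil_body(k, N, B, A);
    }
    return A;
}
```

In an actual kernel, N would have to be passed as an extra argument or read with get_global_size(0), since the body needs it for the wrap-around.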
