OpenCL kernels and traditional loops

Posted on 2025-02-12 08:08:06

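I am studying OpenCL, but I do not understand the relationship between a traditional loop in C/C++ code and the kernel code. Just to make the situation clear:

So my question is: in the traditional loop I have the variable N as my bound. In the kernel code I do not have it, but I have get_global_id(0), which represents the memory range of my array. Does this mean that I start at 0 and iterate until get_global_id matches the maximum size of the array, in this case N? Or is it different?

Because in another example, I do not know how to write the corresponding kernel code.

I hope my question is clear; my English is not very good, sorry.

Thanks in advance for your help, and let me know if anything is unclear!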

游魂 2025-02-19 08:08:07

An OpenCL kernel is coded like a single iteration of a for-loop, but all iterations run in parallel and in no guaranteed order.

Consider this vector addition example in C++, where for i=0..N-1, you add each element of the vectors one after the other:

for(int i=0; i<N; i++) { // loop index i
    C[i] = A[i]+B[i]; // compute one after the other
}

In OpenCL, the vector addition looks like the inside of this for-loop, but as a function with the kernel keyword and all vectors as parameters:

kernel void add_kernel(const global float* A, const global float* B, global float* C) {
    const int i = get_global_id(0);
    C[i] = A[i]+B[i]; // compute all loop indices i in parallel
}

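You might be wondering: where is N? You pass N to the kernel on the C++ side as its "global range", so the runtime knows how many indices i to compute in parallel. The kernel does not iterate up to N itself; each parallel instance reads its own i from get_global_id(0).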
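To make the role of N concrete, here is a minimal CPU-only sketch in plain C++ that mimics what a launch with global range N does conceptually. It is not OpenCL host code: emulate_add is a hypothetical helper, the index i is passed as a parameter instead of coming from get_global_id(0), and the shuffled call order is only an assumption used to illustrate that no execution order is guaranteed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Stand-in for the kernel body: handles exactly one index i.
// In real OpenCL, i would come from get_global_id(0), not from a parameter.
void add_kernel(std::size_t i, const float* A, const float* B, float* C) {
    C[i] = A[i] + B[i];
}

// Hypothetical helper mimicking a launch with global range N:
// the kernel body is invoked once for every i in [0, N), in shuffled order.
std::vector<float> emulate_add(const std::vector<float>& A,
                               const std::vector<float>& B,
                               unsigned seed) {
    assert(A.size() == B.size());
    const std::size_t N = A.size(); // plays the role of the "global range"
    std::vector<float> C(N, 0.0f);

    // One id per work-item, visited in an arbitrary (here: shuffled) order.
    std::vector<std::size_t> ids(N);
    std::iota(ids.begin(), ids.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(ids.begin(), ids.end(), rng);

    for (std::size_t i : ids) {
        add_kernel(i, A.data(), B.data(), C.data());
    }
    return C;
}
```

Calling emulate_add with two vectors of equal length returns their element-wise sum regardless of the seed, because no call to add_kernel depends on the result of another call.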

Because every iteration runs in parallel in the OpenCL kernel, there must not be any data dependencies from one iteration to the next; otherwise you have to use a double buffer (only read from one buffer and only write to the other). In your second example with A[i] = B[i-1]+B[i]+B[i+1] you do exactly that: only read from B, only write to A. The implementation with periodic boundaries can be done branch-less.
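One possible way to handle the periodic boundaries without branches is modulo indexing. The following CPU-only sketch in plain C++ illustrates the idea under stated assumptions: stencil_kernel and emulate_stencil are hypothetical names, N is passed explicitly (in a real kernel it could be a kernel argument or get_global_size(0)), and this is not necessarily the approach the answer had in mind.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for one kernel instance of A[i] = B[i-1] + B[i] + B[i+1]
// with periodic boundaries. Reads only from B, writes only to A.
void stencil_kernel(std::size_t i, std::size_t N, const float* B, float* A) {
    assert(N > 0 && i < N);
    // Modulo wrap-around replaces if/else at the edges:
    // i = 0 wraps to N-1 on the left, i = N-1 wraps to 0 on the right.
    const std::size_t left  = (i + N - 1) % N;
    const std::size_t right = (i + 1) % N;
    A[i] = B[left] + B[i] + B[right];
}

// Hypothetical helper that calls the kernel body once per index in [0, N).
// The order does not matter because A and B are separate buffers.
std::vector<float> emulate_stencil(const std::vector<float>& B) {
    const std::size_t N = B.size();
    std::vector<float> A(N, 0.0f);
    for (std::size_t i = 0; i < N; i++) {
        stencil_kernel(i, N, B.data(), A.data());
    }
    return A;
}
```

If the kernel wrote its result back into B instead of A, a neighbouring instance might read an already updated value, which is exactly the data dependency that the separate output buffer avoids.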
