How can I launch another thread from OpenCL code?
My algorithm consists of two steps:
- Data generation. In this step I generate a data array in a loop, each element being the result of some function.
- Data processing. For this step I have written an OpenCL kernel that processes the data array generated in the previous step.
Currently the first step runs on the CPU because it is hard to parallelize. I want to run it on the GPU because each generation step takes some time, and I want to start the second step immediately on the data that has already been generated.
Can I launch another OpenCL kernel from a currently running kernel in a separate thread? Or will it run in the same thread as the calling kernel?
Some pseudocode to illustrate my point:
__kernel void second(__global int *data, int index) {
    // Work on data[index]. This takes a lot of time.
}
__kernel void first(__global int *data, const int length) {
    for (int i = 0; i < length; i++) {
        // Generate data and store it in data[i].
        // Will this call run in the same thread as the caller, or in a new thread?
        // If in the same thread, is there a way to launch it in a separate thread?
        second(data, i);
    }
}
Comments (3)
No. OpenCL has no concept of threads, and a kernel execution cannot launch another kernel. All kernel execution is triggered by the CPU.
You should launch one kernel.
Then call clFinish().
Then execute the next kernel.
There are more efficient ways, but I would only confuse you by bringing in events.
You just use the memory output of the first kernel as the input for the second one. That way you avoid the CPU->GPU copy.
I believe the global work size can be thought of as the number of threads that will be executed, one way or another. Correct me if I'm wrong.
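As a rough mental model of that statement, a 1-D launch with global work size N behaves as if the kernel body ran once for each global ID from 0 to N-1. OpenCL calls these instances work-items. The plain-C sketch below only emulates this serially; on a real device the work-items run concurrently in work-groups and in no guaranteed order. `second_body` and `emulate_ndrange` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C stand-in for a kernel body; gid plays the role of get_global_id(0). */
static void second_body(int *data, size_t gid) {
    data[gid] += 1;
}

/* Serial emulation of a 1-D launch: one call of the body per work-item.
 * Returns how many work-items were executed. */
size_t emulate_ndrange(int *data, size_t global_work_size) {
    assert(data != NULL);
    size_t executed = 0;
    for (size_t gid = 0; gid < global_work_size; gid++) {
        second_body(data, gid);
        executed++;
    }
    return executed;
}
```

Calling `emulate_ndrange(d, 4)` on a four-element array touches each element exactly once, which is the sense in which the global work size counts the "threads".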