CUDA: nested for loop kernel
I have some code that I want to turn into a CUDA kernel. Behold:
for (r = Y; r < Y + H; r += 2)
{
    ch1RowSum = ch2RowSum = ch3RowSum = 0;
    for (c = X; c < X + W; c += 2)
    {
        chan1Value = // some calc'd value
        chan3Value = // some calc'd value
        chan2Value = // some calc'd value
        ch2RowSum += chan2Value;
        ch3RowSum += chan3Value;
        ch1RowSum += chan1Value;
    }
    ch1Mean += ch1RowSum / W;
    ch2Mean += ch2RowSum / W;
    ch3Mean += ch3RowSum / W;
}
Should this be split up into two kernels, one to calculate the RowSums and one to calculate the Means? And how should I handle the fact that my loop indices don't start at zero and end at N?
Comments (1)
Let's suppose you have a kernel that computes the three values, with each thread in your launch configuration computing the three values for one (r,c) pair.
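To make that concrete, and to address the part of the question about loop indices that don't start at zero, here is a minimal sketch of such a values kernel. It assumes the thread indices simply count samples from zero and the offsets X and Y (and the stride of 2) are added back inside the kernel. `calcValue`, `valuesKernel`, `countMismatches`, and the 16x16 block size are hypothetical names and choices for illustration; `calcValue` stands in for the "some calc'd value" lines, which the question does not show.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for the "//some calc'd value" lines in the question.
__host__ __device__ inline float calcValue(int channel, int r, int c)
{
    return static_cast<float>(channel * (r + c));
}

// One thread per sampled (r,c) pair. Thread indices count samples from zero;
// the question's loop bounds are recovered by adding X and Y and stepping by 2.
__global__ void valuesKernel(float *ch1, float *ch2, float *ch3,
                             int X, int Y, int nCols, int nRows)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // sampled column index
    int j = blockIdx.y * blockDim.y + threadIdx.y; // sampled row index
    if (i >= nCols || j >= nRows) return;          // guard the partial blocks

    int c = X + 2 * i; // same c the inner loop would visit
    int r = Y + 2 * j; // same r the outer loop would visit
    int idx = j * nCols + i;
    ch1[idx] = calcValue(1, r, c);
    ch2[idx] = calcValue(2, r, c);
    ch3[idx] = calcValue(3, r, c);
}

// Host helper: launches the kernel and counts outputs that differ from a CPU
// reference. Returns 0 on agreement and -1 if a CUDA call fails.
int countMismatches(int X, int Y, int W, int H)
{
    int nCols = (W + 1) / 2, nRows = (H + 1) / 2; // iterations of each loop
    size_t n = static_cast<size_t>(nCols) * nRows;
    float *d[3] = {nullptr, nullptr, nullptr};
    for (int k = 0; k < 3; ++k)
        if (cudaMalloc(&d[k], n * sizeof(float)) != cudaSuccess) return -1;

    dim3 block(16, 16);
    dim3 grid((nCols + block.x - 1) / block.x, (nRows + block.y - 1) / block.y);
    valuesKernel<<<grid, block>>>(d[0], d[1], d[2], X, Y, nCols, nRows);

    std::vector<float> h(n);
    int bad = 0;
    for (int k = 0; k < 3; ++k) {
        if (cudaMemcpy(h.data(), d[k], n * sizeof(float),
                       cudaMemcpyDeviceToHost) != cudaSuccess) return -1;
        for (int j = 0; j < nRows; ++j)
            for (int i = 0; i < nCols; ++i)
                if (h[j * nCols + i] != calcValue(k + 1, Y + 2 * j, X + 2 * i))
                    ++bad;
    }
    for (int k = 0; k < 3; ++k) cudaFree(d[k]);
    return bad;
}
```

Calling `countMismatches(3, 5, 8, 6)` from a host program should return 0 if the index mapping is right. Rounding the grid size up and guarding with the bounds check means W and H need not be multiples of the block size.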
I don't believe you can calculate the sums in the above kernel, at least not completely in parallel. You won't be able to use += like you have above, because many threads would be updating the same accumulator at the same time. You could put it all in one kernel if you have only one thread in each block (one block per row) do the sum and mean, like this...
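The following is a sketch of the single-kernel idea, not the answerer's original code. It assumes one block per sampled row: every thread keeps private partial sums (where += is safe), stores them in shared memory, and after a barrier thread 0 alone adds them up and writes the row's contribution to the mean. The final += across rows is left to the host here, which is one possible choice. `calcValue`, `rowMeansKernel`, `meansMatchCpu`, and `THREADS` are hypothetical names.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int THREADS = 128; // threads per block (an assumed launch size)

// Hypothetical stand-in for the "//some calc'd value" lines in the question.
__host__ __device__ inline float calcValue(int channel, int r, int c)
{
    return static_cast<float>((channel * (r + c)) % 11);
}

// One block per sampled row. Output layout: rowMean[3 * row + channel].
__global__ void rowMeansKernel(float *rowMean, int X, int Y, int W, int nCols)
{
    __shared__ float partial[3][THREADS];
    int r = Y + 2 * blockIdx.x;
    float s[3] = {0.0f, 0.0f, 0.0f}; // private to this thread, so += is safe
    for (int i = threadIdx.x; i < nCols; i += blockDim.x)
        for (int ch = 0; ch < 3; ++ch)
            s[ch] += calcValue(ch + 1, r, X + 2 * i);
    for (int ch = 0; ch < 3; ++ch) partial[ch][threadIdx.x] = s[ch];
    __syncthreads(); // all partial sums are written before thread 0 reads them

    if (threadIdx.x == 0) { // a single thread does the row sum and the mean
        for (int ch = 0; ch < 3; ++ch) {
            float rowSum = 0.0f;
            for (int t = 0; t < (int)blockDim.x; ++t) rowSum += partial[ch][t];
            rowMean[3 * blockIdx.x + ch] = rowSum / W; // as in chNRowSum / W
        }
    }
}

// Host helper: true if the GPU result matches the loop from the question.
bool meansMatchCpu(int X, int Y, int W, int H)
{
    int nCols = (W + 1) / 2, nRows = (H + 1) / 2;
    float *dRowMean = nullptr;
    if (cudaMalloc(&dRowMean, 3 * nRows * sizeof(float)) != cudaSuccess)
        return false;
    rowMeansKernel<<<nRows, THREADS>>>(dRowMean, X, Y, W, nCols);
    std::vector<float> h(3 * nRows);
    cudaError_t err = cudaMemcpy(h.data(), dRowMean, h.size() * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dRowMean);
    if (err != cudaSuccess) return false;

    float gpu[3] = {0.0f, 0.0f, 0.0f}, cpu[3] = {0.0f, 0.0f, 0.0f};
    for (int j = 0; j < nRows; ++j) // the outer += is done on the host
        for (int ch = 0; ch < 3; ++ch) gpu[ch] += h[3 * j + ch];
    for (int r = Y; r < Y + H; r += 2) { // CPU reference: the original loop
        float rowSum[3] = {0.0f, 0.0f, 0.0f};
        for (int c = X; c < X + W; c += 2)
            for (int ch = 0; ch < 3; ++ch) rowSum[ch] += calcValue(ch + 1, r, c);
        for (int ch = 0; ch < 3; ++ch) cpu[ch] += rowSum[ch] / W;
    }
    for (int ch = 0; ch < 3; ++ch)
        if (std::fabs(gpu[ch] - cpu[ch]) > 1e-3f) return false;
    return true;
}
```

The serial loop in thread 0 is the bottleneck in this version: the other threads in the block sit idle while it runs.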
But it's probably better to use the first kernel for the values and then a second kernel for both the sums and the means. It's possible to further parallelize the sum-and-mean kernel, and if it's separate you can focus on that when you're ready.
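As a rough illustration of the two-kernel split and of one way the second kernel could be parallelized further, the sketch below handles a single channel: the first kernel stores the values, and the second reduces each row with a shared-memory tree reduction, one block per row. This is an assumed design, not the answerer's code. `calcValue`, `valuesKernel`, `rowMeanKernel`, `meanOnGpu`, `meanOnCpu`, and `THREADS` are hypothetical, and the reduction as written requires the block size to be a power of two.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int THREADS = 256; // must be a power of two for the tree reduction

// Hypothetical stand-in for one channel's "//some calc'd value".
__host__ __device__ inline float calcValue(int r, int c)
{
    return static_cast<float>((r + c) % 7);
}

// Kernel 1: one thread per sampled (r,c) pair, values stored row-major.
__global__ void valuesKernel(float *vals, int X, int Y, int nCols, int nRows)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // sampled column index
    int j = blockIdx.y;                            // sampled row index
    if (i < nCols && j < nRows)
        vals[j * nCols + i] = calcValue(Y + 2 * j, X + 2 * i);
}

// Kernel 2: one block per row; all threads cooperate in the row sum.
__global__ void rowMeanKernel(const float *vals, float *rowMean, int nCols, int W)
{
    __shared__ float s[THREADS];
    const float *row = vals + static_cast<size_t>(blockIdx.x) * nCols;
    float acc = 0.0f; // per-thread partial sum over a strided slice of the row
    for (int i = threadIdx.x; i < nCols; i += blockDim.x) acc += row[i];
    s[threadIdx.x] = acc;
    __syncthreads();
    // Halve the number of active threads each step until s[0] holds the sum.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) s[threadIdx.x] += s[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) rowMean[blockIdx.x] = s[0] / W;
}

// Host helper: runs both kernels and adds the row means (NAN on CUDA failure).
float meanOnGpu(int X, int Y, int W, int H)
{
    int nCols = (W + 1) / 2, nRows = (H + 1) / 2;
    float *dVals = nullptr, *dRowMean = nullptr;
    if (cudaMalloc(&dVals, sizeof(float) * nCols * nRows) != cudaSuccess) return NAN;
    if (cudaMalloc(&dRowMean, sizeof(float) * nRows) != cudaSuccess) return NAN;

    dim3 grid((nCols + THREADS - 1) / THREADS, nRows);
    valuesKernel<<<grid, THREADS>>>(dVals, X, Y, nCols, nRows);
    rowMeanKernel<<<nRows, THREADS>>>(dVals, dRowMean, nCols, W);

    std::vector<float> h(nRows);
    cudaError_t err = cudaMemcpy(h.data(), dRowMean, sizeof(float) * nRows,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dVals);
    cudaFree(dRowMean);
    if (err != cudaSuccess) return NAN;
    float mean = 0.0f;
    for (float m : h) mean += m; // the outer "chNMean +=" from the question
    return mean;
}

// CPU reference: the loop from the question, for one channel.
float meanOnCpu(int X, int Y, int W, int H)
{
    float mean = 0.0f;
    for (int r = Y; r < Y + H; r += 2) {
        float rowSum = 0.0f;
        for (int c = X; c < X + W; c += 2) rowSum += calcValue(r, c);
        mean += rowSum / W;
    }
    return mean;
}
```

Both kernels are launched on the default stream, so the second starts only after the first has finished and no explicit synchronization is needed between them. For three channels, the same pair of kernels could be launched once per channel or extended to carry three arrays.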