Histogram calculation in CUDA
The code below does not work, but when I comment out the atomicAdd call, it works.
What is the reason for that?
Where can I get histogram code for a float array?
__global__ void calculateHistogram(float *devD, int *retHis)
{
    int globalLi = getCurrentThread(); // get the thread ID
    if (globalLi >= 0 && globalLi < Rd * Cd * Dd)
    {
        int r = 0, c = 0, d = 0;
        GetInd2Sub(globalLi, Rd, Cd, r, c, d); // some calculations to get r, c, d
        if (r >= stYd && r < edYd && c >= stXd && c < edXd && d >= stZd && d < edZd)
        {
            // calculate the histogram
            int indexInHis = GetBinNo(devD[globalLi]); // get the bin number in the histogram
            atomicAdd(&retHis[indexInHis], 1); // when I comment out this line the code works
        }
    }
}
Comments (1)
Take a look at chapter 9 of CUDA by Example by Jason Sanders and Edward Kandrot. It covers atomics and walks through a simple example that computes histograms of 8-bit integers. The first version uses an atomic add for each value, which works but is very slow. The refined version computes a histogram for each block in shared memory, then merges all the per-block histograms into global memory to get the final result. Your code is like the first version; once you get it working, you will want to make it more like the fast, refined version.
You can download the examples from the book to see both versions: CUDA by Example downloads
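The refined approach described above (a per-block histogram in shared memory, merged into global memory at the end) might look roughly like the sketch below for float input. This is not the book's code. NUM_BINS, binOf, histogramShared, histogramOnDevice, and the 64 x 256 launch configuration are all assumptions made for illustration; binOf only stands in for the poster's GetBinNo, whose real definition is not shown.

```cuda
#include <cuda_runtime.h>

constexpr int NUM_BINS = 256;  // assumed bin count, not taken from the question

// Hypothetical stand-in for GetBinNo: maps v in [lo, hi) to a bin index and
// clamps everything else (including NaN) into the valid range [0, NUM_BINS).
__device__ int binOf(float v, float lo, float hi)
{
    float t = (v - lo) / (hi - lo) * NUM_BINS;
    if (!(t >= 0.0f)) return 0;                 // negative values and NaN
    if (t >= NUM_BINS) return NUM_BINS - 1;     // too large, including +inf
    return static_cast<int>(t);
}

__global__ void histogramShared(const float *data, int n, float lo, float hi, int *hist)
{
    __shared__ int local[NUM_BINS];

    // Zero the block-private histogram.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        local[i] = 0;
    __syncthreads();

    // Grid-stride loop: per-value atomics only touch this block's shared memory.
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&local[binOf(data[i], lo, hi)], 1);
    __syncthreads();

    // Merge: at most one global atomicAdd per bin per block.
    for (int i = threadIdx.x; i < NUM_BINS; i += blockDim.x)
        if (local[i] != 0)
            atomicAdd(&hist[i], local[i]);
}

// Hypothetical host helper (assumes n > 0): copies the input to the device,
// runs the kernel, and copies NUM_BINS counts back into hostHist.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t histogramOnDevice(const float *hostData, int n, float lo, float hi, int *hostHist)
{
    float *dData = nullptr;
    int *dHist = nullptr;
    cudaError_t err = cudaMalloc(&dData, n * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc(&dHist, NUM_BINS * sizeof(int));
    if (err == cudaSuccess) err = cudaMemcpy(dData, hostData, n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemset(dHist, 0, NUM_BINS * sizeof(int));
    if (err == cudaSuccess) {
        histogramShared<<<64, 256>>>(dData, n, lo, hi, dHist);
        err = cudaGetLastError();  // reports launch failures
    }
    // This blocking copy also surfaces errors raised while the kernel ran.
    if (err == cudaSuccess) err = cudaMemcpy(hostHist, dHist, NUM_BINS * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dHist);
    return err;
}
```

Calling histogramOnDevice(values, count, 0.0f, 1.0f, counts) with a host array counts of NUM_BINS ints fills it with the per-bin totals. The output histogram must be zeroed before the kernel runs, which the helper does with cudaMemset; skipping that step is an easy way to get wrong counts.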
You don't appear to give complete code or error messages, so I can't say exactly what is going wrong in your code. Here are some thoughts:
retHis is indexed by the value that GetBinNo returns; I would add some checks before using the return value, at least for debugging.
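One way to add such checks is sketched below. It counts out-of-range bin indices instead of writing through them, and it reports CUDA errors after the launch. To keep the sketch short, the bin indices are passed in already computed, standing in for the values GetBinNo would return. The names histogramChecked, runChecked, numBins, and badCount are hypothetical and do not come from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Debug variant of the per-value update. bins[i] plays the role of the value
// returned by GetBinNo; numBins is the assumed length of hist.
__global__ void histogramChecked(const int *bins, int n, int numBins, int *hist, int *badCount)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int bin = bins[i];
    if (bin < 0 || bin >= numBins) {
        atomicAdd(badCount, 1);  // count the bad index instead of writing out of bounds
        return;
    }
    atomicAdd(&hist[bin], 1);
}

// Hypothetical host helper (assumes n > 0 and numBins > 0). Fills hostHist
// with numBins counts and returns the number of out-of-range indices,
// or -1 if any CUDA call failed.
int runChecked(const int *hostBins, int n, int numBins, int *hostHist)
{
    int *dBins = nullptr, *dHist = nullptr, *dBad = nullptr;
    int bad = -1;
    cudaError_t err = cudaMalloc(&dBins, n * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&dHist, numBins * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&dBad, sizeof(int));
    if (err == cudaSuccess) err = cudaMemcpy(dBins, hostBins, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemset(dHist, 0, numBins * sizeof(int));
    if (err == cudaSuccess) err = cudaMemset(dBad, 0, sizeof(int));
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        histogramChecked<<<blocks, threads>>>(dBins, n, numBins, dHist, dBad);
        err = cudaGetLastError();                               // launch errors
        if (err == cudaSuccess) err = cudaDeviceSynchronize();  // errors during execution
    }
    if (err == cudaSuccess) err = cudaMemcpy(hostHist, dHist, numBins * sizeof(int), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(&bad, dBad, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        bad = -1;
    }
    cudaFree(dBins);
    cudaFree(dHist);
    cudaFree(dBad);
    return bad;
}
```

A nonzero return value from runChecked would suggest that the bin computation produces indices outside the histogram, which is one plausible reason for a kernel to fail only when the atomicAdd line is present.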