Some issues with atomic add in CUDA kernel operations
I'm having an issue with my kernel.cu file.
When I run nvcc -v kernel.cu -o kernel.o
I get this error:
kernel.cu(17): error: identifier "atomicAdd" is undefined
My code:
#include "dot.h"
#include <cuda.h>
#include "device_functions.h" //might call atomicAdd

__global__ void dot (int *a, int *b, int *c){
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];
    __syncthreads();
    if( 0 == threadIdx.x ){
        int sum = 0;
        for( int i = 0; i<THREADS_PER_BLOCK; i++)
            sum += temp[i];
        atomicAdd(c, sum);
    }
}
Any suggestions?
Comments (2)
You need to specify an architecture to nvcc that supports atomic memory operations (the default architecture is 1.0, which does not support atomics). Try compiling with such an architecture and see what happens.
EDIT in 2015 to note that the default architecture in CUDA 7.0 is now 2.0, which supports atomic memory operations, so this should not be a problem in newer toolkit versions.
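As a hedged illustration of the point above: nvcc takes the target architecture through its -arch option, and 32-bit integer atomicAdd on global memory needs compute capability 1.1 or higher. The sketch below is a self-contained variant of the kernel from the question. The sm_XX placeholder in the build comment, the THREADS_PER_BLOCK and N values, and the dot_on_device wrapper are assumptions made for illustration; the original post keeps its definitions in dot.h, which is not shown.

```cuda
// Assumed build line; replace sm_XX with an architecture that both your
// toolkit and your GPU support (it must be at least compute capability 1.1):
//   nvcc -arch=sm_XX kernel.cu -o kernel
#include <cstddef>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256          // assumed value; the original defines it in dot.h
#define N (4 * THREADS_PER_BLOCK)      // illustrative size, a multiple of the block size

// Each block multiplies its slice element-wise, then thread 0 sums the
// block's products and adds that partial sum to *c atomically.
// No extra header is needed for atomicAdd in a .cu file compiled by nvcc.
__global__ void dot(const int *a, const int *b, int *c) {
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];
    __syncthreads();
    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; i++)
            sum += temp[i];
        atomicAdd(c, sum);
    }
}

// Hypothetical host wrapper: copies N-element inputs to the device, runs the
// kernel, and stores the dot product in *out. Returns true on success.
bool dot_on_device(const int *h_a, const int *h_b, int *out) {
    int *d_a = NULL, *d_b = NULL, *d_c = NULL;
    const size_t bytes = N * sizeof(int);
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess &&
              cudaMalloc(&d_b, bytes) == cudaSuccess &&
              cudaMalloc(&d_c, sizeof(int)) == cudaSuccess &&
              cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemset(d_c, 0, sizeof(int)) == cudaSuccess;  // accumulator starts at 0
    if (ok) {
        dot<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(d_a, d_b, d_c);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out, d_c, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return ok;
}
```

Calling dot_on_device from a main function with every element of a set to 1 and every element of b set to 2 should store 2 * N in out.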
Today, with the latest CUDA SDK and toolkit, this solution will not work.
People also say that adding the architecture option to CUDA in the Project Properties in Visual Studio 2010 will work. It doesn't.
You have to specify it for the .cu file itself, in that file's own properties (under the C++/CUDA -> Device -> Code Generation tab).
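Whichever properties page the setting ends up on, one way to confirm which architecture a given .cu file is really being compiled for is to inspect the __CUDA_ARCH__ macro, which nvcc defines during device compilation as major * 100 + minor * 10 (200 for compute capability 2.0). The following is only a sketch: the report_arch kernel and the compiled_arch helper are hypothetical names introduced here, not part of the original answer.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Fail the build with a readable message if this file is being compiled for
// an architecture below compute capability 1.1, the minimum for 32-bit
// integer atomicAdd on global memory.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 110)
#error "This file uses atomicAdd: compile it for compute capability 1.1 or higher"
#endif

// Hypothetical kernel: writes the architecture it was compiled for.
__global__ void report_arch(int *out) {
#ifdef __CUDA_ARCH__
    *out = __CUDA_ARCH__;
#endif
}

// Hypothetical host helper: returns the __CUDA_ARCH__ value seen by the
// kernel, or -1 if anything went wrong.
int compiled_arch() {
    int *d_out = NULL;
    int h_out = -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess)
        return -1;
    // Seed the device value with -1 so a kernel that never ran is detectable.
    if (cudaMemcpy(d_out, &h_out, sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess) {
        report_arch<<<1, 1>>>(d_out);
        if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
            h_out = -1;
    }
    cudaFree(d_out);
    return h_out;
}
```

If the per-file setting did not take effect, the #error fires at build time, or compiled_arch() reports a lower value than expected at run time.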