Atomic add with long int doesn't work
As the CUDA programming guide suggests, I want to call the atomicAdd function:
unsigned long long int atomicAdd(unsigned long long int* address,
unsigned long long int val);
But when I call it with two variables,
unsigned long long int *c
and unsigned long long int sum,
I get this error:
dotproduct_kernel.cu(23): error: no instance of overloaded function "atomicAdd" matches the argument list
argument types are: (unsigned long long *, unsigned long long)
I wasn't sure that long long int really exists, so I also tried long int and long, but everything fails.
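If the accumulator has to stay signed (for example because a[i] * b[i] can be negative), a common workaround is to reinterpret a 64-bit signed pointer as unsigned long long int *, because two's-complement addition produces the same bit pattern for signed and unsigned operands. The sketch below assumes long long is 64 bits, as it is under nvcc. atomicAddSigned64 and add_via_unsigned are hypothetical helper names, not part of the CUDA API.

```cuda
#include <cstdio>

// Hypothetical helper (not part of the CUDA API): atomic add on a signed
// 64-bit value, built on the unsigned long long int overload of atomicAdd.
// Like that overload, it needs compute capability 1.2 (-arch=sm_12) or higher.
__device__ long long atomicAddSigned64(long long *address, long long val)
{
    // Signed and unsigned versions of the same type may alias each other,
    // and two's-complement addition yields the same bits either way.
    return (long long)atomicAdd((unsigned long long int *)address,
                                (unsigned long long int)val);
}

// Non-atomic illustration of the same bit-level argument, callable on the
// host: adding the unsigned reinterpretations gives the signed sum
// (the usual compilers convert back to signed modulo 2^64).
__host__ __device__ long long add_via_unsigned(long long a, long long b)
{
    return (long long)((unsigned long long)a + (unsigned long long)b);
}

int main()
{
    printf("%lld\n", add_via_unsigned(-5, 3));  // prints -2
    return 0;
}
```

If the result is always non-negative, as a sum close to 10^14 suggests, plain unsigned long long int is simpler and needs no cast.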
I need a large data type because my result will be close to 10^14.
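A quick range check, assuming the final sum is non-negative and around 10^14, shows why a 32-bit type overflows while a 64-bit one does not:

```latex
\begin{aligned}
2^{32} - 1 &= 4\,294\,967\,295 \approx 4.3 \times 10^{9} < 10^{14},\\
2^{63} - 1 &\approx 9.2 \times 10^{18} > 10^{14},\\
2^{64} - 1 &\approx 1.8 \times 10^{19} > 10^{14}.
\end{aligned}
```

So long long and unsigned long long are wide enough, while a 32-bit int, or a 32-bit long as on Windows, is not. With non-negative terms, every product a[i] * b[i] and every per-block partial sum is bounded by the same total, so 64 bits suffice for those intermediates as well.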
Here is all the information about my device. I guess the compute capability is 1.2, right?
Major revision number: 1
Minor revision number: 2
Name: GeForce GT 240
Total global memory: 1073020928
Total shared memory per block: 16384
Total registers per block: 16384
Warp size: 32
Maximum memory pitch: 2147483647
Maximum threads per block: 512
Maximum dimension 0 of block: 512
Maximum dimension 1 of block: 512
Maximum dimension 2 of block: 64
Maximum dimension 0 of grid: 65535
Maximum dimension 1 of grid: 65535
Maximum dimension 2 of grid: 1
Clock rate: 1340000
Total constant memory: 65536
Texture alignment: 256
Concurrent copy and execution: Yes
Number of multiprocessors: 12
Kernel execution timeout: Yes
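The "Major revision number" and "Minor revision number" fields are the compute capability, so this GeForce GT 240 is a 1.2 device. A minimal host-side sketch, assuming the CUDA runtime API, that reads the same fields with cudaGetDeviceProperties and checks them against the 1.2 requirement for 64-bit integer atomics on global memory. supports64BitGlobalAtomics is a hypothetical helper name.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: 64-bit integer atomics on global memory
// (e.g. the unsigned long long int overload of atomicAdd) need
// compute capability 1.2 or higher.
bool supports64BitGlobalAtomics(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 2);
}

int main()
{
    int device = 0;  // assumption: query the first GPU
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    printf("64-bit atomicAdd on global memory: %s\n",
           supports64BitGlobalAtomics(prop.major, prop.minor) ? "yes" : "no");
    return 0;
}
```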
This is the complete code:
__global__ void dot (long int *a, long int *b, long int *c){
    __shared__ long int temp[THREADS_PER_BLOCK];
    c[0] = 0;
    long index = (blockIdx.x * blockDim.x) + threadIdx.x;
    temp[threadIdx.x] = a[index] * b[index];
    __syncthreads();
    if( 0 == threadIdx.x ){
        long int sum = 0;
        int i;
        for( i = 0; i<THREADS_PER_BLOCK; i++) {
            sum += temp[i];
        }
        atomicAdd(c, sum); //remember of -arch=sm_11
    }
}
Be sure you compile your code with -arch=sm_11 or above (by default it is compiled for compute capability 1.0). Also be aware that if you are using the common.mk file included in the SDK, it could override some of your flags.
I'm sorry, but I was almost sure that the minimum requirement for atomicAdd was 1.1. That holds for the 32-bit versions, but the 64-bit (unsigned long long) version needs 1.2, which your GPU supports, so compile with -arch=sm_12. I've also compiled your kernel using 'unsigned long long' ('long int' is not a valid data type for atomicAdd). See B.11.1.1 atomicAdd(), NVIDIA CUDA C Programming Guide, v3.2.
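A sketch of the kernel along those lines, using unsigned long long int throughout. THREADS_PER_BLOCK = 256 is an assumption, since the question does not show its value. Two further changes go beyond what the answer states: the accumulator is zeroed once on the host with cudaMemset instead of by c[0] = 0 in every thread (that store races with other blocks' atomicAdd calls), and threads past the end of the input contribute 0. The #if guard uses __CUDA_ARCH__, which nvcc defines during device compilation as 100 * major + 10 * minor, to turn a too-low -arch target (for example one overridden by common.mk) into a clear error.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256  // assumed value; the question does not show it

// 64-bit atomicAdd on global memory needs compute capability 1.2 (-arch=sm_12).
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 120
#error "64-bit atomicAdd on global memory needs -arch=sm_12 or higher"
#endif

__global__ void dot(const unsigned long long int *a,
                    const unsigned long long int *b,
                    unsigned long long int *c, int n)
{
    __shared__ unsigned long long int temp[THREADS_PER_BLOCK];
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    // Threads past the end of the input contribute 0.
    temp[threadIdx.x] = (index < n) ? a[index] * b[index] : 0ULL;
    __syncthreads();

    if (threadIdx.x == 0) {
        unsigned long long int sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; i++)
            sum += temp[i];
        // Both argument types match the unsigned long long int overload exactly.
        atomicAdd(c, sum);
    }
}

// CPU reference used to check the GPU result.
unsigned long long dot_reference(const unsigned long long *a,
                                 const unsigned long long *b, int n)
{
    unsigned long long sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

int main()
{
    const int n = 1000;
    unsigned long long a[n], b[n];
    for (int i = 0; i < n; i++) { a[i] = 100000; b[i] = i; }

    unsigned long long *d_a, *d_b, *d_c;
    unsigned long long result = 0;
    cudaMalloc(&d_a, n * sizeof(*d_a));
    cudaMalloc(&d_b, n * sizeof(*d_b));
    cudaMalloc(&d_c, sizeof(*d_c));
    cudaMemcpy(d_a, a, n * sizeof(*d_a), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, n * sizeof(*d_b), cudaMemcpyHostToDevice);
    cudaMemset(d_c, 0, sizeof(*d_c));  // zero the accumulator once, on the host

    // Launch with blockDim.x == THREADS_PER_BLOCK, as the kernel's loop assumes.
    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    dot<<<blocks, THREADS_PER_BLOCK>>>(d_a, d_b, d_c, n);
    cudaMemcpy(&result, d_c, sizeof(result), cudaMemcpyDeviceToHost);

    printf("GPU %llu, CPU %llu\n", result, dot_reference(a, b, n));
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```

Built with something like nvcc -arch=sm_12 dotproduct_kernel.cu on a toolkit that still targets compute capability 1.x, both printed values should be 49950000000, which already exceeds the 32-bit range. Error checking of the CUDA calls is omitted for brevity.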
Hope this helps.