atomicAdd with long int does not work

Posted 2024-11-10 03:43:11


As the CUDA programming guide suggests, I want to call the atomicAdd function:

unsigned long long int atomicAdd(unsigned long long int* address,
                             unsigned long long int val);

But when I call it with two variables:

unsigned long long int *c and unsigned long long int sum

I get this error:

 dotproduct_kernel.cu(23): error: no instance of overloaded function "atomicAdd" matches the argument list
        argument types are: (unsigned long long *, unsigned long long)

I didn't know that long long int really exists, so I also tried long int and long, but everything fails.

I need a big data type because my result is going to be close to 10^14.

Here is all the information about my device. I guess the compute capability is 1.2, right?

Major revision number:         1
Minor revision number:         2
Name:                          GeForce GT 240
Total global memory:           1073020928
Total shared memory per block: 16384
Total registers per block:     16384
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     512
Maximum dimension 0 of block:  512
Maximum dimension 1 of block:  512
Maximum dimension 2 of block:  64
Maximum dimension 0 of grid:   65535
Maximum dimension 1 of grid:   65535
Maximum dimension 2 of grid:   1
Clock rate:                    1340000
Total constant memory:         65536
Texture alignment:             256
Concurrent copy and execution: Yes
Number of multiprocessors:     12
Kernel execution timeout:      Yes

This is the complete code:

__global__ void dot (long int *a, long int *b, long int *c){
    __shared__ long int temp[THREADS_PER_BLOCK];
    c[0] = 0;
    long index = (blockIdx.x * blockDim.x) + threadIdx.x;
    temp[threadIdx.x] = a[index] * b[index];

    __syncthreads();

    if( 0 == threadIdx.x ){
        long int sum = 0;
        int i;
        for( i = 0; i<THREADS_PER_BLOCK; i++) {
            sum += temp[i];
        }
        atomicAdd(c, sum); // remember to compile with -arch=sm_11
    }
}


小姐丶请自重 2024-11-17 03:43:11

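Be sure to compile your code with -arch=sm_11 or higher (by default, it is compiled for compute capability 1.0). Also be careful if you use the common.mk file included in the SDK, because it may override some of your flags.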
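If it helps, here is a minimal sketch of how the compute capability can be confirmed at run time, so you know which -arch value is safe to pass. It assumes the device of interest is device 0; the `major` and `minor` fields are presumably what your device listing prints as "Major revision number" and "Minor revision number".

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: the first device is the one being targeted
    cudaDeviceProp prop;

    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("%s: compute capability %d.%d\n",
                prop.name, prop.major, prop.minor);

    // 64-bit atomicAdd on global memory needs compute capability 1.2 or higher.
    bool has64BitAtomics =
        (prop.major > 1) || (prop.major == 1 && prop.minor >= 2);
    std::printf("64-bit global atomicAdd supported: %s\n",
                has64BitAtomics ? "yes" : "no");
    return 0;
}
```

A device reporting major 1 and minor 2 is compute capability 1.2, so the guess in the question looks right.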

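Sorry, I was almost sure that the minimum requirement for atomicAdd was 1.1, but for the 64-bit version it seems to be 1.2 (which your GPU supports), so compile with -arch=sm_12. I also compiled your kernel using "unsigned long long" ("long int" is not a valid data type for atomicAdd). See Section B.11.1.1, atomicAdd(), in the NVIDIA CUDA C Programming Guide, v3.2.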
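For reference, this is roughly what the kernel could look like with "unsigned long long" throughout. Treat it as a sketch rather than the exact code I compiled: the `THREADS_PER_BLOCK` value, the array size `N`, the `u64` typedef, the test data, and the host code are my own assumptions. I also moved the `c[0] = 0` reset out of the kernel, since every thread of every block executing it can wipe out a sum that another block has already added.

```cuda
// Build for the GT 240 with something like: nvcc -arch=sm_12 dot.cu -o dot
// (assumption: recent toolkits no longer accept sm_1x targets; on newer
// hardware any supported -arch value provides 64-bit atomicAdd).
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256             // assumption: not given in the question
#define N (4 * THREADS_PER_BLOCK)         // assumption: a multiple of the block size

typedef unsigned long long int u64;       // the 64-bit type atomicAdd accepts

__global__ void dot(const u64 *a, const u64 *b, u64 *c) {
    __shared__ u64 temp[THREADS_PER_BLOCK];
    unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;
    temp[threadIdx.x] = a[index] * b[index];

    __syncthreads();

    if (threadIdx.x == 0) {
        // Thread 0 sums the block's products, then adds them to the global total.
        u64 sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; i++) {
            sum += temp[i];
        }
        atomicAdd(c, sum);
    }
}

int main() {
    static u64 ha[N], hb[N];
    u64 hc = 0, expected = 0;
    for (int i = 0; i < N; i++) {
        ha[i] = (u64)(i + 1);
        hb[i] = 1000000ULL;               // large enough that the sum exceeds 32 bits
        expected += ha[i] * hb[i];
    }

    u64 *da, *db, *dc;
    cudaMalloc((void **)&da, N * sizeof(u64));
    cudaMalloc((void **)&db, N * sizeof(u64));
    cudaMalloc((void **)&dc, sizeof(u64));
    cudaMemcpy(da, ha, N * sizeof(u64), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, N * sizeof(u64), cudaMemcpyHostToDevice);
    cudaMemset(dc, 0, sizeof(u64));       // zero the result once, before the launch

    dot<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(da, db, dc);

    cudaError_t err = cudaMemcpy(&hc, dc, sizeof(u64), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("device = %llu, host = %llu\n", hc, expected);
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return hc == expected ? 0 : 1;
}
```

Note that this only covers unsigned values. If your inputs can be negative you would need a different approach, since there is no signed 64-bit overload of atomicAdd.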