Atomic add with long int doesn't work

Posted on 2024-11-10 03:43:11

As the CUDA Programming Guide suggests, I want to call the atomicAdd function:

unsigned long long int atomicAdd(unsigned long long int* address,
                             unsigned long long int val);

But when I call it with these two variables:

unsigned long long int *c and unsigned long long int sum

I got this error:

 dotproduct_kernel.cu(23): error: no instance of overloaded function "atomicAdd" matches the argument list
        argument types are: (unsigned long long *, unsigned long long)

I wasn't sure that long long int really exists, so I also tried long int long, but everything fails.

I need a large data type because my result will be close to 10^14.
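To see why the type matters for a result near 10^14: long int is only 32 bits on 32-bit builds and on Windows (where long stays 32 bits even in 64-bit builds), and a signed 32-bit integer tops out at 2,147,483,647. The host-side sketch below (valid in a .cu file; kExpected and bits_needed are illustrative names, not part of any API) counts the bits that 10^14 requires.

```cuda
#include <climits>

// 10^14: the order of magnitude expected for the final dot product.
const unsigned long long kExpected = 100000000000000ULL;

// Returns the number of bits needed to represent v (0 needs no bits).
int bits_needed(unsigned long long v)
{
    int bits = 0;
    while (v != 0) {
        ++bits;
        v >>= 1;
    }
    return bits;
}

// A signed 32-bit 'long' holds at most 2^31 - 1 = 2,147,483,647, far below
// 10^14, whereas unsigned long long is required to have at least 64 bits.
static_assert(sizeof(unsigned long long) * CHAR_BIT >= 64,
              "unsigned long long must be at least 64 bits");
```

10^14 needs 47 bits, so unsigned long long (the type used by the 64-bit atomicAdd overload) has plenty of headroom.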

Here is all the information about my device. I guess the compute capability is 1.2, right?

Major revision number:         1
Minor revision number:         2
Name:                          GeForce GT 240
Total global memory:           1073020928
Total shared memory per block: 16384
Total registers per block:     16384
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     512
Maximum dimension 0 of block:  512
Maximum dimension 1 of block:  512
Maximum dimension 2 of block:  64
Maximum dimension 0 of grid:   65535
Maximum dimension 1 of grid:   65535
Maximum dimension 2 of grid:   1
Clock rate:                    1340000
Total constant memory:         65536
Texture alignment:             256
Concurrent copy and execution: Yes
Number of multiprocessors:     12
Kernel execution timeout:      Yes
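The "Major revision number" and "Minor revision number" lines are the compute capability, so this GeForce GT 240 reports 1.2. As a sketch, the same fields can be read with the CUDA runtime's cudaGetDeviceProperties (the major and minor members of cudaDeviceProp); supports_64bit_atomics and report_device are hypothetical helper names, and the 1.2 threshold is the one the CUDA C Programming Guide v3.2 gives for atomic functions on 64-bit words.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// True if compute capability major.minor is at least 1.2, the minimum
// for atomic functions operating on 64-bit words.
bool supports_64bit_atomics(int major, int minor)
{
    return major > 1 || (major == 1 && minor >= 2);
}

// Prints the compute capability of device 'dev' and whether the 64-bit
// atomicAdd overload is usable. Returns false if the query itself fails.
bool report_device(int dev)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n",
                    cudaGetErrorString(err));
        return false;
    }
    std::printf("%s: compute capability %d.%d, 64-bit atomicAdd: %s\n",
                prop.name, prop.major, prop.minor,
                supports_64bit_atomics(prop.major, prop.minor) ? "yes" : "no");
    return true;
}
```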

This is the complete code:

__global__ void dot (long int *a, long int *b, long int *c){
    __shared__ long int temp[THREADS_PER_BLOCK];
    c[0] = 0;
    long index = (blockIdx.x * blockDim.x) + threadIdx.x;
    temp[threadIdx.x] = a[index] * b[index];

    __syncthreads();

    if( 0 == threadIdx.x ){
        long int sum = 0;
        int i;
        for( i = 0; i<THREADS_PER_BLOCK; i++) {
            sum += temp[i];
        }
        atomicAdd(c, sum); // remember to compile with -arch=sm_11
    }
}


Answer by 小姐丶请自重, 2024-11-17 03:43:11

Be sure you compile your code with -arch=sm_11 or above (by default it is compiled for compute capability 1.0). Also be careful if you are using the common.mk file included in the SDK, as it could override some of your flags.

Sorry, I was almost sure that the minimum requirement for atomicAdd was compute capability 1.1, but for 64-bit operands it is 1.2 (which your GPU supports), so compile with -arch=sm_12. I've also compiled your kernel using 'unsigned long long' ('long int' is not a valid data type for atomicAdd). See section B.11.1.1, atomicAdd(), of the NVIDIA CUDA C Programming Guide, v3.2:

Atomic functions operating on shared memory and atomic functions operating on 64-bit words are only available for devices of compute capability 1.2 and above.
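A sketch of the question's kernel adapted along these lines: every operand is unsigned long long, so the call matches the 64-bit atomicAdd overload. Two further changes are assumptions beyond the answer: the accumulator is zeroed once from the host with cudaMemset instead of c[0] = 0 in every thread (which races with other blocks' atomicAdd calls), and threads past n contribute 0. It assumes non-negative inputs, falls back to THREADS_PER_BLOCK = 512 if the original definition is absent, and dot_on_gpu is a hypothetical host helper that omits error checking.

```cuda
#include <cuda_runtime.h>

// Assumption: the original program defines THREADS_PER_BLOCK; 512 matches
// this device's maximum threads per block.
#ifndef THREADS_PER_BLOCK
#define THREADS_PER_BLOCK 512
#endif

// Stop device compilation early if it targets a GPU without 64-bit atomics.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 120)
#error "64-bit atomicAdd requires compute capability 1.2 (-arch=sm_12 or later)"
#endif

// Dot product of n non-negative values. Each block sums its products in shared
// memory, then thread 0 adds the block's partial sum to *c atomically.
// Launch with blockDim.x <= THREADS_PER_BLOCK; *c must be zeroed beforehand.
__global__ void dot(const unsigned long long int *a,
                    const unsigned long long int *b,
                    unsigned long long int *c, int n)
{
    __shared__ unsigned long long int temp[THREADS_PER_BLOCK];
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    // Threads past the end contribute 0 instead of reading out of bounds.
    temp[threadIdx.x] = (index < n) ? a[index] * b[index] : 0ULL;

    __syncthreads();

    if (threadIdx.x == 0) {
        unsigned long long int sum = 0;
        for (unsigned int i = 0; i < blockDim.x; i++) {
            sum += temp[i];
        }
        atomicAdd(c, sum);  // matches the unsigned long long overload
    }
}

// Hypothetical host helper: copies the inputs, launches dot, returns the sum.
unsigned long long int dot_on_gpu(const unsigned long long int *ha,
                                  const unsigned long long int *hb, int n)
{
    unsigned long long int *da, *db, *dc, result = 0;
    size_t bytes = n * sizeof(unsigned long long int);
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, sizeof(unsigned long long int));
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);
    cudaMemset(dc, 0, sizeof(unsigned long long int));  // zero once, before launch
    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    dot<<<blocks, THREADS_PER_BLOCK>>>(da, db, dc, n);
    cudaMemcpy(&result, dc, sizeof(result), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(da);
    cudaFree(db);
    cudaFree(dc);
    return result;
}
```

With 1000 elements and 512 threads per block, the second block is only partly filled, which exercises the bounds check.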

Hope this helps.
