Problems with atomicAdd in a CUDA kernel

Posted 2024-11-07 18:03:42

I'm having an issue with my kernel.cu file.

Calling nvcc -v kernel.cu -o kernel.o I'm getting this error:

kernel.cu(17): error: identifier "atomicAdd" is undefined

My code:

#include "dot.h"
#include <cuda.h>
#include "device_functions.h" //might call atomicAdd

__global__ void dot (int *a, int *b, int *c){
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];

    __syncthreads();

    if( 0 == threadIdx.x ){
        int sum = 0;
        for( int i = 0; i<THREADS_PER_BLOCK; i++)
            sum += temp[i];
        atomicAdd(c, sum);
    }
}

Any suggestions?

谁人与我共长歌 2024-11-14 18:03:42

You need to tell nvcc to target an architecture that supports atomic memory operations (the default architecture is 1.0, which does not support atomics). Try:

nvcc -arch=sm_11 -v kernel.cu -o kernel.o

and see what happens.

Edit (2015): the default architecture in CUDA 7.0 is now 2.0, which supports atomic memory operations, so this should not be a problem in newer toolkit versions.
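
For context, here is a minimal sketch of the kernel from the question together with a host wrapper that launches it. Several things are assumptions made for illustration and are not in the original post: the value 256 for THREADS_PER_BLOCK (the real one lives in dot.h, which is not shown), the extra n parameter with its bounds check, and the hypothetical helper dot_on_device. It assumes n > 0 and a build line along the lines of nvcc -arch=sm_20 dot_atomic.cu on a toolkit that still accepts that target; newer toolkits may need a newer -arch value or none at all.

```cuda
// dot_atomic.cu: illustrative sketch, not the asker's full program.
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256  // assumption: the original value is defined in dot.h

__global__ void dot(const int *a, const int *b, int *c, int n) {
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    // Threads past the end of the arrays contribute 0 to the block sum.
    temp[threadIdx.x] = (index < n) ? a[index] * b[index] : 0;

    __syncthreads();

    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; i++)
            sum += temp[i];
        // One atomic update per block on a global-memory int; this is the
        // call that fails to compile when the target architecture is 1.0.
        atomicAdd(c, sum);
    }
}

// Hypothetical host helper: computes the dot product of h_a and h_b
// (n > 0 elements each) into *h_c and returns the first CUDA error seen.
cudaError_t dot_on_device(const int *h_a, const int *h_b, int n, int *h_c) {
    int *d_a = NULL, *d_b = NULL, *d_c = NULL;
    size_t bytes = (size_t)n * sizeof(int);

    cudaError_t err = cudaMalloc((void **)&d_a, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_b, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&d_c, sizeof(int));
    if (err == cudaSuccess) err = cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    // The accumulator must start at zero because every block adds into it.
    if (err == cudaSuccess) err = cudaMemset(d_c, 0, sizeof(int));
    if (err == cudaSuccess) {
        int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
        dot<<<blocks, THREADS_PER_BLOCK>>>(d_a, d_b, d_c, n);
        err = cudaGetLastError();  // reports launch failures
    }
    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(h_c, d_c, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_a);  // cudaFree(NULL) is a no-op, so this is safe on early failure
    cudaFree(d_b);
    cudaFree(d_c);
    return err;
}
```

On a machine with a working GPU, calling dot_on_device on two 1000-element arrays filled with 1 and 2 respectively should return cudaSuccess and store 2000 in the result.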


转身以后 2024-11-14 18:03:42

Today, with the latest CUDA SDK and toolkit, this solution will not work.
People also say that adding:

compute_11,sm_11; OR compute_12,sm_12; OR compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;

to CUDA in the Project Properties in Visual Studio 2010 will work. It doesn't.

You have to specify this for the .cu file itself, in its own properties (under the C++/CUDA->Device->Code Generation tab), such as:

compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;
