什么是真正的 C++ CUDA 设备代码支持的语言结构？

发布于 2024-10-15 19:11:11 字数 636 浏览 8 评论 0原文

CUDA 文档 3.2 版本的附录 D 提到了 CUDA 设备代码中的 C++ 支持。
明确提到CUDA支持“计算能力2.x设备的类”。但是，我正在使用计算能力 1.1 和 1.3 的设备，我可以使用此功能！

例如，这段代码可以工作：

// class definition voluntary simplified
class Foo {
  private:
    int x_;

  public:
    __device__ Foo() { x_ = 42; }
    __device__ void bar() { return x_; }
};


//kernel using the previous class
__global__ void testKernel(uint32_t* ddata) {
    Foo f;
    ddata[threadIdx.x] = f.bar(); 
}

我还可以使用广泛的库，例如 Thrust::random 随机生成类。我唯一的猜测是，由于 __device__ 标记函数的自动内联，我能够做到这一点，但这并不能解释成员变量的处理。

您是否曾经在相同条件下使用过此类功能，或者您能否向我解释一下为什么我的 CUDA 代码会这样？参考指南有问题吗？

原文

Appendix D of the 3.2 version of the CUDA documentation refers to C++ support in CUDA device code.
It is clearly mentioned that CUDA supports "Classes for devices of compute capability 2.x". However, I'm working with devices of compute capability 1.1 and 1.3 and I can use this feature!

For instance, this code works:

// class definition voluntary simplified
class Foo {
  private:
    int x_;

  public:
    __device__ Foo() { x_ = 42; }
    __device__ void bar() { return x_; }
};


//kernel using the previous class
__global__ void testKernel(uint32_t* ddata) {
    Foo f;
    ddata[threadIdx.x] = f.bar(); 
}

I'm also able to use widespread libraries such as Thrust::random random generation classes.
My only guess is that I'm able to do so thanks to the automatic inlining of __device__ marked function, but this does not explain the handling of member variables withal.

Have you ever used such features in the same conditions, or can you explain to me why my CUDA code behaves this way? Is there something wrong in the reference guide?

分享到QQ

分享到微博