CUDA: statically allocating data on the device

Posted on 2024-11-07 00:30:21

I've been trying to allocate a variable that can be accessed by every kernel function.
My attempt is the code below, but it won't compile because the kernel can't see or access dArray. In C++ you would place the variable at the top of the file or declare it static, so that it can be accessed from every scope throughout the program.

__global__ void StoreThreadNumber()
{
    dArray[threadIdx.x] = threadIdx.x;
}

int main( int argc, char** argv)
{
    unsigned __int8 Array[16] = { 0 };
    unsigned __int8 dArray[16];

    for( __int8 Position = 0; Position < 16; Position++)
        cout << Array[Position] << " ";
    cout << endl;

    cudaMalloc((void**) dArray, 16*sizeof(__int8));
    cudaMemcpy( dArray, Array, 16*sizeof(__int8), cudaMemcpyHostToDevice);

    StoreThreadNumber<<<1, 16>>>();

    cudaMemcpy( Array, dArray, 16*sizeof(__int8), cudaMemcpyDeviceToHost);

    for( __int8 Position = 0; Position < 16; Position++)
        cout << Array[Position] << " ";
    cout << endl;

    cudaFree(dArray);
}

又爬满兰若 2024-11-14 00:30:21

You can have global variables in CUDA, declared __device__ or __constant__. So, for example, if you use cudaMemcpyToSymbol() to copy a device pointer (the address returned by cudaMalloc()) into a __constant__ pointer variable, kernels can then access the array through that __constant__ variable:

__constant__ int* dArrayPtr;

__global__ void StoreThreadNumber()
{
    dArrayPtr[threadIdx.x] = threadIdx.x;
}

Just make sure you correctly initialize dArrayPtr from your host code before you run the kernel.
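A minimal host-side sketch of that initialization, assuming a 16-element int array and hypothetical helper names (RunConstantPointer, RunStaticDeviceArray, StoreThreadNumberStatic, dStaticArray). The important detail is that cudaMemcpyToSymbol() copies the value of the device pointer returned by cudaMalloc() into the __constant__ variable. For comparison, it also shows the __device__ option the answer mentions: when the size is fixed at compile time, the array itself can be a __device__ global, which needs no cudaMalloc() and is read back with cudaMemcpyFromSymbol().

```cuda
// Sketch only: initializes the answer's __constant__ pointer from the host and,
// for comparison, uses a statically allocated __device__ array.
// Helper names and the array size N are hypothetical.
#include <iostream>
#include <cuda_runtime.h>

constexpr int N = 16;

// Option 1 (from the answer): a __constant__ pointer to cudaMalloc'd global memory.
__constant__ int* dArrayPtr;

__global__ void StoreThreadNumber()
{
    dArrayPtr[threadIdx.x] = threadIdx.x;
}

// Option 2: the array itself is a statically allocated __device__ global.
__device__ int dStaticArray[N];

__global__ void StoreThreadNumberStatic()
{
    dStaticArray[threadIdx.x] = threadIdx.x;
}

// Runs option 1 and copies the result into host[0..N).
cudaError_t RunConstantPointer(int host[N])
{
    int* dArray = nullptr;
    cudaError_t err = cudaMalloc((void**)&dArray, N * sizeof(int));
    if (err != cudaSuccess) return err;

    // Copy the value of dArray (a device address) into the __constant__ symbol.
    err = cudaMemcpyToSymbol(dArrayPtr, &dArray, sizeof(dArray));
    if (err == cudaSuccess) {
        StoreThreadNumber<<<1, N>>>();
        err = cudaGetLastError();  // reports launch errors
    }
    if (err == cudaSuccess)  // blocking copy on the default stream, ordered after the kernel
        err = cudaMemcpy(host, dArray, N * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dArray);
    return err;
}

// Runs option 2: no cudaMalloc(), the symbol itself is the storage.
cudaError_t RunStaticDeviceArray(int host[N])
{
    StoreThreadNumberStatic<<<1, N>>>();
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpyFromSymbol(host, dStaticArray, N * sizeof(int));
    return err;
}

int main()
{
    int a[N] = {0};
    int b[N] = {0};
    if (RunConstantPointer(a) != cudaSuccess || RunStaticDeviceArray(b) != cudaSuccess) {
        std::cerr << "CUDA call failed\n";
        return 1;
    }
    for (int i = 0; i < N; ++i) std::cout << a[i] << " ";
    std::cout << std::endl;
    for (int i = 0; i < N; ++i) std::cout << b[i] << " ";
    std::cout << std::endl;
    return 0;
}
```

Writing through dArrayPtr is allowed even though the pointer lives in constant memory: only the pointer value is read-only in device code, and the array it points to is ordinary global memory. That is also why the pointer has to be set from the host.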

擦肩而过的背影 2024-11-14 00:30:21

You can't. You have to pass the dArray pointer to the kernel as an argument.
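A minimal sketch of that fix applied to the question's program, assuming unsigned char in place of the MSVC-specific unsigned __int8 and a hypothetical helper RunWithPointerArgument. Besides passing the pointer as a kernel argument, it declares dArray as a pointer and passes &dArray to cudaMalloc(); the original cudaMalloc((void**) dArray, ...) casts the host array itself to void**, so the returned device address is written into the array's bytes instead of into a pointer variable.

```cuda
// Sketch only: the question's program with the device pointer passed as a kernel argument.
// Assumes unsigned char instead of unsigned __int8; RunWithPointerArgument is hypothetical.
#include <iostream>
#include <cuda_runtime.h>

constexpr int N = 16;

// The kernel no longer needs to see a variable from main(): it receives the pointer.
__global__ void StoreThreadNumber(unsigned char* dArray)
{
    dArray[threadIdx.x] = static_cast<unsigned char>(threadIdx.x);
}

// Copies host[] to the device, runs the kernel, and copies the result back.
cudaError_t RunWithPointerArgument(unsigned char host[N])
{
    unsigned char* dArray = nullptr;  // a pointer variable, not a host array
    cudaError_t err = cudaMalloc((void**)&dArray, N * sizeof(unsigned char));  // pass its address
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dArray, host, N * sizeof(unsigned char), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        StoreThreadNumber<<<1, N>>>(dArray);
        err = cudaGetLastError();  // reports launch errors
    }
    if (err == cudaSuccess)  // blocking copy on the default stream, ordered after the kernel
        err = cudaMemcpy(host, dArray, N * sizeof(unsigned char), cudaMemcpyDeviceToHost);

    cudaFree(dArray);
    return err;
}

int main()
{
    unsigned char Array[N] = {0};
    if (RunWithPointerArgument(Array) != cudaSuccess) {
        std::cerr << "CUDA call failed\n";
        return 1;
    }
    // Cast to int so std::cout prints numbers rather than raw characters.
    for (int i = 0; i < N; ++i) std::cout << static_cast<int>(Array[i]) << " ";
    std::cout << std::endl;
    return 0;
}
```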

I had the same problem when I had to pass a lot of global data to the GPU. I ended up wrapping it all in a struct and passing a pointer to it.
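A minimal sketch of the struct approach, assuming a hypothetical DeviceData struct that holds one device array and its length, and hypothetical kernels StoreThreadNumber and SquareValues that both use it. The struct is filled in on the host with device pointers, copied to device memory once, and its device address is passed to every kernel that needs the shared data.

```cuda
// Sketch only: several kernels share one struct of device pointers, passed by pointer.
// DeviceData, SquareValues and RunWithStruct are hypothetical names.
#include <iostream>
#include <cuda_runtime.h>

// Bundle of "global" data; a real program might hold many pointers and sizes here.
struct DeviceData {
    int* values;  // device array with `count` elements
    int  count;
};

__global__ void StoreThreadNumber(DeviceData* data)
{
    int i = threadIdx.x;
    if (i < data->count) data->values[i] = i;
}

__global__ void SquareValues(DeviceData* data)
{
    int i = threadIdx.x;
    if (i < data->count) data->values[i] *= data->values[i];
}

// Builds the struct on the host, copies it to the device, and runs both kernels.
cudaError_t RunWithStruct(int* host, int count)
{
    DeviceData h{};  // host-side copy; its members hold device addresses
    h.count = count;
    DeviceData* dData = nullptr;

    cudaError_t err = cudaMalloc((void**)&h.values, count * sizeof(int));
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void**)&dData, sizeof(DeviceData));
    if (err == cudaSuccess)
        err = cudaMemcpy(dData, &h, sizeof(DeviceData), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        // Kernels launched on the same (default) stream run in launch order.
        StoreThreadNumber<<<1, count>>>(dData);
        SquareValues<<<1, count>>>(dData);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(host, h.values, count * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dData);  // cudaFree(nullptr) is a no-op
    cudaFree(h.values);
    return err;
}

int main()
{
    const int count = 16;  // single block, so count must not exceed the per-block thread limit
    int out[count] = {0};
    if (RunWithStruct(out, count) != cudaSuccess) {
        std::cerr << "CUDA call failed\n";
        return 1;
    }
    for (int i = 0; i < count; ++i) std::cout << out[i] << " ";
    std::cout << std::endl;
    return 0;
}
```

For a small struct, passing it by value as a kernel argument is an alternative that skips the extra cudaMalloc() and copy; passing a pointer, as described above, keeps one shared copy in device memory.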

About the author: 晚雾
