Dynamic allocation of constant memory in CUDA
I'm trying to take advantage of constant memory, but I'm having a hard time figuring out how to nest arrays. What I have is an array of data that holds counts for internal data, but those counts differ for each entry. Based on the following simplified code, I have two problems. First, I don't know how to allocate the data pointed to by the members of my data structure. Second, since I can't use cudaGetSymbolAddress for constant memory, I'm not sure whether I can just pass the global pointer (which you cannot do with plain __device__ memory).
struct __align__(16) data {
    int nFiles;
    int nNames;
    int* files;
    int* names;
};

__device__ __constant__ data *mydata;

__host__ void initMemory(...)
{
    cudaMalloc( (void **) &(mydata), sizeof(data)*dynamicsize );
    for(int i = 0; i < dynamicsize; i++)
    {
        cudaMemcpyToSymbol(mydata, &(nFiles[i]), sizeof(int), sizeof(data)*i, cudaMemcpyHostToDevice);
        //...
        //Problem 1: Allocate & Set mydata[i].files
    }
}

__global__ void myKernel(data *constDataPtr)
{
    //Problem 2: Access constDataPtr[n].files, etc
}

int main()
{
    //...
    myKernel<<< grid, threads >>>(mydata);
}
Thanks for any help offered. :-)
3 Answers
I think constant memory is 64K and you cannot allocate it dynamically using cudaMalloc. It has to be declared as a __constant__ symbol with a fixed size at compile time. Similarly, you also don't need to free it. Also, you shouldn't pass a reference to it via a pointer; just access it as a global variable. I tried doing a similar thing and it gave me a segfault (in device emulation mode).
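A minimal sketch of what this answer suggests, assuming a compile-time bound MAX_ENTRIES and std::vector inputs (MAX_ENTRIES, staging, and initMemory's signature are my own additions, not from the answer):

#include <cuda_runtime.h>
#include <vector>

#define MAX_ENTRIES 64                    // hypothetical upper bound on entries

struct __align__(16) data {
    int  nFiles;
    int  nNames;
    int* files;   // device pointer, filled in on the host before the copy
    int* names;   // device pointer
};

// Fixed-size table in constant memory; the pointers inside it refer to global memory.
__constant__ data mydata[MAX_ENTRIES];

__host__ void initMemory(const std::vector<std::vector<int>>& files,
                         const std::vector<std::vector<int>>& names)
{
    data staging[MAX_ENTRIES] = {};       // built on the host, then copied as one block
    for (size_t i = 0; i < files.size() && i < MAX_ENTRIES; ++i) {
        staging[i].nFiles = (int)files[i].size();
        staging[i].nNames = (int)names[i].size();
        cudaMalloc((void**)&staging[i].files, files[i].size() * sizeof(int));
        cudaMalloc((void**)&staging[i].names, names[i].size() * sizeof(int));
        cudaMemcpy(staging[i].files, files[i].data(),
                   files[i].size() * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(staging[i].names, names[i].data(),
                   names[i].size() * sizeof(int), cudaMemcpyHostToDevice);
    }
    // One copy of the whole descriptor table into constant memory.
    cudaMemcpyToSymbol(mydata, staging, sizeof(staging));
}

__global__ void myKernel(int* out)
{
    // No pointer to the table is passed in: mydata is visible as a global symbol.
    const data& d = mydata[blockIdx.x];
    if (threadIdx.x == 0 && d.nFiles > 0)
        out[blockIdx.x] = d.files[0];     // files[] itself lives in global memory
}

Note that only the fixed-size descriptor table sits in the 64KB constant bank here; the variable-length files/names payloads stay in ordinary global memory and are reached through the stored pointers.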
No, you can't do that.
Constant memory (64KB max) can only be hard-coded before compilation.
However, you can assign texture memory on the fly, which is also cached on the device.
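A rough sketch of the texture path this answer alludes to, using the legacy texture-reference API (this API is deprecated in recent CUDA releases and removed in CUDA 12; the names filesTex, setup, and readThroughTexture are illustrative):

#include <cuda_runtime.h>

texture<int, 1, cudaReadModeElementType> filesTex;   // 1D texture over int data

__global__ void readThroughTexture(int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(filesTex, i);             // cached read of element i
}

__host__ void setup(const int* hostFiles, int n)
{
    int* devFiles = nullptr;
    cudaMalloc((void**)&devFiles, n * sizeof(int));   // sized at runtime
    cudaMemcpy(devFiles, hostFiles, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaBindTexture(0, filesTex, devFiles, n * sizeof(int));   // bind on the fly
}

Unlike __constant__ symbols, the underlying buffer can be cudaMalloc'ed to any size at runtime, while reads through the texture cache still get some of the same benefit.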
Why don't you just use the so-called "packed" data representation? This approach allows you to place all the data you need into a one-dimensional array. For example, the per-entry counts (or offsets) and the flattened files/names values can all be stored in a single flat array, indexed by those offsets.
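A minimal sketch of one possible packed layout, assuming a header of (offset, count) pairs in constant memory and a flat payload in global memory (the names header, payload, pack, and the 64-entry bound are my own, not from the answer):

#include <cuda_runtime.h>
#include <vector>

// For each entry i:
//   header[2*i]     = offset of entry i's files within 'payload'
//   header[2*i + 1] = number of files for entry i
__constant__ int header[2 * 64];          // assumed bound; fits easily in 64KB

__global__ void useFiles(const int* payload)
{
    int entry  = blockIdx.x;
    int offset = header[2 * entry];
    int count  = header[2 * entry + 1];
    for (int k = threadIdx.x; k < count; k += blockDim.x) {
        int file = payload[offset + k];   // entry's k-th file id
        (void)file;                       // placeholder for real work
    }
}

__host__ void pack(const std::vector<std::vector<int>>& files, int** devPayload)
{
    std::vector<int> hdr, flat;
    for (const auto& f : files) {
        hdr.push_back((int)flat.size());              // offset into the flat array
        hdr.push_back((int)f.size());                 // count for this entry
        flat.insert(flat.end(), f.begin(), f.end());  // append payload
    }
    cudaMemcpyToSymbol(header, hdr.data(), hdr.size() * sizeof(int));
    cudaMalloc((void**)devPayload, flat.size() * sizeof(int));
    cudaMemcpy(*devPayload, flat.data(), flat.size() * sizeof(int), cudaMemcpyHostToDevice);
}

This sidesteps nested device pointers entirely: the small, fixed-size header can live in constant memory, and each entry's variable-length data is found by offset in one contiguous buffer.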