Dynamic allocation of constant memory in CUDA
I'm trying to take advantage of constant memory, but I'm having a hard time figuring out how to nest arrays. What I have is an array of data that holds counts for internal data, but those counts differ for each entry. Based on the following simplified code, I have two problems. First, I don't know how to allocate the data pointed to by the members of my data structure. Second, since I can't use cudaGetSymbolAddress for constant memory, I'm not sure whether I can just pass the global pointer (which you cannot do with plain __device__ memory).
struct __align__(16) data {
    int nFiles;
    int nNames;
    int* files;
    int* names;
};

__device__ __constant__ data *mydata;

__host__ void initMemory(...)
{
    cudaMalloc( (void **) &(mydata), sizeof(data)*dynamicsize );
    for(int i = 0; i < dynamicsize; i++)
    {
        cudaMemcpyToSymbol(mydata, &(nFiles[i]), sizeof(int), sizeof(data)*i, cudaMemcpyHostToDevice);
        //...
        //Problem 1: Allocate & Set mydata[i].files
    }
}

__global__ void myKernel(data *constDataPtr)
{
    //Problem 2: Access constDataPtr[n].files, etc
}

int main()
{
    //...
    myKernel<<< grid, threads >>>(mydata);
}
Thanks for any help offered. :-)
3 Answers
I think constant memory is 64K and you cannot allocate it dynamically using cudaMalloc. It has to be declared as a __constant__ symbol with a fixed size at compile time. Similarly, you also don't need to free it. Also, you shouldn't pass a reference to it via a pointer; just access it as a global variable. I tried doing a similar thing and it gave me a segfault (in device emulation mode).
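A minimal sketch of what this answer suggests, assuming a compile-time bound MAX_ENTRIES and std::vector inputs (MAX_ENTRIES, staging, and initMemory's signature are my own additions, not from the answer):

#include <cuda_runtime.h>
#include <vector>

#define MAX_ENTRIES 64                    // hypothetical upper bound on entries

struct __align__(16) data {
    int  nFiles;
    int  nNames;
    int* files;   // device pointer, filled in on the host before the copy
    int* names;   // device pointer
};

// Fixed-size table in constant memory; the pointers inside it refer to global memory.
__constant__ data mydata[MAX_ENTRIES];

__host__ void initMemory(const std::vector<std::vector<int>>& files,
                         const std::vector<std::vector<int>>& names)
{
    data staging[MAX_ENTRIES] = {};       // built on the host, then copied as one block
    for (size_t i = 0; i < files.size() && i < MAX_ENTRIES; ++i) {
        staging[i].nFiles = (int)files[i].size();
        staging[i].nNames = (int)names[i].size();
        cudaMalloc((void**)&staging[i].files, files[i].size() * sizeof(int));
        cudaMalloc((void**)&staging[i].names, names[i].size() * sizeof(int));
        cudaMemcpy(staging[i].files, files[i].data(),
                   files[i].size() * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(staging[i].names, names[i].data(),
                   names[i].size() * sizeof(int), cudaMemcpyHostToDevice);
    }
    // One copy of the whole descriptor table into constant memory.
    cudaMemcpyToSymbol(mydata, staging, sizeof(staging));
}

__global__ void myKernel(int* out)
{
    // No pointer to the table is passed in: mydata is visible as a global symbol.
    const data& d = mydata[blockIdx.x];
    if (threadIdx.x == 0 && d.nFiles > 0)
        out[blockIdx.x] = d.files[0];     // files[] itself lives in global memory
}

Note that only the fixed-size descriptor table sits in the 64KB constant bank here; the variable-length files/names payloads stay in ordinary global memory and are reached through the stored pointers.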
No, you can't do that.
Constant memory (64KB max) can only be hard-coded before compilation.
However, you can assign texture memory on the fly, which is also cached on the device.
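A rough sketch of the texture path this answer alludes to, using the legacy texture-reference API (this API is deprecated in recent CUDA releases and removed in CUDA 12; the names filesTex, setup, and readThroughTexture are illustrative):

#include <cuda_runtime.h>

texture<int, 1, cudaReadModeElementType> filesTex;   // 1D texture over int data

__global__ void readThroughTexture(int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = tex1Dfetch(filesTex, i);             // cached read of element i
}

__host__ void setup(const int* hostFiles, int n)
{
    int* devFiles = nullptr;
    cudaMalloc((void**)&devFiles, n * sizeof(int));   // sized at runtime
    cudaMemcpy(devFiles, hostFiles, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaBindTexture(0, filesTex, devFiles, n * sizeof(int));   // bind on the fly
}

Unlike __constant__ symbols, the underlying buffer can be cudaMalloc'ed to any size at runtime, while reads through the texture cache still get some of the same benefit.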
Why don't you just use the so-called "packed" data representation? This approach allows you to place all the data you need into a one-dimensional array. For example, the per-entry counts (or offsets) and the flattened files/names values can all be stored in a single flat array, indexed by those offsets.
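A minimal sketch of one possible packed layout, assuming a header of (offset, count) pairs in constant memory and a flat payload in global memory (the names header, payload, pack, and the 64-entry bound are my own, not from the answer):

#include <cuda_runtime.h>
#include <vector>

// For each entry i:
//   header[2*i]     = offset of entry i's files within 'payload'
//   header[2*i + 1] = number of files for entry i
__constant__ int header[2 * 64];          // assumed bound; fits easily in 64KB

__global__ void useFiles(const int* payload)
{
    int entry  = blockIdx.x;
    int offset = header[2 * entry];
    int count  = header[2 * entry + 1];
    for (int k = threadIdx.x; k < count; k += blockDim.x) {
        int file = payload[offset + k];   // entry's k-th file id
        (void)file;                       // placeholder for real work
    }
}

__host__ void pack(const std::vector<std::vector<int>>& files, int** devPayload)
{
    std::vector<int> hdr, flat;
    for (const auto& f : files) {
        hdr.push_back((int)flat.size());              // offset into the flat array
        hdr.push_back((int)f.size());                 // count for this entry
        flat.insert(flat.end(), f.begin(), f.end());  // append payload
    }
    cudaMemcpyToSymbol(header, hdr.data(), hdr.size() * sizeof(int));
    cudaMalloc((void**)devPayload, flat.size() * sizeof(int));
    cudaMemcpy(*devPayload, flat.data(), flat.size() * sizeof(int), cudaMemcpyHostToDevice);
}

This sidesteps nested device pointers entirely: the small, fixed-size header can live in constant memory, and each entry's variable-length data is found by offset in one contiguous buffer.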