How to perform a deep copy of a struct with CUDA?
While programming with CUDA, I am having trouble copying some data from the host to the GPU.
I have three nested structs like these:
    typedef struct {
        char data[128];
        short length;
    } Cell;

    typedef struct {
        Cell* elements;
        int height;
        int width;
    } Matrix;

    typedef struct {
        Matrix* tables;
        int count;
    } Container;
So a Container "includes" some Matrix elements, which in turn include some Cell elements.
Let's suppose I dynamically allocate the host memory in this way:
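    Container c;
    c.count = 20;
    /* The explicit casts are needed when nvcc compiles this as C++. */
    c.tables = (Matrix*)malloc(c.count * sizeof(Matrix));
    for (int i = 0; i < c.count; i++) {
        Matrix m;
        /* 100 cells per matrix (height and width still need to be set) */
        m.elements = (Cell*)malloc(100 * sizeof(Cell));
        c.tables[i] = m;
    }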
That is, a Container of 20 Matrix elements with 100 Cell elements each.
- How could I now copy this data to device memory using cudaMemcpy()?
- Is there a good way to perform a deep copy of a "struct of structs" from host to device?
Thanks for your time.
Andrea
Comments (1)
The short answer is "just don't". There are four reasons why I say that:
Consider using linear memory and indexing instead. It is portable between host and GPU, and the allocation and copy overhead is about 1% of that of the pointer-based alternative.
If you really want to do this, leave a comment and I will try to dig up some old code examples which show what a complete folly nested pointers are on the GPU.
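To make the "linear memory and indexing" suggestion concrete, here is a minimal host-side sketch in C. It is not the answerer's code: `FlatContainer`, `flat_alloc`, `flat_bytes` and `flat_index` are hypothetical names, and it assumes every matrix has the same height and width (for example 20 matrices of 10 x 10 cells), which the question does not state. If the matrices differed in size, a per-matrix offset array would be needed instead. The CUDA calls appear only in a comment so that the block compiles as plain C.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char data[128];
    short length;
} Cell;

/* Hypothetical flat layout: every Cell of every Matrix lives in one
   contiguous array, so there are no nested pointers to translate. */
typedef struct {
    Cell *cells;  /* count * height * width cells, stored matrix by matrix */
    int count;    /* number of matrices */
    int height;   /* rows per matrix (assumed equal for all matrices) */
    int width;    /* columns per matrix (assumed equal for all matrices) */
} FlatContainer;

/* Total size of the cell buffer in bytes. */
size_t flat_bytes(const FlatContainer *fc)
{
    return (size_t)fc->count * fc->height * fc->width * sizeof(Cell);
}

/* Linear index of cell (row, col) inside matrix t. */
size_t flat_index(const FlatContainer *fc, int t, int row, int col)
{
    assert(t >= 0 && t < fc->count);
    assert(row >= 0 && row < fc->height);
    assert(col >= 0 && col < fc->width);
    return ((size_t)t * fc->height + row) * fc->width + col;
}

/* One zero-initialised host allocation for the whole container.
   Returns 0 on success, -1 if the allocation fails. */
int flat_alloc(FlatContainer *fc, int count, int height, int width)
{
    fc->count = count;
    fc->height = height;
    fc->width = width;
    fc->cells = (Cell *)calloc((size_t)count * height * width, sizeof(Cell));
    return fc->cells ? 0 : -1;
}

/* Device side (not compiled here): the whole buffer then moves with a
   single allocation and a single copy, for example
       Cell *d_cells;
       cudaMalloc((void **)&d_cells, flat_bytes(&fc));
       cudaMemcpy(d_cells, fc.cells, flat_bytes(&fc), cudaMemcpyHostToDevice);
   and a kernel can reuse the same arithmetic as flat_index(). */
```

With this layout, cell (row 2, column 1) of matrix 3 is `fc.cells[flat_index(&fc, 3, 2, 1)]`. Only the scalar dimensions and one device pointer need to be passed to a kernel, and copying results back is a single `cudaMemcpy` in the other direction.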