
cuda c++ c deep-copy

How to perform a deep copy of structs with CUDA?

Posted on 2024-11-18 18:36:25


Programming with CUDA, I am facing a problem trying to copy some data from the host to the GPU.

I have three nested structs like these:

typedef struct {
    char data[128];
    short length;
} Cell;

typedef struct {
    Cell* elements;
    int height;
    int width;
} Matrix;

typedef struct {
    Matrix* tables;
    int count;
} Container;

So Container "includes" some Matrix elements, which in turn include some Cell elements.

Let's suppose I dynamically allocate the host memory in this way:

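#include <stdlib.h>

Container c;
c.count = 20;
c.tables = (Matrix*)malloc(c.count * sizeof(Matrix));  // cast required when a .cu file is compiled as C++

for (int i = 0; i < c.count; i++) {
    Matrix m;
    m.height = 10;  // assumed 10 x 10 = 100 cells per matrix
    m.width = 10;
    m.elements = (Cell*)malloc(m.height * m.width * sizeof(Cell));
    c.tables[i] = m;
}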

That is, a Container of 20 Matrix elements with 100 Cells each.

How could I now copy this data to device memory using cudaMemcpy()?
Is there any good way to perform a deep copy of "struct of struct" from host to device?

Thanks for your time.

Andrea


Comments (1)

生生不灭 2024-11-25 18:36:25

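The short answer is "just don't". I say that for four reasons:

1. There is no deep-copy functionality in the API.
2. The code you would have to write to set up the structure you describe and copy it to the GPU would be ridiculously complex (about 4000 API calls at a minimum, and probably an intermediate kernel, for your example of 20 Matrix elements with 100 Cells each).
3. GPU code that goes through three levels of pointer indirection will have massively increased memory access latency and will break what little cache coherency is available on the GPU.
4. If you want to copy the data back to the host afterwards, you have the same problem in reverse.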
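For comparison, here is a minimal sketch in C of the manual deep copy the answer advises against. So that it compiles and runs without a GPU, `dev_malloc`, `dev_memcpy_htod` and `dev_free` are hypothetical stand-ins backed by `malloc`, `memcpy` and `free`; in real CUDA code they would be `cudaMalloc`, `cudaMemcpy(..., cudaMemcpyHostToDevice)` and `cudaFree`. The essential step is the host-side "shadow" array: the `Matrix` headers can only be uploaded after their `elements` pointers have been replaced with device pointers.

```c
#include <stdlib.h>
#include <string.h>

/* Same layout as in the question. */
typedef struct { char data[128]; short length; } Cell;
typedef struct { Cell* elements; int height; int width; } Matrix;
typedef struct { Matrix* tables; int count; } Container;

/* Hypothetical stand-ins so the sketch runs without a GPU. In CUDA code:
   dev_malloc      -> cudaMalloc(&ptr, bytes)
   dev_memcpy_htod -> cudaMemcpy(dst, src, bytes, cudaMemcpyHostToDevice)
   dev_free        -> cudaFree(ptr) */
static void* dev_malloc(size_t bytes) { return malloc(bytes); }
static void dev_memcpy_htod(void* dst, const void* src, size_t bytes) { memcpy(dst, src, bytes); }
static void dev_free(void* ptr) { free(ptr); }

/* Deep-copies a host Container. The returned Container is a host-side value
   whose tables pointer refers to "device" memory, so it could be passed to a
   kernel by value. *shadow_out receives the host copy of the patched Matrix
   headers, which is needed later to free (or copy back) the Cell arrays. */
static Container deep_copy_to_device(const Container* h, Matrix** shadow_out) {
    Matrix* shadow = (Matrix*)malloc(h->count * sizeof(Matrix));
    for (int i = 0; i < h->count; i++) {
        const Matrix* m = &h->tables[i];
        size_t bytes = (size_t)m->height * m->width * sizeof(Cell);
        shadow[i] = *m;                                /* keeps height and width */
        shadow[i].elements = (Cell*)dev_malloc(bytes); /* one allocation per Matrix */
        dev_memcpy_htod(shadow[i].elements, m->elements, bytes);
    }
    /* Only now can the Matrix headers be uploaded: their elements fields hold
       device pointers instead of host pointers. */
    Container d;
    d.count = h->count;
    d.tables = (Matrix*)dev_malloc(h->count * sizeof(Matrix));
    dev_memcpy_htod(d.tables, shadow, h->count * sizeof(Matrix));
    *shadow_out = shadow;
    return d;
}

/* Frees a deep copy. Walks the host shadow array because, with real device
   memory, d->tables cannot be dereferenced on the host. */
static void device_free(Container* d, Matrix* shadow) {
    for (int i = 0; i < d->count; i++)
        dev_free(shadow[i].elements);
    dev_free(d->tables);
    d->tables = NULL;
    free(shadow);
}
```

Every extra level of nesting adds its own allocation and copy per object, and copying the data back to the host means walking the same shadow array again with `cudaMemcpyDeviceToHost`.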

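Consider using linear memory and indexing instead. It is portable between host and GPU, and the allocation and copy overhead is about 1% of that of the pointer-based alternative.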
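A minimal sketch of the linear-memory alternative, under stated assumptions: `FlatContainer`, `flatten`, `flat_index` and `flat_free` are hypothetical names, and the layout stores every Cell of every table back to back in row-major order, with separate per-table offset, height and width arrays. Because none of these arrays contains pointers, each one can be moved to the GPU as a single block.

```c
#include <stdlib.h>
#include <string.h>

/* Same layout as in the question. */
typedef struct { char data[128]; short length; } Cell;
typedef struct { Cell* elements; int height; int width; } Matrix;
typedef struct { Matrix* tables; int count; } Container;

/* Hypothetical flat layout: no pointers inside the data, just arrays. */
typedef struct {
    Cell* cells;   /* all cells of all tables, back to back, row-major */
    int*  offset;  /* offset[t] = index in cells of table t's first cell */
    int*  height;  /* height[t], width[t] = dimensions of table t */
    int*  width;
    int   count;   /* number of tables */
    int   total;   /* number of cells in cells */
} FlatContainer;

/* Index of cell (row, col) of table t. In a .cu file this would be marked
   __host__ __device__ so kernels can use the same arithmetic. */
static inline int flat_index(const int* offset, const int* width,
                             int t, int row, int col) {
    return offset[t] + row * width[t] + col;
}

/* Packs a nested Container into one contiguous Cell array plus metadata.
   Casts on malloc keep the code valid if compiled as C++ by nvcc. */
static FlatContainer flatten(const Container* c) {
    FlatContainer f;
    f.count = c->count;
    f.offset = (int*)malloc(c->count * sizeof(int));
    f.height = (int*)malloc(c->count * sizeof(int));
    f.width  = (int*)malloc(c->count * sizeof(int));
    f.total = 0;
    for (int t = 0; t < c->count; t++) {
        f.offset[t] = f.total;
        f.height[t] = c->tables[t].height;
        f.width[t]  = c->tables[t].width;
        f.total += f.height[t] * f.width[t];
    }
    f.cells = (Cell*)malloc((size_t)f.total * sizeof(Cell));
    for (int t = 0; t < c->count; t++)
        memcpy(f.cells + f.offset[t], c->tables[t].elements,
               (size_t)f.height[t] * f.width[t] * sizeof(Cell));
    return f;
}

/* Releases the host-side flat arrays. */
static void flat_free(FlatContainer* f) {
    free(f->cells);
    free(f->offset);
    free(f->height);
    free(f->width);
    f->cells = NULL;
    f->offset = f->height = f->width = NULL;
}
```

Moving this layout to the GPU then takes one `cudaMalloc` and one `cudaMemcpy(..., cudaMemcpyHostToDevice)` per array, four in total regardless of how many tables there are, and copying the results back is a single `cudaMemcpy` of the cell array with `cudaMemcpyDeviceToHost`. If every table has the same dimensions, as in the question's 20 × 100 example, the offset array can be replaced by the expression `t * height * width`.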

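If you really want to do this, leave a comment and I will try to dig up some old code examples that show what a complete folly nested pointers are on the GPU.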


About the author: 天荒地未老
