CUDA: sharing data between multiple devices?

Published 2024-10-01 07:44:28


In the CUDA C Programming Guide, it is said that

... by design, a host thread can execute device code on only one device at any given time. As a consequence, multiple host threads are required to execute device code on multiple devices. Also, any CUDA resources created through the runtime in one host thread cannot be used by the runtime from another host thread...

What I want to do is have two GPUs share data on the host (mapped memory),
but the manual seems to say that this is not possible.
Is there any solution for this?


Comments (6)

瑶笙 2024-10-08 07:44:28


When you are allocating the host memory, you should allocate using cudaHostAlloc() and pass the cudaHostAllocPortable flag. This will allow the memory to be accessed by multiple CUDA contexts.
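
A minimal sketch of that allocation (buffer size and error handling are illustrative, not from the answer):

```cpp
// Allocate portable pinned host memory that any CUDA context
// (i.e., any host thread's context) may use.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 1 << 20;  // 1 MiB, arbitrary size for illustration
    float *hostBuf = nullptr;

    // cudaHostAllocPortable makes the pinned allocation visible to all
    // CUDA contexts, not just the one that performed the allocation.
    cudaError_t err = cudaHostAlloc((void **)&hostBuf, bytes, cudaHostAllocPortable);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... other host threads (each bound to its own device) can now use
    // hostBuf as the source or destination of their cudaMemcpy calls ...

    cudaFreeHost(hostBuf);
    return 0;
}
```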

锦欢 2024-10-08 07:44:28


The solution is to manage this common data manually, even with SLI.

Cards do not really have shared memory in SLI mode: shared data must be copied from one card to the other over the bus.

http://forums.nvidia.com/index.php?showtopic=30740

不必了 2024-10-08 07:44:28


You may want to look at GMAC. It's a library built on top of CUDA that gives the illusion of shared memory. What it actually does is to allocate memory at the same virtual address on the host and GPU devices, and use page protection to transfer data on demand. Be aware that it is somewhat experimental, maybe in the beta testing stage.

http://code.google.com/p/adsm/

辞别 2024-10-08 07:44:28


You want to allocate your pinned memory as portable by passing cudaHostAllocPortable to cudaHostAlloc(). You can certainly exchange data between devices outside a kernel through the same pinned memory, as I have done before. As for mapped memory, I'm not quite as sure, but I don't see why you wouldn't be able to. Try using cudaHostGetDevicePointer() to get the device pointer to use for the current device (the one you have associated with the current CPU thread).

There's more info in section 3.2.5.3 of the CUDA Programming Guide (v3.2):

A block of page-locked host memory can be allocated as both mapped and portable (see Section 3.2.5.1), in which case each host thread that needs to map the block to its device address space must call cudaHostGetDevicePointer() to retrieve a device pointer, as device pointers will generally differ from one host thread to the other.
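
A hedged sketch of that pattern, assuming the one-host-thread-per-GPU model this thread is based on (the helper names are made up for illustration):

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Each host thread calls this before any other runtime call, so its
// context is created with host-memory mapping enabled (required on
// the CUDA 3.x runtime this thread is discussing).
void initThreadForDevice(int device) {
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaSetDevice(device);
}

// One thread allocates the shared block as both portable and mapped.
void *allocSharedBlock(size_t bytes) {
    void *hostPtr = nullptr;
    cudaHostAlloc(&hostPtr, bytes, cudaHostAllocPortable | cudaHostAllocMapped);
    return hostPtr;
}

// Every thread that wants to pass the block to a kernel retrieves its
// own device pointer; as the guide quoted above notes, the pointers
// generally differ from one host thread to another.
void *deviceViewOf(void *hostPtr) {
    void *devPtr = nullptr;
    cudaHostGetDevicePointer(&devPtr, hostPtr, 0);
    return devPtr;
}
```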

杀お生予夺 2024-10-08 07:44:28


I have asked a similar question on the NVIDIA forums specifically about how to transfer data between two GPUs, and the responses said that if you want to use two GPUs simultaneously and transfer data between them, you must have two host threads (as the manual suggests). The manual says that "CUDA resources" cannot be shared, but the host memory they are copied from can be (using OpenMP or MPI, for example). Thus, if you transfer your memory back to the host from each device, you can access it across devices.

Keep in mind that this will be very slow, since memory transfers to and from the devices are expensive.

So no, you cannot access GPU 1's memory from GPU 2 (not even with SLI, which, as I have been loudly reminded, has nothing to do with CUDA). However, you can have GPU 1 write to one region on the host and GPU 2 write to another, and let the threads managing each device copy the necessary data back to the correct GPU.
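
A minimal sketch of that workaround, assuming two GPUs (device IDs 0 and 1) and OpenMP to provide the two host threads; the buffer size and the kernels are placeholders:

```cpp
#include <cuda_runtime.h>
#include <omp.h>

int main() {
    const size_t bytes = 1 << 20;  // placeholder size
    float *stage = nullptr;        // host staging buffer, portable across contexts
    cudaHostAlloc((void **)&stage, bytes, cudaHostAllocPortable);

    float *dev0 = nullptr, *dev1 = nullptr;

#pragma omp parallel num_threads(2)
    {
        int tid = omp_get_thread_num();
        cudaSetDevice(tid);  // thread 0 drives GPU 0, thread 1 drives GPU 1

        if (tid == 0) {
            cudaMalloc((void **)&dev0, bytes);
            // ... launch a kernel on GPU 0 that fills dev0 ...
            cudaMemcpy(stage, dev0, bytes, cudaMemcpyDeviceToHost);
            cudaFree(dev0);
        }
#pragma omp barrier  // GPU 0's result is now in the host staging buffer
        if (tid == 1) {
            cudaMalloc((void **)&dev1, bytes);
            cudaMemcpy(dev1, stage, bytes, cudaMemcpyHostToDevice);
            // ... launch a kernel on GPU 1 that consumes the data ...
            cudaFree(dev1);
        }
    }

    cudaFreeHost(stage);
    return 0;
}
```

Both transfers cross the PCIe bus through host memory, which is exactly why, as noted above, this approach is slow.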
