Where can I find information about Unified Virtual Addressing in CUDA 4.0?
Where can I find information, changesets, or suggestions for using the new enhancements in CUDA 4.0? I'm especially interested in learning about Unified Virtual Addressing.
Note: I would really like to see an example where we can access the RAM directly from the GPU.
Yes, using host memory (if that is what you mean by RAM) will most likely slow your program down, because transfers to and from the GPU take time and are limited by host RAM and PCI bus transfer rates. Try to keep everything in GPU memory: upload once, execute the kernel(s), download once. If you need anything more complicated, try asynchronous memory transfers with streams.
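The "upload once, run, download once" pattern with an asynchronous stream might look like the sketch below. The kernel `scale` and the helper `run_async_pipeline` are hypothetical names made up for illustration, and the example assumes a single CUDA-capable device; only the `cuda*` runtime calls come from the CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: multiplies each element by `factor`.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Uploads once, runs the kernel, downloads once, all queued on one stream.
// Returns the number of wrong elements, or -1 if a CUDA call failed.
int run_async_pipeline(int n) {
    size_t bytes = n * sizeof(float);
    float *h = nullptr, *d = nullptr;
    cudaStream_t stream = nullptr;
    int result = -1;

    // Pinned (page-locked) host memory is needed for copies that are
    // asynchronous with respect to the host.
    if (cudaMallocHost((void **)&h, bytes) != cudaSuccess) return -1;
    if (cudaMalloc((void **)&d, bytes) == cudaSuccess &&
        cudaStreamCreate(&stream) == cudaSuccess) {
        for (int i = 0; i < n; ++i) h[i] = (float)i;

        // Work queued on the same stream executes in order.
        cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream);
        scale<<<(n + 255) / 256, 256, 0, stream>>>(d, n, 2.0f);
        cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);

        // The host could do other work here; block only when results are needed.
        if (cudaStreamSynchronize(stream) == cudaSuccess) {
            result = 0;
            for (int i = 0; i < n; ++i)
                if (h[i] != 2.0f * i) ++result;
        }
    }
    if (stream) cudaStreamDestroy(stream);
    cudaFree(d);
    cudaFreeHost(h);
    return result;
}

int main() {
    printf("mismatches: %d\n", run_async_pipeline(1 << 20));
    return 0;
}
```

With several streams and the data split into chunks, the copy of one chunk can overlap with the kernel working on another; a single stream, as here, only frees the host thread.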
As far as I know, "Unified Virtual Addressing" is really more about using multiple devices and abstracting away some of the explicit memory management. Think of it as a single virtual GPU; everything else is still valid.
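Accessing host memory directly from a kernel is already possible with mapped (zero-copy) pinned memory. See cudaHostAlloc (with the cudaHostAllocMapped flag) and cudaHostGetDevicePointer, alongside the cudaMalloc* functions, in the reference manual on the NVIDIA CUDA website.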
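Since the question asks for an example of a kernel touching host RAM directly, here is a minimal sketch of mapped (zero-copy) memory. The kernel `increment` and the helper `run_zero_copy` are hypothetical names for illustration, and device 0 is assumed to be the GPU in use. Every access from the kernel goes over the bus, so this is expected to be slower than working on device memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: adds 1 to each element.
// `data` points at host RAM that has been mapped into the device address space.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Returns the number of wrong elements, -1 on a CUDA error,
// or -2 if the device cannot map host memory.
int run_zero_copy(int n) {
    // Request mapping support; older runtimes require this before the
    // context is created, so the return value is deliberately ignored
    // and any resulting error is cleared.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaGetLastError();

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    if (!prop.canMapHostMemory) return -2;

    int *h = nullptr, *d_view = nullptr;
    size_t bytes = n * sizeof(int);

    // Pinned host allocation that is also mapped for device access.
    if (cudaHostAlloc((void **)&h, bytes, cudaHostAllocMapped) != cudaSuccess)
        return -1;
    for (int i = 0; i < n; ++i) h[i] = i;

    int result = -1;
    // Obtain the device-side pointer that aliases the host buffer.
    if (cudaHostGetDevicePointer((void **)&d_view, h, 0) == cudaSuccess) {
        increment<<<(n + 255) / 256, 256>>>(d_view, n);
        // No cudaMemcpy: after synchronization the results are already in h.
        if (cudaDeviceSynchronize() == cudaSuccess) {
            result = 0;
            for (int i = 0; i < n; ++i)
                if (h[i] != i + 1) ++result;
        }
    }
    cudaFreeHost(h);
    return result;
}

int main() {
    printf("mismatches: %d\n", run_zero_copy(1 << 16));
    return 0;
}
```

Zero-copy tends to pay off only when each element is read or written about once; data that is reused many times is usually better copied into device memory first.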
CUDA 4.0 UVA (Unified Virtual Addressing) does not help you access main memory from CUDA threads. As in previous versions of CUDA, you still have to map the main memory through the CUDA API for direct access from GPU threads, and doing so reduces performance, as mentioned above. Similarly, you cannot access GPU device memory from a CPU thread just by dereferencing a pointer to device memory. UVA only guarantees that the address spaces of multiple devices (including CPU memory) do not overlap; it does not provide coherent accessibility.
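One practical consequence of non-overlapping address spaces is that the runtime can tell from a pointer value where the memory lives, so a copy can be issued with `cudaMemcpyDefault` instead of naming the direction. The sketch below illustrates that under the assumption that device 0 reports `unifiedAddressing`; the helper name `run_uva_copy` is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Round-trips a buffer host -> device -> host, letting the runtime infer
// each copy direction from the pointer values.
// Returns the number of wrong elements, -1 on a CUDA error,
// or -2 if device 0 does not support unified addressing.
int run_uva_copy(int n) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    if (!prop.unifiedAddressing) return -2;

    size_t bytes = n * sizeof(float);
    float *h_src = nullptr, *h_dst = nullptr, *d = nullptr;
    int result = -1;

    if (cudaMallocHost((void **)&h_src, bytes) == cudaSuccess &&
        cudaMallocHost((void **)&h_dst, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d, bytes) == cudaSuccess) {
        for (int i = 0; i < n; ++i) { h_src[i] = (float)i; h_dst[i] = -1.0f; }

        // The direction is inferred from the addresses, not stated by the caller.
        bool ok = cudaMemcpy(d, h_src, bytes, cudaMemcpyDefault) == cudaSuccess &&
                  cudaMemcpy(h_dst, d, bytes, cudaMemcpyDefault) == cudaSuccess;

        // Dereferencing `d` on the host (e.g. d[0]) would still be invalid:
        // UVA makes the address unique, not CPU-accessible.
        if (ok) {
            result = 0;
            for (int i = 0; i < n; ++i)
                if (h_dst[i] != h_src[i]) ++result;
        }
    }
    cudaFree(d);
    cudaFreeHost(h_dst);
    cudaFreeHost(h_src);
    return result;
}

int main() {
    printf("mismatches: %d\n", run_uva_copy(1024));
    return 0;
}
```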