NVIDIA CUDA 4.0: page-locking memory with the runtime API

Posted 2024-11-03 18:28:21


NVIDIA CUDA 4.0 (RC2 is assumed here) offers the nice feature of page-locking a memory range that was previously allocated with the "normal" malloc function. This can be done using the driver API function:

CUresult cuMemHostRegister(void *p, size_t bytesize, unsigned int Flags);

So far, the project has been developed using the runtime API. Unfortunately, it seems that the runtime API does not offer a function like cuMemHostRegister, and I would really like to avoid mixing driver and runtime API calls.

Does anyone know how to page-lock memory that was previously allocated with standard malloc? Standard libc functions should not be used, since the page-locking is done to stage the memory for a fast transfer to the GPU, so I really want to stick to the "CUDA" way.

Frank


寒尘 2024-11-10 18:28:21

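The 4.0 runtime API provides cudaHostRegister(), which is exactly what you are asking for. Note that the allocation you lock must be aligned to a host page boundary, so you should probably use mmap() or posix_memalign() (or one of their relatives) to allocate the memory. Passing an arbitrarily sized allocation from standard malloc() to cudaHostRegister() is likely to fail with an invalid argument error.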
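For illustration, here is a minimal sketch of that approach, assuming a POSIX host. The helper names (round_up_to_page, alloc_page_aligned, pin_for_cuda, unpin_for_cuda) are made up for this example and are not part of CUDA. Rounding the size up to a whole number of pages is my own assumption about what keeps the registration valid. The include guard is only there so that the allocation helpers still compile on a machine without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>    // posix_memalign, free
#include <unistd.h>   // sysconf, _SC_PAGESIZE

// Only pull in the runtime API when the toolkit headers are available.
#if defined(__has_include)
#  if __has_include(<cuda_runtime.h>)
#    include <cuda_runtime.h>
#    define HAVE_CUDA_RUNTIME 1
#  endif
#endif

// Round `bytes` up to the next multiple of `page`.
inline std::size_t round_up_to_page(std::size_t bytes, std::size_t page) {
    assert(page != 0);
    return (bytes + page - 1) / page * page;
}

// Hypothetical helper: allocate a buffer that starts on a page boundary and
// spans a whole number of pages. The padded size is written to *padded.
// Returns nullptr on failure; release the buffer with free().
inline void *alloc_page_aligned(std::size_t bytes, std::size_t *padded) {
    const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    *padded = round_up_to_page(bytes, page);
    void *p = nullptr;
    if (posix_memalign(&p, page, *padded) != 0) {
        return nullptr;
    }
    return p;
}

#ifdef HAVE_CUDA_RUNTIME
// Hypothetical helper: page-lock an already allocated, page-aligned buffer.
// Flags value 0 requests the default behaviour.
inline bool pin_for_cuda(void *p, std::size_t padded) {
    return cudaHostRegister(p, padded, 0) == cudaSuccess;
}

// Undo the registration; call this before free().
inline bool unpin_for_cuda(void *p) {
    return cudaHostUnregister(p) == cudaSuccess;
}
#endif
```

The intended order would be alloc_page_aligned(), then pin_for_cuda(), then the transfers (for example with cudaMemcpyAsync), then unpin_for_cuda(), and finally free(). If pin_for_cuda() returns false, cudaGetLastError() should tell you whether it is the invalid argument case mentioned above.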