NVIDIA CUDA 4.0: page-locking memory with the runtime API
NVIDIA CUDA 4.0 (RC2 is assumed here) offers the nice feature of page-locking a memory range that was previously allocated with the "normal" malloc function. This can be done using the driver API function:
CUresult cuMemHostRegister (void * p, size_t bytesize, unsigned int Flags);
So far, the project has been developed using the runtime API. Unfortunately, it seems that the runtime API does not offer a function like cuMemHostRegister, and I would really like to avoid mixing driver and runtime API calls.
Does anyone know how to page-lock memory that was previously allocated using standard malloc? Standard libc functions should not be used, since the page-locking is carried out to stage the memory for a fast transfer to the GPU, so I really want to stick to the "CUDA" way.
Frank
Comments (1)
The 4.0 runtime API offers cudaHostRegister(), which does exactly what you are asking about. Be aware that the memory allocation you lock must be host page aligned, so you should probably use either mmap() or posix_memalign() (or one of its relatives) to allocate the memory. Passing cudaHostRegister() an allocation of arbitrary size from standard malloc() will probably fail with an invalid argument error.
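To make the alignment point concrete, here is a minimal sketch of how the allocation and registration might look. The helper names (alloc_page_aligned, pin_buffer, unpin_buffer) are hypothetical and only for illustration. Rounding the size up to a whole number of pages is an extra precaution I am assuming from the "arbitrary size" remark above, and cudaHostRegisterDefault assumes a reasonably recent toolkit. The CUDA calls are guarded by __CUDACC__, so the file still builds with a plain C compiler, in which case nothing is actually pinned.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

/* Hypothetical helper: allocate at least `bytes` bytes, starting on a host
 * page boundary, with the size rounded up to a whole number of pages.
 * Returns NULL on failure; on success stores the rounded size. */
static void *alloc_page_aligned(size_t bytes, size_t *rounded_bytes)
{
    long page_l = sysconf(_SC_PAGESIZE);
    if (page_l <= 0 || bytes == 0)
        return NULL;

    size_t page = (size_t)page_l;
    if (bytes > SIZE_MAX - (page - 1))
        return NULL;                          /* rounding would overflow */

    size_t rounded = (bytes + page - 1) / page * page;
    void *p = NULL;
    if (posix_memalign(&p, page, rounded) != 0)
        return NULL;

    *rounded_bytes = rounded;
    return p;
}

/* Hypothetical helper: page-lock an existing buffer through the runtime API.
 * Returns 0 on success, -1 on failure. Without nvcc this is a no-op. */
static int pin_buffer(void *p, size_t bytes)
{
#ifdef __CUDACC__
    cudaError_t err = cudaHostRegister(p, bytes, cudaHostRegisterDefault);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostRegister: %s\n", cudaGetErrorString(err));
        return -1;
    }
#else
    (void)p;
    (void)bytes;
#endif
    return 0;
}

/* Hypothetical helper: undo pin_buffer before the memory is freed. */
static int unpin_buffer(void *p)
{
#ifdef __CUDACC__
    cudaError_t err = cudaHostUnregister(p);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostUnregister: %s\n", cudaGetErrorString(err));
        return -1;
    }
#else
    (void)p;
#endif
    return 0;
}
```

Typical use would be to call alloc_page_aligned(), pass the returned pointer and rounded size to pin_buffer(), do the transfers with cudaMemcpy() or cudaMemcpyAsync(), and then call unpin_buffer() before free(). The buffer must stay registered for as long as transfers use it.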