cudaMemcpy invalid argument
My program runs 2 threads - Thread A (for input) and B (for processing). I also have a pair of pointers to 2 buffers, so that when Thread A has finished copying data into Buffer 1, Thread B starts processing Buffer 1 and Thread A starts copying data into Buffer 2. Then when Buffer 2 is full, Thread A copies data into Buffer 1 and Thread B processes Buffer 2, and so on.
My problem comes when I try to cudaMemcpy Buffer[] into d_Buffer (which was cudaMalloc'd earlier by the main thread, i.e. before thread creation; the Buffer[] arrays were also malloc'd by the main thread). I get an "invalid argument" error, but have no idea which argument is invalid.
I've reduced my program to a single-threaded program that still uses 2 buffers, so the copying and processing take place one after another instead of simultaneously. The cudaMemcpy line is exactly the same as in the two-threaded version. The single-threaded program works fine.
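Here is a stripped-down sketch of what the failing case looks like, with a single copy issued from a worker thread (the buffer names follow my program; the size N, the float type, and the pthreads usage are just placeholders), and with the cudaMemcpy return value checked so the error shows up at the exact call:

```c
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <cuda_runtime.h>

#define N (1 << 20)   /* placeholder buffer size */

static float *Buffer[2];   /* host buffers, malloc'd by the main thread      */
static float *d_Buffer;    /* device buffer, cudaMalloc'd by the main thread */

/* Worker thread: copy one of the host buffers to the device before
 * processing. This is the cudaMemcpy that reports "invalid argument"
 * when issued from a thread other than the main thread. */
static void *process_thread(void *arg)
{
    int which = *(int *)arg;
    cudaError_t err = cudaMemcpy(d_Buffer, Buffer[which],
                                 N * sizeof(float), cudaMemcpyHostToDevice);
    if (err != cudaSuccess)
        fprintf(stderr, "cudaMemcpy: %s\n", cudaGetErrorString(err));
    return NULL;
}

int main(void)
{
    Buffer[0] = (float *)malloc(N * sizeof(float));
    Buffer[1] = (float *)malloc(N * sizeof(float));
    cudaMalloc((void **)&d_Buffer, N * sizeof(float));  /* main-thread allocation */

    int which = 0;
    pthread_t t;
    pthread_create(&t, NULL, process_thread, &which);
    pthread_join(t, NULL);

    cudaFree(d_Buffer);
    free(Buffer[0]);
    free(Buffer[1]);
    return 0;
}
```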
I'm not sure where the error lies.
Thank you.
Regards,
Rayne
Comments (1)
If you are doing this with CUDA 3.2 or earlier, the reason is that GPU contexts are tied to a specific thread. If a multi-threaded program allocates memory on the same GPU from different host threads, the allocations wind up establishing different contexts, and pointers from one context are not portable to another. Each context has its own "virtualised" memory space to work with.
The solution is either to use the context migration API to transfer a single context from thread to thread as they do work, or to try the new public CUDA 4.0rc2 release, which should support what you are trying to do without context migration. The downside is that 4.0rc2 is a testing release, and it requires a particular beta driver. That driver won't work with all hardware (laptops, for example).
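For the first option, a rough sketch of what context migration looks like with the driver API (error checking omitted; everything except the cuCtx*/cuMem* calls is a hypothetical name). Only one thread may hold the context at a time, so each thread pushes the shared context before touching device memory and pops it afterwards; a real program needs a mutex around that push/pop pair.

```c
#include <stddef.h>
#include <cuda.h>

static CUcontext   ctx;        /* the single context shared by all threads */
static CUdeviceptr d_Buffer;   /* device buffer owned by that context      */

/* Main thread: create the context, allocate, then detach the context so
 * the worker threads can attach it. */
void main_thread_setup(size_t bytes)
{
    CUdevice dev;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);      /* context is current on this thread */
    cuMemAlloc(&d_Buffer, bytes);
    cuCtxPopCurrent(&ctx);          /* detach so a worker can push it    */
}

/* Called by thread A or B (serialised with a mutex in practice). */
void worker_copy_in(const void *host_src, size_t bytes)
{
    cuCtxPushCurrent(ctx);                    /* make the shared context current */
    cuMemcpyHtoD(d_Buffer, host_src, bytes);  /* now the pointer is valid here   */
    cuCtxPopCurrent(&ctx);                    /* release it for the other thread */
}
```

With CUDA 4.0 the runtime shares a single context per device across host threads, which is why the original runtime-API code should work there without any of this.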