Passing a CUDA context to worker threads
I have some CUDA kernels I want to run in individual pthreads.
I basically have to have each pthread execute, say, 3 CUDA kernels, and they must be executed sequentially.
I thought I would pass each pthread a reference to a stream, so that each of those 3 CUDA kernels would execute sequentially in the same stream.
I could get this working with a separate context for each pthread, which would then execute the kernels as normal, but that seems to incur a lot of overhead.
So how do I make each pthread work in the same context, concurrently with the other pthreads?
Thanks
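For reference, here is a minimal sketch of the setup described above: the main thread creates one stream per pthread and hands it to the worker, which launches its three kernels in order on that stream. The kernel names, sizes, and thread count are placeholders, and the sketch assumes the shared-context behavior of the CUDA 4.0+ runtime discussed in the answer below.

```cpp
#include <pthread.h>
#include <cuda_runtime.h>

// Three placeholder kernels standing in for the real ones.
__global__ void kernelA(float *d) { d[threadIdx.x] += 1.0f; }
__global__ void kernelB(float *d) { d[threadIdx.x] *= 2.0f; }
__global__ void kernelC(float *d) { d[threadIdx.x] -= 3.0f; }

struct WorkerArgs {
    cudaStream_t stream;  // the stream this worker submits to
    float *d_data;        // device buffer this worker operates on
};

void *worker(void *p) {
    WorkerArgs *args = static_cast<WorkerArgs *>(p);
    // Within a single stream, kernels execute in the order they were issued.
    kernelA<<<1, 256, 0, args->stream>>>(args->d_data);
    kernelB<<<1, 256, 0, args->stream>>>(args->d_data);
    kernelC<<<1, 256, 0, args->stream>>>(args->d_data);
    cudaStreamSynchronize(args->stream);  // wait only for this worker's chain
    return nullptr;
}

int main() {
    const int nThreads = 4;
    pthread_t threads[nThreads];
    WorkerArgs args[nThreads];

    for (int i = 0; i < nThreads; ++i) {
        cudaStreamCreate(&args[i].stream);
        cudaMalloc(&args[i].d_data, 256 * sizeof(float));
        cudaMemset(args[i].d_data, 0, 256 * sizeof(float));
        pthread_create(&threads[i], nullptr, worker, &args[i]);
    }
    for (int i = 0; i < nThreads; ++i) {
        pthread_join(threads[i], nullptr);
        cudaStreamDestroy(args[i].stream);
        cudaFree(args[i].d_data);
    }
    return 0;
}
```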
1 Answer
Before CUDA 4.0, the way to access a given context from different CPU threads was to use cuCtxPopCurrent()/cuCtxPushCurrent(). A context could only be current to one CPU thread at a time.
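A minimal sketch of that pre-4.0 pattern with the driver API, assuming a single device; the mutex (needed because only one thread can have the context current) and the placeholder stream work are illustrative additions, not part of this answer.

```cpp
#include <pthread.h>
#include <cuda.h>

static CUcontext g_ctx;                     // the single context shared by all workers
static pthread_mutex_t g_ctxLock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *) {
    pthread_mutex_lock(&g_ctxLock);         // only one thread may have the context current
    cuCtxPushCurrent(g_ctx);                // make the shared context current to this thread

    CUstream stream;
    cuStreamCreate(&stream, 0);
    // cuLaunchKernel() would be called three times here, all on `stream`,
    // so the three kernels execute in issue order.
    cuStreamSynchronize(stream);
    cuStreamDestroy(stream);

    cuCtxPopCurrent(&g_ctx);                // release the context for the next thread
    pthread_mutex_unlock(&g_ctxLock);
    return nullptr;
}

int main() {
    cuInit(0);
    CUdevice dev;
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&g_ctx, 0, dev);            // created current to the main thread...
    cuCtxPopCurrent(&g_ctx);                // ...so pop it before the workers start

    pthread_t threads[3];
    for (int i = 0; i < 3; ++i) pthread_create(&threads[i], nullptr, worker, nullptr);
    for (int i = 0; i < 3; ++i) pthread_join(threads[i], nullptr);

    cuCtxDestroy(g_ctx);
    return 0;
}
```

Note that the lock serializes host-side access to the context, which is part of why this pattern feels heavyweight compared with the CUDA 4.0 model below.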
In CUDA 4.0, you can call cudaSetDevice() in each pthread, and the same context can be current to more than one CPU thread at a time.
The kernel invocations will be serialized by the context in the order received, but you may have to perform CPU thread synchronization to make sure the work is submitted in the order desired.
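A minimal sketch of the CUDA 4.0+ approach, putting both points together: every pthread calls cudaSetDevice() to attach to the same device context, and a condition variable provides the CPU-side synchronization that fixes the submission order. The kernel, thread count, and turn-taking scheme are illustrative, not from this answer.

```cpp
#include <pthread.h>
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

__global__ void work(int id) {
    if (threadIdx.x == 0) printf("kernel submitted by host thread %d\n", id);
}

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_cv   = PTHREAD_COND_INITIALIZER;
static int g_turn = 0;                      // index of the thread whose turn it is to submit

void *worker(void *arg) {
    int id = (int)(intptr_t)arg;
    cudaSetDevice(0);                       // attach this pthread to device 0's context

    pthread_mutex_lock(&g_lock);
    while (g_turn != id)                    // block until it is this thread's turn
        pthread_cond_wait(&g_cv, &g_lock);

    work<<<1, 128>>>(id);                   // submitted in the desired global order

    g_turn++;                               // pass the turn to the next thread
    pthread_cond_broadcast(&g_cv);
    pthread_mutex_unlock(&g_lock);
    return nullptr;
}

int main() {
    pthread_t threads[3];
    for (intptr_t i = 0; i < 3; ++i)
        pthread_create(&threads[i], nullptr, worker, (void *)i);
    for (int i = 0; i < 3; ++i)
        pthread_join(threads[i], nullptr);
    cudaDeviceSynchronize();                // wait for everything that was submitted
    return 0;
}
```

Because these launches go into the shared (legacy) default stream, the context executes them in the order it receives them, which is exactly why the submission order enforced on the CPU side matters here.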