Is there a maximum number of streams in CUDA?
Is there a maximum number of streams that can be created in CUDA?
To clarify I mean CUDA streams as in the stream that allows you to execute kernels and memory operations.
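To make it concrete, here is a minimal sketch of the kind of stream usage I have in mind (the kernel `scale` and the sizes are just placeholders, not code from a real application):

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main()
{
    const int n = 1 << 20;
    float *h_buf, *d_buf;
    cudaMallocHost(&h_buf, n * sizeof(float));   // pinned host memory so the copies can be async
    cudaMalloc(&d_buf, n * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Memory operations and a kernel launch, all queued into the same stream.
    cudaMemcpyAsync(d_buf, h_buf, n * sizeof(float), cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(d_buf, n);
    cudaMemcpyAsync(h_buf, d_buf, n * sizeof(float), cudaMemcpyDeviceToHost, stream);

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}
```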
3 Answers
There is no realistic limit to the number of streams you can create (at least in the thousands). However, there's a limit to the number of streams you can use effectively to achieve concurrency.
In Fermi, the architecture supports 16-way concurrent kernel launches, but there is only a single connection from the host to the GPU. So even if you have 16 CUDA streams, they'll eventually get funneled into one HW queue. This can create false data-dependencies, and limit the amount of concurrency one can easily get.
With Kepler, the number of connections between the Host and the GPU is now 32 (instead of one with Fermi). With the new Hyper-Q technology, it is now much easier to keep the GPU busy with concurrent work.
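As a rough illustration of that point, the sketch below issues independent kernels into a number of streams; whether they actually overlap depends on the number of hardware queues (one on Fermi, up to 32 with Hyper-Q on Kepler) and on the resources each kernel needs. The kernel and the counts are placeholders, not a definitive benchmark:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void busywork(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * x[i] + 1.0f;
}

int main()
{
    const int numStreams = 32;   // matches the Kepler connection count mentioned above
    const int n = 1 << 16;

    cudaStream_t streams[numStreams];
    float *bufs[numStreams];

    for (int s = 0; s < numStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&bufs[s], n * sizeof(float));
    }

    // Issue one independent kernel per stream, breadth-first.
    // How many actually run concurrently is decided by the hardware,
    // not by how many streams were created.
    for (int s = 0; s < numStreams; ++s)
        busywork<<<(n + 255) / 256, 256, 0, streams[s]>>>(bufs[s], n);

    cudaDeviceSynchronize();

    for (int s = 0; s < numStreams; ++s) {
        cudaStreamDestroy(streams[s]);
        cudaFree(bufs[s]);
    }
    printf("issued %d kernels into %d streams\n", numStreams, numStreams);
    return 0;
}
```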
I haven't seen a limit in any documentation, but that doesn't mean all streams will execute concurrently, since concurrency is bounded by hard hardware limits (multiprocessors, registers, etc.).
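If you want to see those hardware limits on your own card, you can query them at runtime; this is just a small sketch using the runtime API's device properties (field names as in `cudaDeviceProp`):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);   // device 0

    printf("device:                %s\n", prop.name);
    printf("multiprocessors (SMs): %d\n", prop.multiProcessorCount);
    printf("registers per block:   %d\n", prop.regsPerBlock);
    printf("concurrent kernels:    %s\n", prop.concurrentKernels ? "yes" : "no");
    printf("async copy engines:    %d\n", prop.asyncEngineCount);
    return 0;
}
```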
According to this NVIDIA presentation, max is 16 streams (on Fermi).
http://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf
To clarify, I've successfully created more than 16 streams, but I think the hardware can only support 16 concurrent kernels, so the excess ones are wasted in terms of concurrency.
Kepler is probably different.
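For what it's worth, a small sketch of the kind of test I mean: creating far more than 16 streams succeeds; it's only the number executing concurrently that the hardware caps. The count of 1000 below is arbitrary:

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

int main()
{
    const int count = 1000;   // arbitrary; well past the 16 concurrent kernels Fermi can run
    std::vector<cudaStream_t> streams(count);

    int created = 0;
    for (int i = 0; i < count; ++i) {
        if (cudaStreamCreate(&streams[i]) != cudaSuccess)
            break;   // stop at the first failure, if any
        ++created;
    }
    printf("created %d streams\n", created);

    for (int i = 0; i < created; ++i)
        cudaStreamDestroy(streams[i]);
    return 0;
}
```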