Is there a maximum number of streams in CUDA?

Posted 2024-09-16 06:25:50

Is there a maximum number of streams that can be created in CUDA?

To clarify, I mean CUDA streams, as in the streams that let you execute kernels and memory operations.


Comments (3)

寻找一个思念的角度 2024-09-23 06:25:50

There is no practical limit on the number of streams you can create (at least thousands). However, there is a limit on the number of streams you can use effectively to achieve concurrency.

On Fermi, the architecture supports up to 16 kernels running concurrently, but there is only a single connection from the host to the GPU. So even if you have 16 CUDA streams, they eventually get funneled into one hardware queue. This can create false dependencies between streams, and it limits the amount of concurrency you can easily get.

With Kepler, the number of connections between the host and the GPU is 32 (instead of one on Fermi). With the Hyper-Q technology, it is much easier to keep the GPU busy with concurrent work.
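The claim that stream creation has no practical limit can be checked empirically. Below is a minimal sketch, assuming a working CUDA device and toolkit; the helper name `countCreatableStreams` and the figure of 2000 requested streams are illustrative choices, not anything from the thread. It only measures how many streams the runtime will create, not how many run concurrently.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: tries to create `requested` streams and returns how
// many were created before the runtime reported an error.
// Every stream that was created is destroyed before returning.
int countCreatableStreams(int requested) {
    std::vector<cudaStream_t> streams;
    streams.reserve(requested);
    for (int i = 0; i < requested; ++i) {
        cudaStream_t s;
        cudaError_t err = cudaStreamCreate(&s);
        if (err != cudaSuccess) {
            std::printf("cudaStreamCreate failed at stream %d: %s\n",
                        i, cudaGetErrorString(err));
            break;
        }
        streams.push_back(s);
    }
    int created = static_cast<int>(streams.size());
    for (cudaStream_t s : streams) {
        cudaStreamDestroy(s);
    }
    return created;
}

int main() {
    const int requested = 2000;  // arbitrary illustrative value
    int created = countCreatableStreams(requested);
    std::printf("created %d of %d requested streams\n", created, requested);
    return 0;
}
```

Compile with something like `nvcc streams_count.cu -o streams_count`. The result depends on the device, driver, and available memory.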

傲世九天 2024-09-23 06:25:50

I haven't seen a limit in any documentation, but that doesn't mean all streams will execute concurrently, since concurrency is bounded by hard hardware limits (multiprocessors, registers, etc.).
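The hardware resources mentioned here can be inspected at run time. The following sketch queries a few `cudaDeviceProp` fields that bear on concurrency; the helper name `printConcurrencyProps` and the choice of device 0 are assumptions made for illustration. Note that `concurrentKernels` is a yes/no flag rather than a count of how many kernels may overlap.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints properties relevant to concurrency for the
// given device. Returns false if the query fails.
bool printConcurrencyProps(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n",
                    cudaGetErrorString(err));
        return false;
    }
    std::printf("device %d: %s (compute capability %d.%d)\n",
                device, prop.name, prop.major, prop.minor);
    // Flag only: 1 if the device can run multiple kernels at once.
    std::printf("concurrentKernels:     %d\n", prop.concurrentKernels);
    std::printf("multiProcessorCount:   %d\n", prop.multiProcessorCount);
    std::printf("regsPerMultiprocessor: %d\n", prop.regsPerMultiprocessor);
    // Number of copy engines available for async memory transfers.
    std::printf("asyncEngineCount:      %d\n", prop.asyncEngineCount);
    return true;
}

int main() {
    return printConcurrencyProps(0) ? 0 : 1;  // device 0 is an assumption
}
```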

土豪 2024-09-23 06:25:50

According to this NVIDIA presentation, the maximum is 16 streams (on Fermi).
http://developer.download.nvidia.com/CUDA/training/StreamsAndConcurrencyWebinar.pdf

To clarify, I've successfully created more than 16 streams, but I think the hardware can only support 16 concurrent kernels, so the excess ones are wasted in terms of concurrency.

Kepler is probably different.
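Creating and using more than 16 streams can be sketched as follows. This example is illustrative only: the kernel `fillKernel`, the helper `runInStreams`, and the sizes (32 streams, 1024 elements each) are hypothetical. It shows that work submitted to more than 16 streams completes correctly, but it does not measure how many kernels actually overlap.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Writes `value` into the first n elements of `out`.
__global__ void fillKernel(int *out, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = value;
}

// Hypothetical helper: launches one kernel per stream, each on its own
// slice of a shared buffer, then verifies the results on the host.
bool runInStreams(int numStreams, int n) {
    size_t bytes = sizeof(int) * static_cast<size_t>(n) * numStreams;
    int *dBuf = nullptr;
    if (cudaMalloc(&dBuf, bytes) != cudaSuccess) return false;

    std::vector<cudaStream_t> streams(numStreams);
    for (auto &s : streams) {
        if (cudaStreamCreate(&s) != cudaSuccess) return false;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    for (int k = 0; k < numStreams; ++k) {
        // The fourth launch parameter selects the stream.
        fillKernel<<<blocks, threads, 0, streams[k]>>>(dBuf + k * n, n, k);
    }
    // Catch launch errors, then wait for all streams to finish.
    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess);

    std::vector<int> h(static_cast<size_t>(n) * numStreams);
    ok = ok && (cudaMemcpy(h.data(), dBuf, bytes,
                           cudaMemcpyDeviceToHost) == cudaSuccess);
    for (int k = 0; ok && k < numStreams; ++k) {
        for (int i = 0; i < n; ++i) {
            if (h[static_cast<size_t>(k) * n + i] != k) { ok = false; break; }
        }
    }

    for (auto &s : streams) cudaStreamDestroy(s);
    cudaFree(dBuf);
    return ok;
}

int main() {
    bool ok = runInStreams(32, 1024);  // 32 streams: more than Fermi's 16
    std::printf("%s\n", ok ? "all streams completed correctly"
                           : "a CUDA call or the check failed");
    return ok ? 0 : 1;
}
```

To see how much of this work overlaps on a given GPU, one option is to inspect the run in a profiler such as Nsight Systems.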
