What is the difference between "Maximum number of threads per multiprocessor" and "Maximum number of threads per block" in the deviceQuery output?

Posted 2025-01-14 18:13:13

When I run deviceQuery, I'd like to know the difference between "Maximum number of threads per multiprocessor" and "Maximum number of threads per block". As I understand it, SM = multiprocessor = block on the GPU, but I don't understand why the two values are different. Are there multiple blocks in a multiprocessor?

  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024

An additional question: what is the relationship between a thread and a core? Is it correct to equate thread = core?

Comments (1)

月隐月明月朦胧 2025-01-21 18:13:13

Are there multiple blocks in a multiprocessor?

Yes, there can be.

Quite simply: SM == multiprocessor, but SM != block.

An SM (multiprocessor) is a hardware entity. A thread block is a software entity, basically a collection of threads.

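An SM can have more than one block resident at the same time. To get full occupancy of an SM with a limit of 1536 threads, you would need something like three 512-thread blocks resident (3 × 512 = 1536). A block cannot exceed the per-block limit of 1024 threads, and two 1024-thread blocks (2048 threads) would not fit on one SM. So with 1024-thread blocks only one block is resident and the SM's thread capacity is only two-thirds used, assuming registers and shared memory are not the limiting factor.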

An additional question: what is the relationship between a thread and a core? Is it correct to equate thread = core?

A thread represents a sequence of instructions. In GPU terminology, a "core" is a functional unit in the SM that processes certain instruction types, namely 32-bit floating-point add, multiply, and multiply-add instructions. Other instruction types are handled by other kinds of functional units in the SM.

A thread needs a core when it has one of those 32-bit floating-point instructions to execute. If it has a different instruction to execute, say an LD (load) instruction, it needs a different functional unit: in this example, an LD/ST (load/store) unit. So thread = core is not a correct mapping. A thread is a software instruction stream, and a core is one of several kinds of hardware units that execute its instructions.
