Number of wavefronts that can fit on a SIMD
I'm reading an article about an AMD GPU and am confused by a particular example. Given a SIMD unit with a fixed number of registers, how many wavefronts can occupy the SIMD if each requires x registers?
Specifically, suppose a SIMD unit has 16k registers to share between 1-32 wavefronts. This implies that each wavefront can have an average of 8 registers per thread (if there are 32 wavefronts). This is fine.
It then goes on to say that there is a global limit to the number of wavefronts on the SIMD of ~20.6 which would then give each wavefront 11-12 registers.
The next part confuses me. The article goes on to say that only 2 wavefronts can occupy a SIMD if they use 83 or more registers (recall that wavefronts are 64 wide).
By my calculation: 2 * 83 * 64 = 10,624 registers
which is well under the 16,384 available per SIMD. You could therefore have 3 wavefronts with no problem.
I'm reading the article here, in case there is something I've missed (7th paragraph).
Comments (1)
Concerning the global limit:
Each AMD GPU has a global limit on how many simultaneous wavefronts it can sustain. This limit is model specific, but generally doesn't change between differently cut versions of the same chip. For example, for Cypress chips (5830, 5850, 5870) it is 496 wavefronts per GPU. Since those chips have different numbers of CUs, the maximum number of wavefronts/CU (as calculated from this constraint) goes from 35.4 for the 5830 down to 24.8 for the 5870. For entry-level chips this global limit can work out to values as high as 96 wavefronts/CU. In these cases the limit of 32 wavefronts/CU (8 workgroups of 4 wavefronts each) applies, with 8 registers/thread.
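To make the arithmetic concrete, here is a rough Python sketch of how the per-CU figures fall out of the global limit. The 496-wavefront limit and the 32 wavefronts/CU cap are the numbers quoted above. The CU counts are assumptions: 14 and 20 are back-computed from the 35.4 and 24.8 ratios, and 18 for the 5850 is not stated in this post. Taking the minimum of the two limits is also my reading of how they combine, not something I have seen documented.

```python
# Global limit quoted above for Cypress chips: 496 wavefronts per GPU.
GLOBAL_WAVEFRONT_LIMIT = 496

# Per-CU cap quoted above: 8 workgroups of 4 wavefronts each.
PER_CU_CAP = 8 * 4

# Assumed CU counts (14 and 20 inferred from the quoted ratios,
# 18 for the 5850 is an assumption not taken from the post).
CU_COUNTS = {"5830": 14, "5850": 18, "5870": 20}


def wavefronts_per_cu(global_limit, num_cus, per_cu_cap=PER_CU_CAP):
    """Return (share of the global limit per CU, value after the per-CU cap)."""
    from_global = global_limit / num_cus
    # Assumption: the tighter of the two limits wins.
    return from_global, min(from_global, per_cu_cap)


for model, cus in CU_COUNTS.items():
    raw, effective = wavefronts_per_cu(GLOBAL_WAVEFRONT_LIMIT, cus)
    print(f"HD {model}: {raw:.1f} from the global limit, {effective:.1f} after the cap")
```

Under these assumptions the 5830 is the only one of the three where the 32 wavefronts/CU cap is the tighter limit.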
Now for the 2 wavefronts:
Judging from the numbers given in the ATI Stream Programming Guide OpenCL, it seems that the number of usable registers is slightly lower than 16384, so I would guess (this is pure speculation, I haven't found any information about it) that some registers are used for other purposes not directly accessible by the kernel (instruction pointers and whatnot). In the table given there, no allocation uses more than 15872 registers, so that might be the usable maximum. Of course this is pure speculation, so it might simply be a case of someone using the wrong numbers in the manual and everyone copying them.
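As a quick check of that guess, the sketch below counts how many 64-wide wavefronts fit into a register pool for a given number of registers per thread. The 15872 figure is the speculative usable maximum from the table, not a documented value, and the function ignores every other limit (global wavefront limit, workgroup limits), so treat it as an illustration only.

```python
WAVEFRONT_WIDTH = 64            # threads per wavefront, as stated in the question
NOMINAL_REGISTERS = 16384       # registers per SIMD quoted in the article
USABLE_REGISTERS_GUESS = 15872  # speculative usable maximum read off the table


def max_wavefronts(regs_per_thread, register_pool):
    """Wavefronts that fit if each one needs regs_per_thread * 64 registers."""
    regs_per_wavefront = regs_per_thread * WAVEFRONT_WIDTH
    return register_pool // regs_per_wavefront


for regs in (82, 83):
    nominal = max_wavefronts(regs, NOMINAL_REGISTERS)
    guessed = max_wavefronts(regs, USABLE_REGISTERS_GUESS)
    print(f"{regs} registers/thread: {nominal} wavefronts with 16384, "
          f"{guessed} with 15872")
```

With 83 registers per thread, three wavefronts would need 3 * 83 * 64 = 15936 registers. That fits in 16384 but not in 15872, which would match the article's claim that only 2 wavefronts fit at 83 or more registers, while 82 registers per thread (15744 for three wavefronts) still allows 3.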
In general the ATI Stream Programming Manual OpenCL is a good resource to learn about this. Be advised though that the link is the result of a quick Google search and doesn't seem to point to the most current version (it points to rev 1.03 while I am using rev 1.05, and I have no idea if that is the most current either). I don't know if that makes any important difference, but a more in-depth search might be in order.