Number of active warps in a GPU (Fermi)
I have a quick question about active warps in a GPU (I am mainly interested in Fermi).
For a specific kernel, is the number of active warps on an SM the same at every cycle for the whole execution time of the kernel?
In my experiments, I see some correlation between the total number of active warps (accumulated over the whole execution) and the number of synchronizations in the kernel. Can anyone clarify this relation?
Thanks
Comments (2)
The number of active warps can vary over time since:
The active warps count for a whole program execution depends on a number of factors, but remember that the counter is incremented by the number of active warps on every cycle. So if you increase the number of syncs, which also increases the number of cycles each warp needs to execute the kernel, you would expect a higher active warps count.
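To make the accumulation concrete, here is a minimal toy model in Python. It is only a sketch, not profiler code: the function name `active_warps_counter`, the warp lifetimes, and the 20-cycle cost per barrier are all made-up assumptions, and it treats a warp waiting at a barrier as still active because it stays resident on the SM.

```python
def active_warps_counter(warp_lifetimes):
    """Accumulate, cycle by cycle, how many warps are still resident.

    warp_lifetimes: cycles each warp stays resident on the SM
    (hypothetical numbers, not measured on hardware).
    """
    counter = 0
    active_per_cycle = []
    for cycle in range(max(warp_lifetimes, default=0)):
        # A warp counts as active on every cycle before it exits.
        active = sum(1 for lifetime in warp_lifetimes if lifetime > cycle)
        active_per_cycle.append(active)
        counter += active
    return counter, active_per_cycle


# Four warps without barriers; lifetimes differ, so the per-cycle count varies.
baseline = [100, 100, 120, 150]
# Same warps with two barriers, assuming each barrier adds 20 stall cycles.
with_syncs = [lifetime + 2 * 20 for lifetime in baseline]

base_total, base_trace = active_warps_counter(baseline)
sync_total, sync_trace = active_warps_counter(with_syncs)
print(base_total, sync_total)  # prints "470 630"
```

In this model the per-cycle count drops from 4 to 2 to 1 as warps finish, so it is not constant over the run, and the accumulated total grows from 470 to 630 purely because the barriers keep each warp resident for longer.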
Also note that some derived statistics in the profiler are approximate since they often use values from more than one run, hence there can be some variability.
The relationship between barrier synchronization and warps is explained in this paper:
Demystifying GPU Microarchitecture through Microbenchmarking.