Compute Visual Profiler and the number of blocks used for profiling
On page 51 of the Compute Visual Profiler User Guide, it states:
"Note that in case the number of blocks in a kernel is less than or not a multiple of the number of multiprocessors the counters values across multiple runs will not be consistent."
Is that an inclusive or an exclusive "or"? Does the number of blocks always have to be a multiple of the number of multiprocessors?
Comments (1)
The inconsistency mentioned in the docs is caused by load imbalance between multiprocessors.
For instance, if you run a kernel with 15 blocks on a Tesla C2050, which has 14 multiprocessors, one of the multiprocessors will end up running threads from the one "extra" block. If the profiler happens to collect data from the multiprocessor running threads of two blocks in one profiling run, but from a multiprocessor running threads of only a single block in another run, the results will differ.
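The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration: the function name `blocks_per_multiprocessor` is made up here, and the even, one-block-at-a-time distribution is a simplifying assumption, not a description of how the hardware block scheduler actually works.

```python
def blocks_per_multiprocessor(num_blocks, num_sms):
    """Spread num_blocks over num_sms as evenly as possible.

    Assumption: blocks are handed out evenly, so the leftover
    blocks land on the first few multiprocessors. Real scheduling
    may differ; only the resulting imbalance matters here.
    """
    base, extra = divmod(num_blocks, num_sms)
    return [base + 1 if i < extra else base for i in range(num_sms)]


# 15 blocks on a 14-multiprocessor device such as the Tesla C2050
loads = blocks_per_multiprocessor(15, 14)
print(max(loads), min(loads))  # 2 1
```

With 28 blocks every multiprocessor gets the same load, so under this simplified model it would no longer matter which multiprocessor the profiler samples.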
To answer the question you actually asked: the "or" is inclusive, as it usually is in natural language.
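Read as an inclusive "or", the documented condition can be written as a simple predicate. The sketch below is illustrative only; both function names are hypothetical and not part of any profiler or CUDA API. Note that for a positive block count, "less than the number of multiprocessors" already implies "not a multiple", so in practice the block count does need to be a multiple to avoid the documented condition.

```python
def counters_may_be_inconsistent(num_blocks, num_sms):
    """Documented condition, read as an inclusive 'or'."""
    return num_blocks < num_sms or num_blocks % num_sms != 0


def next_consistent_grid_size(num_blocks, num_sms):
    """Smallest multiple of num_sms that is >= num_blocks.

    Assumption: the kernel tolerates the extra blocks, for example
    by returning early when the block index is out of range.
    """
    return -(-num_blocks // num_sms) * num_sms  # ceiling division


print(counters_may_be_inconsistent(15, 14))  # True
print(counters_may_be_inconsistent(28, 14))  # False
print(next_consistent_grid_size(15, 14))     # 28
```

Padding the grid this way addresses only the condition stated in the guide; it does not make the padded blocks do the same amount of work as the real ones.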
Although I do not remember this being mentioned in the documentation, I can imagine that even if both of these conditions are false, profiling inconsistency can also occur when the data itself causes imbalance (for example, when the amount of arithmetic or data processed, or the memory addressing pattern, depends on the data).