How can I determine whether a kernel is memory bound or compute bound?
I think my kernel is memory bound (because most GPGPU code is memory bound), but I don't actually know for sure. How can I find out for myself? Probably one has to use the Visual Profiler, since the answer depends on the GPU used.
If it is explained in the CUDA Programming Guide or in other NVIDIA documentation, don't hesitate to just post a link with a page number, so I can read up on it myself.
Clarification
I would prefer a general "rule" for determining the limiting factor, but for my specific case you can find details about my kernel here: Using `overlap`, `kernel time` and `utilization` to optimize one's kernels
Comments (3)
This presentation from NVIDIA discusses selectively disabling memory accesses and arithmetic in your kernel by modifying the source code, in order to determine whether one of them is limiting your performance.
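As a rough illustration of that idea, here is a minimal sketch and not the code from the presentation. The kernel, the `Mode` switch, the `flag` argument and the problem sizes are all hypothetical. The math-only variant guards its store with a runtime flag that is always 0, so the compiler cannot eliminate the arithmetic as dead code even though nothing is written.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical switch selecting which part of the work is kept.
enum Mode { FULL = 0, MEM_ONLY = 1, MATH_ONLY = 2 };

template <Mode MODE>
__global__ void kernel(const float* in, float* out, int n, int iters, int flag)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Memory part: the math-only variant skips the global load.
    float x = (MODE == MATH_ONLY) ? (float)i : in[i];

    // Arithmetic part: the memory-only variant skips the loop.
    if (MODE != MEM_ONLY) {
        for (int k = 0; k < iters; ++k) x = x * 1.0001f + 0.5f;
    }

    // In the math-only variant the store depends on a runtime flag that is
    // always 0, so no store happens but the arithmetic is not optimized away.
    if (MODE != MATH_ONLY || flag) out[i] = x;
}

// Times one launch of the selected variant with CUDA events (milliseconds).
template <Mode MODE>
float timeKernel(const float* in, float* out, int n, int iters)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    const int threads = 256, blocks = (n + threads - 1) / threads;

    kernel<MODE><<<blocks, threads>>>(in, out, n, iters, 0);  // warm-up
    cudaEventRecord(start);
    kernel<MODE><<<blocks, threads>>>(in, out, n, iters, 0);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 24, iters = 16;  // arbitrary sizes for illustration
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));

    printf("full:      %.3f ms\n", timeKernel<FULL>(in, out, n, iters));
    printf("mem only:  %.3f ms\n", timeKernel<MEM_ONLY>(in, out, n, iters));
    printf("math only: %.3f ms\n", timeKernel<MATH_ONLY>(in, out, n, iters));

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

If the memory-only time is close to the full time, memory traffic is probably the limiter; if the math-only time is close to the full time, arithmetic probably is. Removing code can also change register usage and occupancy, so treat the numbers as a hint rather than a proof.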
A nice trick that requires no source code modification can be used for code compiled for compute capability 2.0 and above (based on the answer here):
Using the `--use_fast_math` flag, one can easily increase or decrease compute pressure, because it replaces precise single-precision math functions and division with faster, less accurate versions.
If setting this flag gives a large speed-up, this indicates a compute-bound kernel.
If setting this flag gives little to no speed-up, this indicates a balanced or memory-bound kernel.
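A minimal sketch of how such a comparison could be set up, assuming a hypothetical file `bench.cu` and a made-up kernel `mathHeavy` that leans on `sinf`, `expf`, `sqrtf` and division. The same source is built twice and the printed kernel times are compared. Note that fast math changes numerical results, so this is a diagnostic and not necessarily a setting to ship.

```cuda
// Build twice and compare the printed times (file name is hypothetical):
//   nvcc -O3 -o precise bench.cu
//   nvcc -O3 --use_fast_math -o fast bench.cu
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel dominated by single-precision math functions.
__global__ void mathHeavy(const float* in, float* out, int n, int iters)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];
        for (int k = 0; k < iters; ++k)
            x = sinf(x) * expf(-x) + sqrtf(x + 1.0f) / (x + 2.0f);
        out[i] = x;
    }
}

int main()
{
    const int n = 1 << 24, iters = 64;  // arbitrary sizes for illustration
    std::vector<float> h(n);
    for (int i = 0; i < n; ++i) h[i] = 0.001f * (i % 1000);

    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemcpy(in, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    const int threads = 256, blocks = (n + threads - 1) / threads;

    mathHeavy<<<blocks, threads>>>(in, out, n, iters);  // warm-up
    cudaEventRecord(start);
    mathHeavy<<<blocks, threads>>>(in, out, n, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The flag only has an effect on kernels that actually use the affected floating-point operations, so little or no speed-up on an integer-heavy kernel says nothing about whether it is memory bound.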
I thought I would pitch in an answer, even though there is an accepted answer and this question is old.
I had a similar problem in my code, although at the time I didn't know it.
I ran the NVIDIA Visual Profiler (`nvvp`) and analysed my program. The profiler detected that my program was limited in some fashion and offered some recommendations. It is a great tool to use if you are unsure where to begin.