How can I evaluate the relative performance of CUDA GPUs?
How can I estimate the CUDA performance of cards I don't own, i.e., new cards?
For instance, I found an incomplete CUDA example whose author wrote that it takes 0.7 s on his GF 8600 GT. On my Quadro, it takes 1.7 s.
My question is: is the code I used to fill in the gaps faulty, or is the GF 8600 really twice as fast?
The kernel is memory-bound, but my card has a higher memory bandwidth. I don't know what conclusions to draw from this.
Name                 Quadro FX 580    GeForce 8600 GT
CUDA cores           32               32
Core clock (MHz)     450              540
Memory clock (MHz)   400              700
Memory BW (GB/s)     25.6             22.4
Shader clock (MHz)   ????             1180
Answer
Here are some pointers to possible sources of error. First, time your code with CUDA events (cudaEvent_t) rather than the CUDA profiler, as CUDA events are more accurate. Second, check what the author is measuring: is he talking only about the computation time, or does he also include the time to transfer data to and from the GPU? Are you measuring the same interval?
Third, the CUDA architecture is changing quite fast. For example, for cards with compute capability (cc) 1.x, it is recommended to use shared memory to get better performance; cards with cc 2.x, however, have an L1 cache on each multiprocessor that makes global memory accesses considerably faster. So you may also want to compare the architectures of the two cards and their compute capabilities.