CUDA GPU optimization
I have read that using an NVIDIA GPU instead of a CPU gave a 100x speedup on certain problems.
What are the best speedups achieved with CUDA on different problems?
Please state the problem and the speedup factor, along with links to papers if possible.
Comments (3)
Check out the CUDA community showcase: http://www.nvidia.com/object/cuda_showcase_html.html
Gumerov was able to speed up the FMM (fast multipole method) for the Laplace potential by up to ~70x. You can read his excellent paper here (pdf).
However, such results are usually rather meaningless on their own. For example, the Intel Core i7 980 XE is rated at 109 GFLOPS, whereas the Nvidia GTX 480 reaches 672 GFLOPS. If both architectures were fully utilized, the maximum achievable speedup would be about 6x. Of course, for certain problems it is easy to reach high utilization on the GPU but hard on the CPU.
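To make that arithmetic concrete, here is a minimal Python sketch. It assumes a purely compute-bound workload limited by peak FLOPS and ignores memory bandwidth and transfer costs. The function names are hypothetical, made up for illustration. The 70x figure is plugged in only as an example: it was not measured on these two chips, so the utilization number it yields is illustrative rather than a statement about that paper.

```python
# Rated peak throughput quoted above, in GFLOPS.
CPU_PEAK_GFLOPS = 109.0  # Intel Core i7 980 XE
GPU_PEAK_GFLOPS = 672.0  # Nvidia GTX 480


def peak_speedup_bound(gpu_gflops, cpu_gflops):
    """Largest speedup possible if both devices run at their rated peak."""
    return gpu_gflops / cpu_gflops


def implied_cpu_utilization(reported_speedup, gpu_gflops, cpu_gflops,
                            gpu_utilization=1.0):
    """Fraction of CPU peak the baseline must have reached for the
    reported speedup to hold (assumption: FLOPS-bound on both sides).

    speedup = (gpu_utilization * gpu_gflops) / (cpu_utilization * cpu_gflops)
    """
    return gpu_utilization * gpu_gflops / (reported_speedup * cpu_gflops)


if __name__ == "__main__":
    bound = peak_speedup_bound(GPU_PEAK_GFLOPS, CPU_PEAK_GFLOPS)
    print(f"Peak-vs-peak speedup bound: {bound:.1f}x")

    # Illustrative only: what a 70x claim would imply on this hardware pair.
    util = implied_cpu_utilization(70.0, GPU_PEAK_GFLOPS, CPU_PEAK_GFLOPS)
    print(f"Implied CPU utilization for a 70x claim: {util:.1%}")
```

Under these assumptions, any reported speedup well above the peak-vs-peak bound implies a CPU baseline running at a small fraction of its rated throughput, which is why the headline factor alone says little.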
Here are a few striking examples from the natural sciences:
Ab initio quantum chemistry calculation (TeraChem): up to 50x
Molecular dynamics simulations (HOOMD): up to 32x
Molecular orbitals visualization with VMD: 20x-100x
More can be found here:
http://www.nvidia.com/object/tesla_bio_workbench.html
The papers can be found via that link. Unfortunately I cannot post
more direct links, since my status (new account) does not permit more than
one hyperlink.
Thanks.