Intel MKL vs. AMD Math Core Library
Does anybody have experience programming for both the Intel Math Kernel Library and the AMD Math Core Library? I'm building a personal computer for high-performance statistical computations and am debating which components to buy. An appeal of the AMD Math Core Library is that it is free, but I am in academia, so the MKL is not that expensive. I'd be interested in hearing thoughts on:
- Which provides a better API?
- Which provides better performance per dollar, on average, including licensing and hardware costs?
- Is ACML-GPU a factor I should consider?
3 Answers
Intel MKL and ACML have similar APIs but MKL has a richer set of supported functionality including BLAS (and CBLAS)/LAPACK/FFTs/Vector and Statistical Math/Sparse direct and iterative solvers/Sparse BLAS, and so on. Intel MKL is also optimized for both Intel and AMD processors and has an active user forum you can turn to for help or guidance. An independent assessment of the two libraries is posted here: (http://www.advancedclustering.com/company-blog/high-performance-linpack-on-xeon-5500-v-opteron-2400.html)
• Shane Corder, Advanced Clustering (also carried by HPCWire: Benchmark Challenge: Nehalem Versus Istanbul): “In our recent testing and through real world experience, we have found that the Intel compilers and Intel Math Kernel Library (MKL) usually provide the best performance. Instead of just settling on Intel's toolkit we tried various compilers including: Intel, GNU compilers, and Portland Group. We also tested various linear algebra libraries including: MKL, AMD Core Math Library (ACML), and libGOTO from the University of Texas. All of the testing showed we could achieve the highest performance when using both the Intel Compilers and Intel Math Library--even on the AMD system--so these were used as the base of our benchmarks.” [Benchmark testing showed the 4-core Nehalem X5550 2.66GHz at 74.0GFs vs. the 6-core Istanbul 2435 2.6GHz at 99.4GFs; Istanbul was only 34% faster despite having 50% more cores]
Hope this helps.
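To make the API point above concrete, here is a minimal sketch of a level-3 BLAS call through the C (CBLAS) interface that MKL ships. It assumes you include MKL's `mkl_cblas.h` (a reference `cblas.h` works the same way) and link the corresponding library; ACML users would instead call the Fortran-style `dgemm_` or ACML's own C wrappers.

```c
#include <stdio.h>
#include <mkl_cblas.h>   /* with a reference CBLAS, include <cblas.h> instead */

int main(void)
{
    /* Compute C = alpha*A*B + beta*C for small row-major matrices:
       A is 2x3, B is 3x2, C is 2x2. */
    double A[2 * 3] = { 1,  2,  3,
                        4,  5,  6 };
    double B[3 * 2] = { 7,  8,
                        9, 10,
                       11, 12 };
    double C[2 * 2] = { 0 };

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 3,     /* M, N, K       */
                1.0, A, 3,   /* alpha, A, lda */
                B, 2,        /* B, ldb        */
                0.0, C, 2);  /* beta, C, ldc  */

    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);  /* 58 64 / 139 154 */
    return 0;
}
```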
In fact, there are two versions of the LAPACK routines in ACML. The ones without a trailing underscore (_) are the C-version routines which, as Victor said, don't require workspace arrays and let you pass values instead of references for the parameters. The ones with the underscore, however, are just vanilla Fortran routines. Do a "dumpbin /exports" on libacml_dll.dll and you'll see.
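To illustrate the two flavors, here is roughly what the exported declarations look like for the real eigensolver DGEEV. The underscored prototype is the standard Fortran LAPACK binding; the no-underscore one is reproduced from memory of ACML's `acml.h`, so treat it as an approximation and check your header.

```c
/* Fortran-flavored routine (trailing underscore): column-major data,
   every argument passed by reference, caller supplies WORK/LWORK. */
extern void dgeev_(const char *jobvl, const char *jobvr, const int *n,
                   double *a, const int *lda, double *wr, double *wi,
                   double *vl, const int *ldvl, double *vr, const int *ldvr,
                   double *work, const int *lwork, int *info);

/* ACML's C-flavored routine (no underscore): scalars passed by value,
   no workspace arguments; the library manages workspace internally.
   (Approximate prototype; the authoritative one is in acml.h.) */
extern void dgeev(char jobvl, char jobvr, int n, double *a, int lda,
                  double *wr, double *wi, double *vl, int ldvl,
                  double *vr, int ldvr, int *info);
```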
I have used ACML for its BLAS/LAPACK routines, so this will probably not answer your question, but I hope it's useful for someone. Compared to vanilla BLAS/LAPACK, its performance was a factor of 2-3 better in my particular use case: dense nonsymmetric complex matrices, for both linear solves and eigensystem computations. You should know that the function declarations are not identical to the vanilla routines, which required a fair number of preprocessor macros to let me switch freely between the two (see the sketch below). In particular, none of the LAPACK routines in ACML require work arrays. This is a major convenience if ACML is the only library you will use.
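A compile-time switch of the kind described above might look roughly like the sketch below. The `LA_DGEEV` macro name and the `USE_ACML` flag are invented for illustration, the real eigensolver stands in for the complex one to keep it short, and the ACML branch assumes the no-workspace C interface described in the previous answer.

```c
#include <stdlib.h>

#ifdef USE_ACML
  #include <acml.h>
  /* ACML C interface: scalars by value, no WORK/LWORK, info via pointer.
     NULL/1 stand in for the unreferenced left-eigenvector arguments. */
  #define LA_DGEEV(n, a, lda, wr, wi, vr, ldvr, info) \
      dgeev('N', 'V', (n), (a), (lda), (wr), (wi), NULL, 1, \
            (vr), (ldvr), (info))
#else
  /* Reference Fortran LAPACK binding: everything by reference,
     caller allocates the workspace. */
  extern void dgeev_(const char *, const char *, const int *, double *,
                     const int *, double *, double *, double *, const int *,
                     double *, const int *, double *, const int *, int *);

  static void la_dgeev_ref(int n, double *a, int lda, double *wr, double *wi,
                           double *vr, int ldvr, int *info)
  {
      int ldvl = 1, lwork = 4 * n;   /* 4*n suffices when computing VR */
      double *work = malloc((size_t)lwork * sizeof *work);
      dgeev_("N", "V", &n, a, &lda, wr, wi, NULL, &ldvl, vr, &ldvr,
             work, &lwork, info);
      free(work);
  }
  #define LA_DGEEV(n, a, lda, wr, wi, vr, ldvr, info) \
      la_dgeev_ref((n), (a), (lda), (wr), (wi), (vr), (ldvr), (info))
#endif
```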