Reference needed: hardware architecture and performance improvement [HPC / parallel computing]
There are several ways to improve the performance of HPC applications. One of them is to fine-tune the application for the hardware architecture, which is mostly done on multicore architectures. To use this method, one really has to understand the underlying hardware: memory, number of sockets, number of cores per socket, L1/L2 cache, GFlops, and so on.
Even though these technical terms look familiar, I still don't have a clear understanding of what exactly they mean for the performance of an application.
Can anyone suggest a good place or book from which I can understand hardware architecture in terms of performance?
It is very important to tune the code to the target hardware architecture. However, unless you have lots of time and resources, it is impossible to do this for the wide variety of systems available.
Optimization follows the 80-20 rule: you get 80% of the benefit with 20% of the effort. Beyond that, your returns start to diminish.
Here is the process I follow:
1) Obtain the best compiler for your target architecture. Don't be surprised if GNU turns out to be the best compiler for a particular platform.
2) Read through the "code optimization" section of the compiler's documentation.
3) Identify the right flags to generate the best code for the target platform. However, make sure you validate the results of the code at every optimization level you try. Higher optimization levels can affect the correctness of the code, for example by changing floating-point results.
4) Make sure any libraries you need are optimized for that system, for example math libraries, BLAS libraries, etc.
5) Pay special attention to platform-specific hardware features, such as SSE (SIMD), the number of cores, or accelerators. You may need to modify your code or provide hints to the compiler so that it can optimize the code better for these features.
You will have to do this for every target platform. By the end of this process you should have captured most of the benefit with minimal effort.
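Step 3 asks you to validate results at every optimization level. Here is a minimal sketch of such a check, assuming the program's output is a list of floating-point numbers. The function name `results_match`, the tolerances, and the sample values are hypothetical and are not taken from any particular tool.

```python
import math


def results_match(reference, candidate, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if two result vectors agree element-wise within tolerance.

    Exact equality is too strict here: an optimizing compiler may reorder
    floating-point operations, which changes the last few bits of a result
    without making it wrong.
    """
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )


# Hypothetical outputs of the same program built at two optimization levels.
baseline = [1.0, 2.5, 3.0e-3]                 # e.g. unoptimized build
optimized = [1.0, 2.5000000000001, 3.0e-3]    # e.g. aggressively optimized build

if results_match(baseline, optimized):
    print("optimized build agrees with the baseline")
else:
    print("optimized build diverges; inspect the flags used")
```

The tolerances are a judgment call that depends on the numerical method. Pick them from what the application can accept, not from what makes the check pass.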
If you need to extract more performance, that almost always requires rewriting your code so that the hardware features are fully exploited.
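Before rewriting anything, it can help to estimate what the hardware numbers from the question (sockets, cores per socket, GFlops, memory) imply for a given loop. The sketch below uses a roofline-style bound, which the answer itself does not mention. Every machine parameter in it is a made-up assumption, so substitute the figures from your vendor's documentation.

```python
def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle):
    """Theoretical peak floating-point rate of one node, in GFlop/s."""
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


def attainable_gflops(peak, bandwidth_gb_s, flops_per_byte):
    """Roofline-style upper bound for a kernel, in GFlop/s.

    The kernel cannot run faster than the compute peak, and it cannot
    perform flops faster than memory delivers the bytes they need.
    """
    return min(peak, bandwidth_gb_s * flops_per_byte)


# Hypothetical node: 2 sockets, 8 cores each, 2.5 GHz,
# 8 double-precision flops per cycle per core, 100 GB/s memory bandwidth.
peak = peak_gflops(sockets=2, cores_per_socket=8, clock_ghz=2.5, flops_per_cycle=8)

# Vector add a[i] = b[i] + c[i] on doubles: 1 flop per 24 bytes moved
# (two 8-byte loads and one 8-byte store).
vector_add = attainable_gflops(peak, bandwidth_gb_s=100.0, flops_per_byte=1 / 24)

print(f"peak: {peak:.1f} GFlop/s")
print(f"vector add bound: {vector_add:.2f} GFlop/s")
```

With these assumed numbers the vector add is limited by memory bandwidth to a small fraction of the compute peak. For a loop like that, reducing memory traffic or reusing data in cache matters more than adding SIMD instructions.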
No, there are no books for this. The closest things are the "optimization manuals" that vendors generally provide free of charge (IBM Redbooks, Intel, AMD, Cray).
For example:
support.amd.com/us/Processor_TechDocs/25112.PDF
http://www.intel.com/products/processor/manuals/
http://www.ibm.com/developerworks/wikis/download/attachments/137167333/Power6_optimization.pdf?version=1
These are the most authoritative resources for these platforms. You should look for the equivalent resources for your own target platform.