Automatically find compiler options for fastest exe on a given machine?
Is there a method to automatically find the best compiler options (on a given machine), which result in the fastest possible executable?
Naturally, I use g++ -O3, but there are additional flags that may make the code run faster, e.g. -ffast-math and others, some of which are hardware-dependent.
Does anyone know some code I can put in my configure.ac file (GNU autotools), so that the flags will be added to the Makefile automatically by the ./configure command?
In addition to automatically determining the best flags, I would be interested in some useful compiler flags that are good to use as a default for most optimized executables.
Update: Most people suggest just trying different flags and selecting the best ones empirically. For that method, I'd have a follow-up question: Is there a utility that lists all compiler flags that are possible for the machine I'm running on (e.g. tests if SSE instructions are available etc.)?
Comments (8)
I don't think you can do this at configure-time, but there is at least one program which attempts to optimize gcc option flags given a particular executable and machine. See http://www.coyotegulch.com/products/acovea/ for example.
You might be able to use this with some knowledge of your target machine(s) to find a good set of options for your code.
Um - yes. This is possible. Look into profile-guided optimization.
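For what it's worth, a profile-guided build with GCC usually comes down to three steps: build with instrumentation, run the program on representative input, then rebuild using the recorded profile. Below is a rough Python sketch of that sequence. The flags -fprofile-generate and -fprofile-use are real GCC options; the file names prog.cpp and prog, the -O2 level, and the helper names are only placeholders for illustration.

```python
import subprocess


def pgo_commands(source, exe, compiler="g++", run_args=()):
    """Return the three commands of a GCC profile-guided build, in order."""
    return [
        # 1. Instrumented build that records execution counts.
        [compiler, "-O2", "-fprofile-generate", source, "-o", exe],
        # 2. Training run on representative input; writes .gcda profile files.
        [f"./{exe}", *run_args],
        # 3. Rebuild using the recorded profile.
        [compiler, "-O2", "-fprofile-use", source, "-o", exe],
    ]


def run_pgo(source, exe, **kwargs):
    """Execute the three steps, stopping at the first failure."""
    for cmd in pgo_commands(source, exe, **kwargs):
        subprocess.run(cmd, check=True)
```

Calling run_pgo("prog.cpp", "prog") would perform the whole cycle. The quality of the result depends on how representative the training run is, so pass realistic arguments through run_args.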
Some compilers provide a "-fast" option that automatically selects the most aggressive optimization for the given compilation host. http://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler
Unfortunately, g++ does not provide a similar flag.
As a follow-up to your next question: for g++ you can use the -mtune option together with -O3, which will give you reasonably fast defaults. The challenge then is to find the processor type of your compilation host. You may want to look at the autoconf macro archive to see whether somebody has written the necessary tests. Otherwise, assuming Linux, you have to parse /proc/cpuinfo to get the processor type.
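The /proc/cpuinfo parsing mentioned above could look roughly like the Python sketch below. It assumes the x86 Linux layout, where each processor block has a "model name" line and a "flags" line listing features such as sse2; other architectures use different keys, so treat the key names as an assumption. The function names are made up for this example.

```python
from pathlib import Path


def parse_cpuinfo(text):
    """Return (model_name, flags) for the first processor entry in cpuinfo text."""
    model_name = None
    flags = set()
    for line in text.splitlines():
        if not line.strip():
            if model_name is not None or flags:
                break  # a blank line ends the first processor block
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        value = value.strip()
        if key == "model name":
            model_name = value
        elif key == "flags":
            flags = set(value.split())
    return model_name, flags


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read and parse the real file; this only works on Linux."""
    return parse_cpuinfo(Path(path).read_text())
```

With that, something like "sse4_2" in read_cpuinfo()[1] tells you whether the host CPU reports a given instruction set. Mapping the model name to a suitable -mtune value is still up to you.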
After some googling, I found this script: gcccpuopt.
On one of my machines (32bit), it outputs:
On another machine (64bit) it outputs:
So, it's not perfect, but might be helpful.
See also the -mcpu=native / -mtune=native gcc options.
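To see what "native" actually resolves to on a given host, gcc can be asked to print its target settings with -Q --help=target. Here is a small Python sketch that runs it and picks out the -march= and -mtune= lines. Note that -march=native is a related GCC option I am adding here, not one named above, and the parsing assumes each setting is printed as a name followed by a value on one line, which can differ between GCC versions and targets.

```python
import subprocess


def parse_target_help(text):
    """Extract '-march=' and '-mtune=' values from 'gcc -Q --help=target' output."""
    resolved = {}
    for line in text.splitlines():
        parts = line.split()
        # Assumed line shape: "-march=   <value>"; anything else is skipped.
        if len(parts) == 2 and parts[0] in ("-march=", "-mtune="):
            resolved[parts[0].rstrip("=")] = parts[1]
    return resolved


def native_target(compiler="gcc"):
    """Ask the compiler what -march=native / -mtune=native resolve to on this host."""
    out = subprocess.run(
        [compiler, "-march=native", "-mtune=native", "-Q", "--help=target"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_target_help(out)
```

The same output also lists which instruction-set switches are enabled, which goes some way towards the "which flags are possible on this machine" question.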
No.
You could compile your program with a large assortment of compiler options, then benchmark each and every version, then select the one that is "fastest," but that's hardly reliable and probably not useful for your program.
This is a solution that works for me, but it does take a little while to set up. In "Python Scripting for Computational Science" by Hans Petter Langtangen (an excellent book in my opinion), an example is given of using a short Python script to do numerical experiments to determine the best compiler options for your C/Fortran/... program. This is described in Chapter 1.1.11 on "Nested Heterogeneous Data Structures".
Source code for the examples from the book is freely available at http://folk.uio.no/hpl/scripting/index.html (I'm not sure of the license, so will not reproduce any code here). In particular, you can find code for a similar numerical test in TCSE3-3rd-examples.tar.gz, in the file src/app/wavesim2D/F77/compile.py, which you could use as a base for writing a script that is appropriate for a particular system/language (C++ in your case).
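To give an idea of what such an experiment script might look like, here is a minimal sketch of my own. It is not the book's compile.py; it only shows the general idea of compiling once per flag set and timing each executable. The source file name, the bench_N executable names, and the flag sets are placeholders.

```python
import subprocess
import time


def time_command(cmd, repeats=3):
    """Run cmd several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def benchmark_flags(source, flag_sets, compiler="g++", repeats=3):
    """Compile source once per flag set and time the resulting executable."""
    timings = {}
    for i, flags in enumerate(flag_sets):
        exe = f"./bench_{i}"  # placeholder name for each build
        subprocess.run([compiler, *flags, source, "-o", exe], check=True)
        timings[tuple(flags)] = time_command([exe], repeats)
    return timings


def fastest(timings):
    """Return the flag tuple with the smallest measured time."""
    return min(timings, key=timings.get)
```

A call such as fastest(benchmark_flags("prog.cpp", [["-O2"], ["-O3"], ["-O3", "-ffast-math"]])) would then report the winning combination. Taking the best of several runs reduces noise, but the result is only as meaningful as the workload the program runs during timing.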
Optimizing your app is mainly your job, not the compiler's.
Here's an example of what I'm talking about.
Once you've done that, IF your app is compute-bound, with hotspots in your code (not in library code) THEN the compiler optimizations for speed will make some difference, so you can try different flag combinations.