Compiling code into a single, automatically merged file to let the compiler optimize it better
Suppose you have a program in C, C++ or any other language that employs the "compile objects, then link them" scheme.
When your program is not small, it is likely to comprise several files, in order to ease code management (and shorten compilation time). Furthermore, after a certain degree of abstraction you likely have a deep call hierarchy. Especially at the lowest level, where tasks are the most repetitive and most frequent, you want to impose a general framework.
However, if you fragment your code into different object files and use a very abstract architecture for your code, it might hurt performance (which is bad if you or your supervisor emphasizes performance).
One way to circumvent this might be extensive inlining - this is the approach of template metaprogramming: in each translation unit you include all the code of your general, flexible structures, and count on the compiler to counteract performance issues. I want to do something similar without templates - say, because they are too hard to handle or because you use plain C.
You could write all your code into one single file. That would be horrible. What about writing a script which merges all your code into one source file and compiles it? Provided your source files are not written too wildly, a compiler could then probably apply much more optimization (inlining, dead code elimination, compile-time arithmetic, etc.).
Do you have any experience with or objections against this "trick"?
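To make the idea concrete, here is a minimal sketch (file and function names are made up for illustration): without link-time optimization, a helper defined in one translation unit cannot be inlined into a caller in another, but it can once both end up in the same merged file.

    /* util.c (hypothetical helper) */
    int clamp(int v, int lo, int hi) {
        return v < lo ? lo : (v > hi ? hi : v);
    }

    /* main.c (hypothetical caller) - compiled as its own translation unit,
       the compiler only sees a call to an external clamp() and cannot
       inline it or fold the result. */
    int clamp(int v, int lo, int hi);
    int main(void) {
        return clamp(42, 0, 10);   /* could be folded to the constant 10 */
    }

    /* merged.c - the concatenation of util.c and main.c: the definition is
       visible at the call site, so the compiler can inline clamp() and
       compute the return value at compile time. */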
Comments (6)
Pointless with a modern compiler. MSVC, GCC and clang all support link-time code generation (GCC and clang call it 'link-time optimisation'), which allows for exactly this. Plus, combining multiple translation units into one large one makes you unable to parallelise the compilation process, and (at least in the case of C++) makes RAM usage go through the roof.
This is not a feature, and it's not related to performance in any way. It's an annoying limitation of compilers and the include system.
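As a point of reference, a sketch of how link-time optimisation is typically enabled (flags as commonly documented for these toolchains; adjust to your setup):

    # GCC / clang: objects are still compiled separately (and in parallel);
    # cross-module inlining happens at link time
    gcc -O2 -flto -c util.c main.c
    gcc -O2 -flto util.o main.o -o app

    # MSVC: whole-program optimisation plus link-time code generation
    cl /O2 /GL util.c main.c /link /LTCG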
This is a semi-valid technique; IIRC, KDE used to use it to speed up compilation back in the day when most people had one CPU core. There are caveats, though: if you decide to do something like this, you need to write your code with it in mind.
Some examples of things to watch out for:
- namespace { int x; }; in two source files.
- using namespace foo; in a .cpp file can be OK - the appended sources may not agree.
- static int i; at file scope in several .cpp files will cause problems.
- #define's in .cpp files will affect source files that don't expect them.
Modern compilers/linkers are fully able to optimize across translation units (link-time code generation) - I don't think you'll see any noticeable difference using this approach.
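A minimal sketch of two of those caveats in plain C (file names are hypothetical): each file compiles fine on its own, but a naive concatenation does not.

    /* log_a.c (hypothetical) */
    #define BUFSIZE 64
    static int counter = 0;            /* private to log_a.c */
    int log_a(void) { return ++counter; }

    /* log_b.c (hypothetical) */
    #define BUFSIZE 256                /* a different, unrelated BUFSIZE */
    static int counter = 0;            /* a different, unrelated counter */
    int log_b(void) { return ++counter; }

    /* merged.c = log_a.c + log_b.c: 'counter' is now defined twice in one
       translation unit and BUFSIZE is redefined with a different value,
       so the merged file no longer compiles cleanly. */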
It would be better to profile your code for bottlenecks, and apply inlining and other speed hacks only where appropriate. Optimization should be performed with a scalpel, not with a shotgun.
Though it is not recommended, using #include statements for C files is essentially the same as appending the entire contents of the included file to the current one.
This way, if you include all of your files in one "master file", that file will essentially be compiled as if all the source code were appended to it.
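For illustration, a hypothetical "master file" built this way (file names are made up):

    /* all.c - amalgamation / unity build: each #include textually pastes
       the whole source file, so the compiler sees one translation unit. */
    #include "lexer.c"
    #include "parser.c"
    #include "eval.c"
    #include "main.c"

    /* Build just this one file, e.g.:  gcc -O2 all.c -o app  */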
SQLite does that with its amalgamation source file; have a look at:
http://www.sqlite.org/amalgamation.html
Do you mind if I share some experience about what makes software slow, especially when the call tree gets bushy? The cost to enter and exit functions is almost totally insignificant, except for functions that do very little computation, (especially) do not call any further functions, and are actually in use for a significant fraction of the time (i.e. random-time samples of the program counter land in the function 10% or more of the time).
So in-lining helps performance only for a certain kind of function.
However, your supervisor could be right that software with layers of abstraction has performance problems.
It's not because of the cycles spent entering and leaving functions.
It's because of the temptation to write function calls without real awareness of how long they take.
A function is a bit like a credit card. It begs to be used. So it's no mystery that with a credit card you spend more than you would without it.
However, it's worse with functions, because functions call functions call functions, over many layers, and the overspending compounds exponentially.
If you get experience with performance tuning like this, you come to recognize the design approaches that result in performance problems. The ones I see over and over are too many layers of abstraction, excess notification, overdesigned data structures, stuff like that.