Huge executable size due to debugging symbols: why?
We have been developing a large financial application at a bank. It started out as 150k lines of really bad code. A month ago it was down to a little more than half that, but the size of the executable was still huge. I expected that, since we were only making the code more readable while the templated code was still generating plenty of object code; we were just being more efficient with our effort.
The application is broken into about 5 shared objects and a main. One of the bigger shared objects was 40 MB, and it grew to 50 MB even while the code shrank.
I wasn't entirely surprised that the code started to grow, because after all we are adding some functionality. But I was surprised that it grew by 20%. Certainly no one came close to writing 20% of the code, so it's hard for me to imagine how it grew that much. That module is kind of hard for me to analyze, but on Friday I got a new datapoint that sheds some light.
There are perhaps 10 feeds to SOAP servers. The code is autogenerated, badly. Each service had one parser class with exactly the same code, something like:
#include <boost/shared_ptr.hpp>
#include <xercesstuff...>

class ParserService1 {
public:
    void parse() {
        try {
            Service1ContentHandler* p = new Service1ContentHandler( ... );
            parser->setContentHandler(p);
            parser->parse();
        } catch (SAX ...) {
            ...
        }
    }
};
These classes were completely unnecessary; a single function works. Each ContentHandler class had been autogenerated with the same 7 or 8 variables, which I was able to share through inheritance.
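Roughly, the refactor looks like the sketch below (a minimal sketch assuming Xerces-C++ SAX2; ServiceContentHandlerBase and parseService are illustrative names, not the real generated ones):

#include <string>
#include <xercesc/sax2/SAX2XMLReader.hpp>
#include <xercesc/sax2/ContentHandler.hpp>
#include <xercesc/sax2/DefaultHandler.hpp>

// One base class owns the 7 or 8 members that every autogenerated
// ContentHandler duplicated; each service handler now overrides only
// the callbacks that actually differ.
class ServiceContentHandlerBase : public xercesc::DefaultHandler {
protected:
    std::string currentElement_;   // illustrative shared state
    // ... the remaining shared members ...
};

// One free function replaces the ten identical per-service parser classes.
void parseService(xercesc::SAX2XMLReader& parser,
                  xercesc::ContentHandler& handler,
                  const char* uri)
{
    parser.setContentHandler(&handler);
    parser.parse(uri);   // SAX exceptions propagate to a single caller
}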
So I was expecting the size of the binary to go down when I removed the parser classes and all of that from the code. But with only 10 services, I wasn't expecting it to drop from 38 MB to 36 MB. That's an outrageous amount of symbols.
The only thing that I can think of is that each parser was including boost::shared_ptr, some Xerces parser stuff, and that somehow, the compiler and linker are storing all those symbols repeatedly for each file. I'm curious to find out in any case.
So, can anyone suggest how I would go about tracking down why a simple modification like this should have so much impact? I can use nm on a module to look at the symbols inside, but that's going to generate a painful, huge amount of semi-readable stuff.
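For what it's worth, a couple of pipelines can tame nm's output (GNU binutils assumed; libservices.so stands in for one of our shared objects):

$ # the 20 largest defined symbols, demangled, biggest last:
$ nm -C --size-sort --print-size libservices.so | tail -20
$ # how often the same demangled symbol is defined across the object files:
$ nm -C --defined-only *.o | awk '{ $1=""; $2=""; print }' | sort | uniq -c | sort -rn | head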
Also, when a colleague ran her code with my new library, the user time went from 1m55s to 1m25s. The real time is highly variable, because we are waiting on slow SOAP servers (IMHO, SOAP is an incredibly poor replacement for CORBA...), but the CPU time is quite stable. I would have expected a slight boost from reducing the code size that much, but the bottom line is: on a server with massive memory, I was really surprised that the speed was impacted so much, considering I didn't change the architecture of the XML processing itself.
I'm going to take it much further on Tuesday, and hopefully will get more information, but if anyone has some idea of how I could get this much improvement, I'd love to know.
Update:
I verified that, in fact, having debugging symbols in the task does not appear to change the run time at all. I did this by creating a header file that included lots of stuff, including the two that had the effect here: boost shared pointers and some of the Xerces XML parser. There appears to be no runtime performance hit (I checked because there were differences of opinion between two answers). However, I also verified that including header files creates debugging symbols for each instance, even though the stripped binary size is unchanged. So if you include a given file, even if you don't use it at all, a fixed number of symbols is injected into that object file, and they are not folded together at link time even though they are presumably identical.
My code looks like:
#include "includetorture.h"
void f1()
{
f2(); // call the function in the next file
}
The size with my particular include files was about 100k per source file. Presumably, if I had included more, it would be higher. The total executable with the includes was ~600k; without them, it was about 9k. I verified that the growth is linear in the number of files doing the including, but the stripped code is the same size regardless, as it should be.
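A rough way to reproduce the measurement (g++ and GNU binutils assumed; the file names are illustrative, following the snippet above):

$ g++ -g f*.cpp main.cpp -o torture
$ ls -l torture                      # ~600k with the includes
$ readelf -S torture | grep debug    # the .debug_* sections account for almost all of it
$ strip torture && ls -l torture     # ~9k once the debug sections are gone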
Clearly, I was mistaken in thinking this was the reason for the performance gain. I think I have accounted for that now: even though I didn't remove much code, I did streamline a lot of big XML string processing and reduced the path through the code considerably, and that is presumably the reason.
3 Answers
You can use the readelf utility on Linux, or dumpbin on Windows, to find the exact amount of space used by the various kinds of data in the exe file. Though I don't see why the executable size is worrying you: debugging symbols use ABSOLUTELY NO memory at run-time!
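For example, on Linux (the library name is a placeholder):

$ readelf -S --wide libservices.so   # compare .text against the .debug_* sections
$ size libservices.so                # only the loadable segments cost memory at run time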
It seems you are using a lot of C++ classes with inline methods. If these classes have high visibility, this inline code will bloat the whole application. I bet your link times have increased as well. Try reducing the number of inline methods and moving the code to the .cpp files. This will reduce the size of your object files and the exe file, and reduce link times.
The trade-off in this case is, of course, reduced size of compilation units versus execution time.
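A hypothetical before/after to illustrate the advice (the class is made up):

// widget.h, before: a body inside the class definition is implicitly
// inline, so every translation unit including this header may emit its
// own copy of the code plus its own debug information.
class Widget {
public:
    Widget() : v_(21) {}
    int value() const { return v_ * 2; }
private:
    int v_;
};

// widget.h, after: only the declarations stay visible everywhere...
class WidgetSlim {
public:
    WidgetSlim();
    int value() const;
private:
    int v_;
};

// widget.cpp: ...and the bodies are compiled exactly once.
WidgetSlim::WidgetSlim() : v_(21) {}
int WidgetSlim::value() const { return v_ * 2; }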
I don't have the very answer you are expecting to your question, but let me share my experience.
It is pretty common for the difference in size of executable files to be very large. I cannot explain why in detail, but just think of all the crazy things that modern debuggers let you do on your code. You know, this is thanks to debugging symbols.
The difference in size is so big that if you are, say, dynamically loading some shared libraries, then the sheer loading time of the files could explain the performance difference you found.
Indeed, this is a pretty "internal" aspect of compilers, and just to give you an example, years back I was quite unhappy with the huge executable files that GCC-4 produced in comparison to GCC-3, then I simply got used to it (and my HD grew in size, also).
All in all, I would not mind, because you are supposed to use builds with debugging symbols only during development, where it should not be an issue. In deployment, no debugging symbols will be there, and you will see how much the files shrink.
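One common recipe for that with GNU binutils (file names are placeholders) keeps the symbols in a side file, so the shipped binary is small but crashes remain debuggable:

$ objcopy --only-keep-debug app app.debug     # save the debug info separately
$ strip --strip-debug app                     # ship the small binary
$ objcopy --add-gnu-debuglink=app.debug app   # let gdb find the side file again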