Huge executable size due to debugging symbols: why?
We have been developing a large financial application at a bank. It started out as 150k lines of really bad code. A month ago it was down to a little more than half that, but the size of the executable was still huge. I expected that, since we were only making the code more readable while the templated code was still generating plenty of object code; we were just being more efficient with our effort.
The application is broken into about 5 shared objects and a main. One of the bigger shared objects was 40 MB, and it grew to 50 MB even while the code shrank.
I wasn't entirely surprised that the code started to grow, because after all we are adding some functionality. But I was surprised that it grew by 20%. Certainly no one came close to writing 20% of the code, so it's hard for me to imagine how it grew that much. That module is kind of hard for me to analyze, but on Friday I got a new datapoint that sheds some light.
There are perhaps 10 feeds to SOAP servers. The code is autogenerated, badly. Each service had one parser class with exactly the same code, something like:
#include <boost/shared_ptr.hpp>
#include <xercesstuff...>

class ParserService1 {
public:
    void parse() {
        try {
            Service1ContentHandler* p = new Service1ContentHandler( ... );
            parser->setContentHandler(p);
            parser->parse();
        } catch (SAX ...) {
            ...
        }
    }
};
These classes were completely unnecessary; a single function works. Each ContentHandler class had been autogenerated with the same 7 or 8 variables, which I was able to share through inheritance.
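Roughly, the refactor looks like the sketch below (a minimal sketch assuming Xerces-C++ SAX2; ServiceContentHandlerBase and parseService are illustrative names, not the real generated ones):

#include <string>
#include <xercesc/sax2/SAX2XMLReader.hpp>
#include <xercesc/sax2/ContentHandler.hpp>
#include <xercesc/sax2/DefaultHandler.hpp>

// One base class owns the 7 or 8 members that every autogenerated
// ContentHandler duplicated; each service handler now overrides only
// the callbacks that actually differ.
class ServiceContentHandlerBase : public xercesc::DefaultHandler {
protected:
    std::string currentElement_;   // illustrative shared state
    // ... the remaining shared members ...
};

// One free function replaces the ten identical per-service parser classes.
void parseService(xercesc::SAX2XMLReader& parser,
                  xercesc::ContentHandler& handler,
                  const char* uri)
{
    parser.setContentHandler(&handler);
    parser.parse(uri);   // SAX exceptions propagate to a single caller
}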
So I was expecting the size of the binary to go down when I removed the parser classes and all of that from the code. But with only 10 services, I wasn't expecting it to drop from 38 MB to 36 MB. That's an outrageous amount of symbols.
The only thing that I can think of is that each parser was including boost::shared_ptr, some Xerces parser stuff, and that somehow, the compiler and linker are storing all those symbols repeatedly for each file. I'm curious to find out in any case.
So, can anyone suggest how I would go about tracking down why a simple modification like this should have so much impact? I can use nm on a module to look at the symbols inside, but that's going to generate a painful, huge amount of semi-readable stuff.
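For what it's worth, a couple of pipelines can tame nm's output (GNU binutils assumed; libservices.so stands in for one of our shared objects):

$ # the 20 largest defined symbols, demangled, biggest last:
$ nm -C --size-sort --print-size libservices.so | tail -20
$ # how often the same demangled symbol is defined across the object files:
$ nm -C --defined-only *.o | awk '{ $1=""; $2=""; print }' | sort | uniq -c | sort -rn | head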
Also, when a colleague ran her code with my new library, the user time went from 1m55s to 1m25s. The real time is highly variable, because we are waiting on slow SOAP servers (IMHO, SOAP is an incredibly poor replacement for CORBA...), but the CPU time is quite stable. I would have expected a slight boost from reducing the code size that much, but the bottom line is: on a server with massive memory, I was really surprised that the speed was impacted so much, considering I didn't change the architecture of the XML processing itself.
I'm going to take it much further on Tuesday, and hopefully will get more information, but if anyone has some idea of how I could get this much improvement, I'd love to know.
Update:
I verified that, in fact, having debugging symbols in the task does not appear to change the run time at all. I did this by creating a header file that included lots of stuff, including the two that had the effect here: boost shared pointers and some of the Xerces XML parser. There appears to be no runtime performance hit (I checked because there were differences of opinion between two answers). However, I also verified that including header files creates debugging symbols for each instance, even though the stripped binary size is unchanged. So if you include a given file, even if you don't use it at all, a fixed number of symbols is injected into that object file, and they are not folded together at link time even though they are presumably identical.
My code looks like:
#include "includetorture.h"
void f1()
{
f2(); // call the function in the next file
}
The size with my particular include files was about 100k per source file. Presumably, if I had included more, it would be higher. The total executable with the includes was ~600k; without them, it was about 9k. I verified that the growth is linear in the number of files doing the including, but the stripped code is the same size regardless, as it should be.
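A rough way to reproduce the measurement (g++ and GNU binutils assumed; the file names are illustrative, following the snippet above):

$ g++ -g f*.cpp main.cpp -o torture
$ ls -l torture                      # ~600k with the includes
$ readelf -S torture | grep debug    # the .debug_* sections account for almost all of it
$ strip torture && ls -l torture     # ~9k once the debug sections are gone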
Clearly, I was mistaken in thinking this was the reason for the performance gain. I think I have accounted for that now: even though I didn't remove much code, I did streamline a lot of big XML string processing and reduced the path through the code considerably, and that is presumably the reason.
3 Answers
You can use the readelf utility on Linux, or dumpbin on Windows, to find the exact amount of space used by the various kinds of data in the exe file. Though I don't see why the executable size is worrying you: debugging symbols use ABSOLUTELY NO memory at run-time!
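For example, on Linux (the library name is a placeholder):

$ readelf -S --wide libservices.so   # compare .text against the .debug_* sections
$ size libservices.so                # only the loadable segments cost memory at run time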
It seems you are using a lot of C++ classes with inline methods. If these classes have high visibility, this inline code will bloat the whole application. I bet your link times have increased as well. Try reducing the number of inline methods and moving the code to the .cpp files. This will reduce the size of your object files and the exe file, and reduce link times.
The trade-off in this case is, of course, reduced size of compilation units versus execution time.
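A hypothetical before/after to illustrate the advice (the class is made up):

// widget.h, before: a body inside the class definition is implicitly
// inline, so every translation unit including this header may emit its
// own copy of the code plus its own debug information.
class Widget {
public:
    Widget() : v_(21) {}
    int value() const { return v_ * 2; }
private:
    int v_;
};

// widget.h, after: only the declarations stay visible everywhere...
class WidgetSlim {
public:
    WidgetSlim();
    int value() const;
private:
    int v_;
};

// widget.cpp: ...and the bodies are compiled exactly once.
WidgetSlim::WidgetSlim() : v_(21) {}
int WidgetSlim::value() const { return v_ * 2; }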
I don't have the very answer you are expecting to your question, but let me share my experience.
It is pretty common for the difference in size of executable files to be very large. I cannot explain why in detail, but just think of all the crazy things that modern debuggers let you do on your code. You know, this is thanks to debugging symbols.
The difference in size is so big that if you are, say, dynamically loading some shared libraries, then the sheer loading time of the files could explain the performance difference you found.
Indeed, this is a pretty "internal" aspect of compilers, and just to give you an example, years back I was quite unhappy with the huge executable files that GCC-4 produced in comparison to GCC-3, then I simply got used to it (and my HD grew in size, also).
All in all, I would not mind, because you are supposed to use builds with debugging symbols only during development, where it should not be an issue. In deployment, no debugging symbols will be there, and you will see how much the files shrink.
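One common recipe for that with GNU binutils (file names are placeholders) keeps the symbols in a side file, so the shipped binary is small but crashes remain debuggable:

$ objcopy --only-keep-debug app app.debug     # save the debug info separately
$ strip --strip-debug app                     # ship the small binary
$ objcopy --add-gnu-debuglink=app.debug app   # let gdb find the side file again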