Why are the generated binaries so large?

Posted 2024-11-05 20:38:12 | 92 characters | 12 views | 0 comments

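Why are the binaries produced by compiling a C++ program so large (easily 10 times the size of the source files)? What advantages does this have over interpreted languages, which need no such compilation (so the program size is just the size of the code files)?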

厌倦 2024-11-12 20:38:12

Modern interpreted languages typically do compile the code to some intermediate representation for faster execution. It might not get written out to disk, but there's certainly no guarantee that the program is represented in a more compact form. Some interpreters go the whole hog and generate machine code anyway (e.g. Java's JIT). Then there's the interpreter itself sitting in memory, which can be large.

A few points:

- The more sophisticated the commands in the source code, the more machine code operations might be required to execute them. Thus, higher-level language features tend to have a higher ratio of compiled code to source code. That's not necessarily a bad thing: think of it as "I only have to say a little about what I want done and it infers all those necessary steps". The challenge in programming is to ensure those steps really are necessary, which requires good library and program design.
- The compiler often deliberately trades some executable size for faster expected execution speed: inline vs out-of-line code is part of this compromise, though for small functions neither may be consistently more compact.
- More sophisticated run-time environments (e.g. adding support for C++ exceptions) can involve a bit of extra code that runs when the program first starts, to construct the necessary environment for that language feature.
- Library features may not be comparable. As well as the sort of add-on libraries you're very likely to have had to track down yourself and be very aware of using (e.g. XML, PDF parsing, OpenGL), languages often quietly use supporting libraries for what seem like language features and functions. Any of these can be surprisingly large.
  - For example, many interpreters just expose the C library's printf() function or something similar, while for output formatting C++ has ostream: a more complex, extensible and type-safe system with (for better or worse) persistent state across function calls, routines to query and set that state, an additional layer of customisable buffering, customisable character types and localisation, and generally a lot of small inline functions that can lead to smaller or larger programs depending on the exact use and compiler settings. What's best depends on your application and memory vs performance goals.
- Inbuilt language statements may be compiled differently. Consider a switch on an integer expression with 100 case labels spread randomly between 1 and 1000: one compiler/language might decide to "pack" the 100 cases and do a binary search for a match, another might use a sparsely populated array of 1000 elements and do direct indexing (which wastes space in the executable but typically makes for faster code). So, it's hard to draw conclusions based on executable size.
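The two switch-dispatch strategies from the last point can be written out by hand to make the size/speed trade-off concrete. This is only a rough sketch, not what any particular compiler emits: the names (`HandlerId`, `label_at`, `dispatch_binary_search`, `dispatch_direct_index`) are hypothetical, and the 100 labels are spaced evenly (5, 15, ..., 995) rather than randomly so the example is deterministic.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical handler id: 0 means "no case matched", 1..100 name a case.
using HandlerId = std::uint16_t;

constexpr std::size_t kNumCases = 100;  // number of case labels
constexpr int kMaxLabel = 1000;         // labels lie in 1..1000

// Assumption for illustration: labels are 5, 15, ..., 995.
constexpr int label_at(std::size_t i) { return 5 + 10 * static_cast<int>(i); }

// Strategy A: "packed" table holding only the 100 labels, in ascending order.
std::array<int, kNumCases> make_packed_labels() {
    std::array<int, kNumCases> labels{};
    for (std::size_t i = 0; i < kNumCases; ++i) labels[i] = label_at(i);
    return labels;
}

// Small table, but about log2(100) comparisons per lookup.
HandlerId dispatch_binary_search(int value) {
    static const std::array<int, kNumCases> labels = make_packed_labels();
    const auto it = std::lower_bound(labels.begin(), labels.end(), value);
    if (it == labels.end() || *it != value) return 0;
    return static_cast<HandlerId>((it - labels.begin()) + 1);
}

// Strategy B: sparse table with one slot per possible value (mostly zeros).
std::array<HandlerId, kMaxLabel + 1> make_sparse_table() {
    std::array<HandlerId, kMaxLabel + 1> table{};  // zero-initialised
    for (std::size_t i = 0; i < kNumCases; ++i) {
        assert(label_at(i) >= 0 && label_at(i) <= kMaxLabel);
        table[static_cast<std::size_t>(label_at(i))] = static_cast<HandlerId>(i + 1);
    }
    return table;
}

// Larger table, but a single bounds check and one indexed load per lookup.
HandlerId dispatch_direct_index(int value) {
    static const std::array<HandlerId, kMaxLabel + 1> table = make_sparse_table();
    if (value < 0 || value > kMaxLabel) return 0;
    return table[static_cast<std::size_t>(value)];
}
```

Both functions return the same answer for every input; they differ only in how much table data they carry (100 entries versus 1001) and how much work each lookup does. A real compiler makes this choice with its own heuristics and optimisation settings, so the generated code may look quite different.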

Typically, memory usage and execution speed become increasingly important as the program gets larger and more complex. You don't see systems like operating systems, enterprise web servers or full-featured commercial word processors written in interpreted languages, because interpreted languages don't have the scalability.

拔了角的鹿 2024-11-12 20:38:12

Interpreted languages assume that an interpreter is available, whereas a compiled program is, in most cases, standalone.

没︽人懂的悲伤 2024-11-12 20:38:12

Take a trivial case. Suppose you have a one-line program:

print("hello world")

What does that "print" do? Surely it's clear that you're asking some other code to do some work? And that code isn't free: the sum total of what needs to run is much more than the lines of code you write. In more realistic programs you exploit many sophisticated libraries managing windows and other UI features, networks, databases and so on. Now, whether that code is bundled into your application, loaded from DLLs, or present in the interpreter, it's got to be somewhere.

There are plenty of trade-offs between compilation and interpretation, and intermediate solutions such as Java's compilation/byte-code interpretation approach. For example, you might consider:

- the run-time cost of interpreting the source every time you run, versus running the compiled code
- the portability advantages of interpreters: with compilation, you need to build separate versions of an app for different platforms.
