Need help understanding the compilation of C++ programs

Posted on 2024-09-08 22:37:53

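I don't properly understand how C++ programs are compiled and linked. Is there a way I can view the object files generated when compiling a C++ program (in a comprehensible format)? This should help me understand the format of object files, how C++ classes are compiled, and what information the compiler needs to generate object files, and help me make sense of statements like the following: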

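If a class is used only as an input parameter or return type, we don't need to include the whole class header file; a forward declaration is enough. But if a derived class derives from a base class, we need to include the file containing the definition of the base class (taken from "Exceptional C++").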

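I am reading the book "Linkers and Loaders" to understand the format of object files, but I would prefer something tailored specifically to C++ source code.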

Thanks,

Jagrati

Edit:

I know that with nm I can view the symbols present in an object file, but I am interested in learning more about object files.

掀纱窥君容 2024-09-15 22:37:53

First things first. Disassembling the compiler output will most probably not help you understand any of the issues you have. The output of the compiler is no longer a C++ program but plain assembly, and that is really hard to read if you do not know what the memory model is.

On the particular issue of why the definition of base is required when you declare it as a base class of derived, there are a few different reasons (and probably more that I am forgetting):

When an object of type derived is created, the compiler must reserve memory for the complete object, including all base class subobjects: it must know the size of base.
When you access a member attribute, the compiler must know its offset from the implicit this pointer, and computing that offset requires knowing the size taken by the base subobject.
When an identifier is parsed in the context of derived and is not found in the derived class, the compiler must know whether it is defined in base before looking for it in the enclosing namespaces. Without the definition of base, the compiler cannot know whether foo(); is a valid call inside derived::function() when foo() is declared in the base class.
The number and signatures of all virtual functions declared in base must be known when the compiler processes the definition of derived. It needs that information to build the dynamic dispatch mechanism (usually a vtable), and even to know whether a member function in derived is dynamically dispatched: if base::f() is virtual, then derived::f() is virtual regardless of whether its declaration in derived uses the virtual keyword.
Multiple inheritance adds a few other requirements, such as the relative offset of each baseX subobject, which is needed to adjust the this pointer before a final overrider is called. A pointer of type base2 that points to an object of type multiplyderived does not point to the beginning of the instance but to the beginning of the base2 subobject within it, which may be offset by other bases declared before base2 in the inheritance list.
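
The points in the list above can be illustrated with a small sketch. The class names base, derived, base2 and multiplyderived follow the answer; the members and the helper functions call_through_base and base2_offset are made up for illustration. Exact sizes and offsets are implementation-defined, so the code only checks relationships that hold on common ABIs.

```cpp
#include <cstddef>

struct base {
    int x;
    virtual int f() const { return 1; }
    virtual ~base() = default;
};

// No `virtual` keyword here, yet derived::f overrides base::f and is
// dispatched dynamically, because base::f is virtual.
struct derived : base {
    int y;
    int f() const { return 2; }
};

struct base2 {
    int z;
    virtual ~base2() = default;
};

struct multiplyderived : base, base2 {
    int w;
};

// A derived object contains a complete base subobject, so the compiler
// needs the full definition of base to compute sizeof(derived).
static_assert(sizeof(derived) >= sizeof(base),
              "derived contains a base subobject");

// Calls f() through a reference to base; the call is resolved at runtime.
int call_through_base(const base& b) { return b.f(); }

// Converting multiplyderived* to base2* makes the compiler adjust the
// pointer so that it addresses the base2 subobject. The returned byte
// offset is implementation-defined.
std::ptrdiff_t base2_offset(multiplyderived& m) {
    base2* p2 = &m;
    return reinterpret_cast<char*>(p2) - reinterpret_cast<char*>(&m);
}
```

Calling call_through_base with a derived object returns 2 even though derived::f is not marked virtual, and on typical compilers base2_offset returns a non-zero value because the base subobject is laid out first.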

To the last question in the comments:

So can't the instantiation of objects (except for global ones) wait until runtime, so that the size, offsets, etc. could wait until link time and we wouldn't necessarily have to deal with them when generating object files?

void f() {
   derived d;
   //...
}

The previous code allocates an object of type derived on the stack. The compiler will add assembler instructions to reserve some amount of memory for the object on the stack. After the compiler has parsed the code and generated the assembly, there is no trace of the object. In particular (assuming a trivial constructor for a POD type, i.e. nothing is initialized), that code and void f() { char array[ sizeof(derived) ]; } will produce exactly the same assembler. When the compiler generates the instruction that reserves the space, it needs to know how much.
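
As a rough sketch of that point, the snippet below shows that sizeof(derived) is a constant the compiler must already know while compiling the function body. The member layout of base and derived and the names derived_size, f and array_bytes are assumptions made for illustration.

```cpp
#include <cstddef>

struct base { int a; int b; };
struct derived : base { int c; };

// sizeof(derived) is a compile-time constant, usable as an array bound.
constexpr std::size_t derived_size = sizeof(derived);

void f() {
    derived d;   // reserves sizeof(derived) bytes in f's stack frame
    (void)d;     // silence the unused-variable warning
}

// Reserves the same number of bytes as f() does for its derived object
// and reports how many bytes that is.
std::size_t array_bytes() {
    char array[sizeof(derived)];
    return sizeof array;
}
```

Neither function can be compiled while derived is only forward-declared: both need the complete type to know how much stack space to reserve.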

我一直都在从未离去 2024-09-15 22:37:53

Have you tried inspecting your binaries with readelf (provided you are on a Linux platform)? It gives fairly comprehensive information about ELF object files.

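Honestly though, I'm not sure how much this will help in understanding compilation and linking. I think the right approach is probably to get a handle on how C++ code maps to assembly before and after linking.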
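
For experimenting along these lines, here is a small translation unit that could be compiled and inspected. The file name inspect.cpp, the class counter and the function add_to_total are hypothetical, and the commands in the comment assume a GNU toolchain on Linux.

```cpp
// inspect.cpp (hypothetical file name): a tiny translation unit to inspect.
//
// Commands to try, assuming g++ and GNU binutils are installed:
//   g++ -c inspect.cpp -o inspect.o   // compile only, do not link
//   nm -C inspect.o                   // list symbols with demangled names
//   readelf -S inspect.o              // show the section headers
//   readelf -s inspect.o              // show the symbol table
//   objdump -d -C inspect.o           // disassemble the code sections

class counter {
public:
    explicit counter(int start);
    void increment();
    int value() const;
private:
    int count_;
};

// Out-of-line member definitions become function symbols in the object file.
counter::counter(int start) : count_(start) {}
void counter::increment() { ++count_; }
int counter::value() const { return count_; }

// A global variable, which ends up in a data section of the object file.
int global_total = 0;

// extern "C" disables C++ name mangling for this function's symbol.
extern "C" int add_to_total(int n) {
    global_total += n;
    return global_total;
}
```

Comparing the output of nm with and without -C is one way to see how member function names are mangled, while add_to_total should appear under its plain name.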

空心空情空意 2024-09-15 22:37:53

You normally don't need to know the internal format of Obj files in detail, since they are generated for you. All you need to know is that for every source file you compile (typically one per class), the compiler generates an Obj file, which contains the machine code for that file, targeted at the platform you are compiling for. The next step, linking, then combines the object files your program needs into a single EXE or DLL (or the equivalent format on non-Windows systems). The result could also be an EXE plus several DLLs, depending on your needs.

The most important thing is to separate the interface (declaration) of your class from its implementation (definition).

Put only the interface declarations of your class in the header file. Nothing else: no implementations there. Also avoid member variables of custom types that are not pointers, because forward declarations are not enough for them and you need to include other headers in your header. Includes in a header are a design smell and also slow down the build.
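
A minimal sketch of this advice follows, with the three files shown in one listing so that it compiles on its own. The names widget, engine and their members are invented for illustration. It also matches the statement quoted in the question: a forward declaration is enough for pointer members, parameters and return types.

```cpp
// ---- widget.h (hypothetical header): no #include of engine.h needed ----
class engine;   // forward declaration: engine is an incomplete type here

class widget {
public:
    explicit widget(engine* e);     // pointer parameter: declaration suffices
    engine* get_engine() const;     // pointer return type: declaration suffices
    engine clone_engine() const;    // even a by-value return type may be
                                    // incomplete in a function declaration
    int power() const;
private:
    engine* engine_;                // pointer member: its size is known
    // engine by_value_;            // would not compile here: a by-value
                                    // member needs the full definition
};

// ---- engine.h (hypothetical header) ----
class engine {
public:
    explicit engine(int hp) : hp_(hp) {}
    int horsepower() const { return hp_; }
private:
    int hp_;
};

// ---- widget.cpp: includes both headers, so engine is complete here ----
widget::widget(engine* e) : engine_(e) {}
engine* widget::get_engine() const { return engine_; }
engine widget::clone_engine() const { return *engine_; }
int widget::power() const { return engine_->horsepower(); }
```

With this split, a change to engine.h forces recompilation of widget.cpp but not of files that include only widget.h.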

All implementations of class methods and other functions should go in the CPP file. That way, code that includes your header depends only on the declarations and not on the Obj file generated by the compiler, and includes of other people's headers can stay in your CPP files only.

But why bother? The answer is that with this separation builds are faster, because each CPP file is compiled into its own Obj file only once. Also, if you change your class, only a small number of other object files have to be rebuilt during the next build.

If you have includes in the header, then whenever the compiler generates the Obj file for your class it must first process the headers of the other classes included there, which may in turn pull in further headers, and so on. There could even be a circular dependency, and then the code may not compile! And if you change something in your class, the compiler will need to regenerate a lot of other Obj files, because without this separation files become tightly dependent on each other over time.
