Need help understanding compilation of C++ programs
I don't properly understand compilation and linking of C++ programs. Is there a way to look at the object files generated by compiling a C++ program, in an understandable format? This should help me understand the format of object files, how C++ classes are compiled, and what information the compiler needs to generate object files. It should also help me understand statements like:
If a class is used only as an input parameter or return type, we don't need to include the whole class header file; a forward declaration is enough. But if a derived class derives from a base class, we need to include the file containing the definition of the base class (taken from "Exceptional C++").
I am reading the book "Linking and Loading" to understand the format of object files, but I would prefer something specifically tailored to C++ source code.
Thanks,
Jagrati
Edit:
I know that with nm I can look at symbols present in the object files, but I am interested in knowing more about the object files.
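
The quoted statement can be illustrated with a small sketch. The names `Widget`, `make_widget`, `consume`, `count_parts`, `Base`, `Derived` and `base_id_of` are hypothetical and chosen only for illustration. The sketch assumes a conforming C++11 or later compiler, and it compiles as a single translation unit (for example with `g++ -c`).

```cpp
#include <cstddef>

// Forward declaration: the compiler only learns that the name exists.
class Widget;

// Declaring (not defining or calling) functions that take or return Widget
// works with the incomplete type, so no header for Widget is needed here.
Widget make_widget();
void consume(Widget w);
std::size_t count_parts(const Widget& w);

// A base class, by contrast, must be a complete type at the point where
// the derived class is defined.
class Base {
public:
    int id = 7;
};

class Derived : public Base {   // would not compile with only "class Base;"
public:
    int extra = 1;
};

// The derived object contains a Base subobject, so its size depends on Base.
static_assert(sizeof(Derived) >= sizeof(Base), "Derived embeds a Base subobject");

// Hypothetical helper: reads a member inherited from Base.
int base_id_of(const Derived& d) { return d.id; }
```

Replacing the definition of `Base` with a bare `class Base;` makes the definition of `Derived` fail to compile, while the three `Widget` declarations keep working.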
Comments (5)
First things first. Disassembling the compiler output will most probably not help you in any way to understand any of the issues you have. The output of the compiler is no longer a C++ program but plain assembly, and that is really hard to read if you do not know what the memory model is.
On the particular question of why the definition of `base` is required when you declare it to be a base class of `derived`, there are a few different reasons (and probably more that I am forgetting):
- When an object of type `derived` is created, the compiler must reserve memory for the full instance, including all of its base class subobjects: it must know the size of `base`.
- Member access is compiled as an offset from the `this` pointer, and that offset requires knowledge of the size taken by the `base` subobject.
- When an identifier is resolved in the context of `derived` and is not found in the `derived` class, the compiler must know whether it is defined in `base` before looking for it in the enclosing namespaces. Without the definition, the compiler cannot know whether `foo();` is a valid call inside `derived::function()` if `foo()` is declared in the `base` class.
- The number and signatures of all virtual functions defined in `base` must be known when the compiler defines the `derived` class. It needs that information to build the dynamic dispatch mechanism (usually a vtable), and even to know whether a member function in `derived` is bound for dynamic dispatch or not: if `base::f()` is virtual, then `derived::f()` will be virtual regardless of whether the declaration in `derived` has the `virtual` keyword.
- Multiple inheritance adds further requirements, such as the relative offsets of each `baseX` subobject, by which pointers must be adjusted before the final overriders of the methods are called (a pointer of type `base2` that points to an object of `multiplyderived` does not point to the beginning of the instance, but to the beginning of the `base2` subobject in the instance, which might be offset by other bases declared before `base2` in the inheritance list).
To the last question in the comments:
The code in question allocates an object of type `derived` on the stack. The compiler will add assembly instructions to reserve some amount of memory for the object on the stack. After the compiler has parsed the code and generated the assembly, there is no trace of the object. In particular (assuming a trivial constructor for a POD type, i.e. nothing is initialized), that code and `void f() { char array[ sizeof(derived) ]; }` will produce exactly the same assembly. When the compiler generates the instruction that reserves the space, it needs to know how much.
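
Several of the points in this answer can be checked from inside a program instead of by reading assembly. The following is only a sketch: the members `a`, `b`, `x`, `y`, `z`, the extra `base1` class, and the helper functions are hypothetical. These classes have virtual functions, so they are not POD types like the one assumed above, and the exact sizes and offsets depend on the compiler and ABI.

```cpp
#include <cstddef>

struct base {
    int a = 0;
    virtual int f() const { return 1; }
    virtual ~base() = default;
};

struct derived : base {
    int b = 0;
    int f() const { return 2; }  // virtual because base::f is; no keyword needed
};

// Size: a derived object contains a complete base subobject.
static_assert(sizeof(derived) >= sizeof(base), "derived embeds base");

// Dynamic dispatch: the call goes through the mechanism set up from base.
int call_through_base(const base& obj) { return obj.f(); }

// Stack allocation: the compiler must know sizeof(derived) to reserve space.
std::size_t bytes_reserved_for_local() {
    derived d;
    (void)d;
    return sizeof(d);
}

struct base1 { int x = 0; virtual ~base1() = default; };
struct base2 { int y = 0; virtual ~base2() = default; };
struct multiplyderived : base1, base2 { int z = 0; };

// Distance in bytes from the start of the object to its base2 subobject.
// On common ABIs base1 is laid out first, so this is greater than zero.
std::ptrdiff_t base2_offset(const multiplyderived& m) {
    const base2* as_base2 = &m;  // implicit conversion adjusts the pointer
    return reinterpret_cast<const char*>(as_base2) -
           reinterpret_cast<const char*>(&m);
}
```

Calling `call_through_base` with a `derived` object returns 2 even though `derived::f` is not marked `virtual`. `base2_offset` shows that converting to `base2*` changes the address, and the compiler can only compute that adjustment from the complete definitions of the bases.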
Have you tried inspecting your binaries with `readelf` (provided you're on a Linux platform)? This provides pretty comprehensive information on ELF object files.
Honestly, though, I'm not sure how much this would help with understanding compilation and linking. I think the right tack is probably to get a handle on how C++ code maps to assembly pre- and post-linking.
You normally don't need to know the internal format of the Obj files in detail, since they are generated for you. All you need to know is that for every source file you compile (typically one per class), the compiler generates an Obj file, which contains the machine code for your class, suited to the platform you are compiling for. Then the next step, linking, puts together the object files for all the classes your program needs into a single EXE or DLL (or whatever the equivalent format is on non-Windows operating systems). It could also be an EXE plus several DLLs, depending on your wishes.
The most important thing is that you separate the interface (declaration) and the implementation (definition) of your class.
Always put only the interface declarations of your class in the header file. Nothing else: no implementations here. Also avoid member variables of custom types that are not pointers, because forward declarations are not enough for them and you need to include other headers in your header. If you have includes in your header, then the design smells and the build process also slows down.
All implementations of the class methods or other functions should be in the CPP file. This guarantees that somebody who includes your header does not need the Obj file generated by the compiler, and that includes of other people's headers appear only in the CPP files.
But why bother? The answer is that with such a separation builds are faster, because each class is compiled into its Obj file only once. Also, if you change your class, only a small number of other object files have to be rebuilt during the next build.
If you have includes in the header, then when the compiler generates the Obj file for your class it must first process the headers of the other classes included in your header, which may in turn include further headers, and so on. There could even be a circular dependency, and then you cannot compile! Or, if you change something in your class, the compiler will need to regenerate a lot of other Obj files, because without this separation they become very tightly coupled over time.
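
A minimal sketch of this separation, shown as one file for brevity. The names `Car`, `Engine`, `car.h` and `car.cpp` are hypothetical, and the comments mark which part would live in which file. The point is that the header part only needs a forward declaration because the member is a pointer.

```cpp
// ---- car.h (hypothetical header): declarations only ----
class Engine;                        // forward declaration, no #include needed

class Car {
public:
    explicit Car(int horsepower);
    ~Car();
    Car(const Car&) = delete;        // the raw owning pointer must not be copied
    Car& operator=(const Car&) = delete;
    int horsepower() const;
private:
    Engine* engine_;                 // pointer member: incomplete type suffices
};

// ---- car.cpp (hypothetical source file): definitions ----
// This class stands in for what #include "engine.h" would provide.
class Engine {
public:
    explicit Engine(int hp) : hp_(hp) {}
    int hp() const { return hp_; }
private:
    int hp_;
};

Car::Car(int horsepower) : engine_(new Engine(horsepower)) {}
Car::~Car() { delete engine_; }
int Car::horsepower() const { return engine_->hp(); }
```

Code that includes only the header part can create and use a `Car` without ever seeing the definition of `Engine`, so a change to `Engine` would not force those files to be recompiled. If the member were an `Engine` object instead of a pointer, the header would need the full definition.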
`nm` is a Unix tool that shows you the names of the symbols in an object file.
`objdump` is a GNU tool that shows you more information.
But both tools show you quite raw information that is used by the linker and is not designed to be read by human beings. That will probably not help you better understand what happens at the C++ level.
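
For anyone who still wants to experiment with these tools, a small translation unit like the following gives them something to show. This is a sketch with hypothetical function names. The comments describe what one would typically expect from a GCC or Clang toolchain, not guaranteed output.

```cpp
// Overloads share a source-level name, so the compiler has to give them
// distinct (mangled) symbol names in the object file.
int area(int side) { return side * side; }
int area(int width, int height) { return width * height; }

// C linkage: overloading is not possible, and the symbol name is
// typically left unmangled.
extern "C" int c_area(int side) { return area(side); }

// Internal linkage: the symbol is local to this translation unit and
// cannot be referenced from other object files.
static int helper(int v) { return v + 1; }

int use_helper(int v) { return helper(v); }
```

After compiling with `g++ -c`, running `nm` on the resulting object file lists the symbols, and `nm -C` demangles the C++ names so that the two `area` overloads become distinguishable by their parameter lists.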
I'm reading "Introduction to GCC" (http://www.network-theory.co.uk/docs/gccintro/). It has given me good insight into linking and compiling. It's at a beginner's level, but I don't mind.