What does compiled C++ look like? What does a class look like?

Posted 2024-09-09 02:59:39 | Word count 346 | Views 3 | Comments 0

With some background in assembly instructions and C programs, I can visualize what a compiled function would look like, but funnily enough I have never thought carefully about what a compiled C++ class would look like.

bash$ cat class.cpp
#include<iostream>
class Base
{
  int i;
  float f;
};

bash$ g++ -c class.cpp

I ran:

bash$ objdump -d class.o
bash$ readelf -a class.o

but the output is hard for me to understand.

Could somebody please explain it to me or suggest some good starting points?

Comments (6)

魔法唧唧 2024-09-16 02:59:39

Classes are (more or less) laid out as regular structs. Methods are (more or less...) converted into functions whose first parameter is "this". References to member variables are made as offsets from "this".
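To make that concrete, here is a minimal sketch that puts a small class next to a hand-written struct-plus-function equivalent. The names Counter, CounterData and Counter_add are hypothetical, chosen only for illustration; a real compiler emits a mangled symbol rather than Counter_add, and the exact calling convention depends on the ABI.

```cpp
#include <cassert>

// The class as you would write it in C++.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}
    int add(int n) { value_ += n; return value_; }
private:
    int value_;
};

// Roughly what the compiler works with: a plain struct holding the data...
struct CounterData {
    int value;
};

// ...and an ordinary function that receives the object's address explicitly.
// "self" plays the role of the hidden "this" parameter (hypothetical name).
int Counter_add(CounterData* self, int n) {
    self->value += n;      // member access is a load/store at an offset from self
    return self->value;
}

// Both forms compute the same result for the same inputs.
void self_check() {
    Counter c(10);
    CounterData d{10};
    assert(c.add(5) == 15);
    assert(Counter_add(&d, 5) == 15);
}
```

Calling self_check() from main exercises both versions. Disassembling Counter::add and Counter_add from an unoptimised build should show very similar instruction sequences, though that is something to verify with your own toolchain.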

As for inheritance, let's quote the C++ FAQ LITE, which is mirrored at http://www.parashift.com/c++-faq-lite/virtual-functions.html#faq-20.4 . That section shows how virtual functions are called on real hardware, that is, what machine code the compiler generates.


Let's work an example. Suppose class Base has 5 virtual functions: virt0() through virt4().

 // Your original C++ source code
 class Base {
 public:
   virtual arbitrary_return_type virt0(...arbitrary params...);
   virtual arbitrary_return_type virt1(...arbitrary params...);
   virtual arbitrary_return_type virt2(...arbitrary params...);
   virtual arbitrary_return_type virt3(...arbitrary params...);
   virtual arbitrary_return_type virt4(...arbitrary params...);
   ...
 };

Step #1: the compiler builds a static table containing 5 function-pointers, burying that table into static memory somewhere. Many (not all) compilers define this table while compiling the .cpp that defines Base's first non-inline virtual function. We call that table the v-table; let's pretend its technical name is Base::__vtable. If a function pointer fits into one machine word on the target hardware platform, Base::__vtable will end up consuming 5 hidden words of memory. Not 5 per instance, not 5 per function; just 5. It might look something like the following pseudo-code:

 // Pseudo-code (not C++, not C) for a static table defined within file Base.cpp

 // Pretend FunctionPtr is a generic pointer to a generic member function
 // (Remember: this is pseudo-code, not C++ code)
 FunctionPtr Base::__vtable[5] = {
   &Base::virt0, &Base::virt1, &Base::virt2, &Base::virt3, &Base::virt4
 };

Step #2: the compiler adds a hidden pointer (typically also a machine-word) to each object of class Base. This is called the v-pointer. Think of this hidden pointer as a hidden data member, as if the compiler rewrites your class to something like this:

 // Your original C++ source code
 class Base {
 public:
   ...
   FunctionPtr* __vptr;  ← supplied by the compiler, hidden from the programmer
   ...
 };

Step #3: the compiler initializes this->__vptr within each constructor. The idea is to cause each object's v-pointer to point at its class's v-table, as if it adds the following instruction in each constructor's init-list:

 Base::Base(...arbitrary params...)
   : __vptr(&Base::__vtable[0])  ← supplied by the compiler, hidden from the programmer
   ...
 {
   ...
 }

Now let's work out a derived class. Suppose your C++ code defines class Der that inherits from class Base. The compiler repeats steps #1 and #3 (but not #2). In step #1, the compiler creates a hidden v-table, keeping the same function-pointers as in Base::__vtable but replacing those slots that correspond to overrides. For instance, if Der overrides virt0() through virt2() and inherits the others as-is, Der's v-table might look something like this (pretend Der doesn't add any new virtuals):

 // Pseudo-code (not C++, not C) for a static table defined within file Der.cpp

 // Pretend FunctionPtr is a generic pointer to a generic member function
 // (Remember: this is pseudo-code, not C++ code)
 FunctionPtr Der::__vtable[5] = {
   &Der::virt0, &Der::virt1, &Der::virt2, &Base::virt3, &Base::virt4
 };                                        ^^^^----------^^^^---inherited as-is

In step #3, the compiler adds a similar pointer-assignment at the beginning of each of Der's constructors. The idea is to change each Der object's v-pointer so it points at its class's v-table. (This is not a second v-pointer; it's the same v-pointer that was defined in the base class, Base; remember, the compiler does not repeat step #2 in class Der.)

Finally, let's see how the compiler implements a call to a virtual function. Your code might look like this:

 // Your original C++ code
 void mycode(Base* p)
 {
   p->virt3();
 }

The compiler has no idea whether this is going to call Base::virt3() or Der::virt3() or perhaps the virt3() method of another derived class that doesn't even exist yet. It only knows for sure that you are calling virt3() which happens to be the function in slot #3 of the v-table. It rewrites that call into something like this:

 // Pseudo-code that the compiler generates from your C++

 void mycode(Base* p)
 {
   p->__vptr[3](p);
 } 
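The FAQ's pseudo-code can be approximated in real C++ by building the table by hand. The sketch below is an emulation under simplifying assumptions, not what any particular compiler emits: it uses two slots instead of five, every slot has the same hypothetical signature int(Base*), and the names Base_vtable, Der_vtable, Base_construct and Der_construct are invented for illustration.

```cpp
#include <cassert>

struct Base;                                // forward declaration for the slot type
using FunctionPtr = int (*)(Base* self);    // every slot takes the object explicitly

// Step #2: each object carries one hidden pointer to its class's table.
struct Base {
    const FunctionPtr* vptr;
};

// Der adds no second v-pointer; it reuses the one inside its Base part.
struct Der {
    Base base;
    int extra;
};

int Base_virt0(Base*) { return 0; }
int Base_virt1(Base*) { return 1; }
int Der_virt0(Base*)  { return 100; }       // Der overrides slot 0 only

// Step #1: one static table per class, not per object.
const FunctionPtr Base_vtable[2] = { &Base_virt0, &Base_virt1 };
const FunctionPtr Der_vtable[2]  = { &Der_virt0,  &Base_virt1 };  // slot 1 inherited as-is

// Step #3: each "constructor" points the object at its class's table.
void Base_construct(Base* self) { self->vptr = Base_vtable; }
void Der_construct(Der* self) {
    Base_construct(&self->base);            // base part is constructed first
    self->base.vptr = Der_vtable;           // then the v-pointer is retargeted
    self->extra = 0;
}

// A virtual call becomes: load the table, index the slot, call with the object.
int call_virt0(Base* p) { return p->vptr[0](p); }
int call_virt1(Base* p) { return p->vptr[1](p); }

void self_check() {
    Base b;
    Base_construct(&b);
    Der d;
    Der_construct(&d);
    assert(call_virt0(&b) == 0);            // Base's own slot 0
    assert(call_virt0(&d.base) == 100);     // overridden in Der
    assert(call_virt1(&d.base) == 1);       // inherited from Base
}
```

Note that call_virt0 never needs to know whether it was handed a Base or the Base part of a Der; the table the object points at decides which function runs.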

I strongly recommend that every C++ developer read the FAQ. It might take several weeks (it is long and hard to read), but it will teach you a lot about C++ and what can be done with it.

酒几许 2024-09-16 02:59:39

OK. There is nothing special about compiled classes; compiled classes do not even exist as such. What exists are objects, which are flat chunks of memory with possible padding between fields, and standalone member functions somewhere in the code that take a pointer to the object as their first parameter.

So an object of class Base should look something like this:

(base_address) : i
(base_address + sizeof(int)) : f

Padding between fields is possible, but that is platform-specific: it depends on the alignment requirements of the target processor and ABI.
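Rather than guessing, you can ask the compiler for the actual layout. The following sketch uses a hypothetical struct BaseLayout with the same two fields as Base (made public so that offsetof can name them) and reports the offsets and any padding between the fields. The specific numbers depend on your platform; on common targets where int and float are both 4 bytes you would expect offsets 0 and 4 with no padding, but treat that as an expectation to check, not a guarantee.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Same fields as the question's Base, but public so offsetof can refer to them.
struct BaseLayout {
    int i;
    float f;
};

std::size_t offset_of_i() { return offsetof(BaseLayout, i); }
std::size_t offset_of_f() { return offsetof(BaseLayout, f); }

// Bytes the compiler inserted between the end of i and the start of f.
std::size_t padding_between_fields() {
    return offset_of_f() - (offset_of_i() + sizeof(int));
}

void check_layout() {
    assert(offset_of_i() == 0);                  // first member sits at the object's address
    assert(offset_of_f() >= sizeof(int));        // f comes after i, possibly after padding
    assert(sizeof(BaseLayout) >= sizeof(int) + sizeof(float));
}
```

Printing offset_of_f() and sizeof(BaseLayout) from a small main is a quick way to see what your compiler chose.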

Also, in a debug build the class description can be found in the debug symbols, but that is compiler-specific. You should look for a program that dumps debug symbols for your compiler.

徒留西风 2024-09-16 02:59:39

"Compiled classes" means "compiled methods".

A method is an ordinary function with an extra parameter, usually passed in a register (mostly %ecx, I believe; this is at least true for most Windows compilers, which have to produce COM objects using the __thiscall convention).

So C++ classes are not terribly different from a bunch of ordinary functions, except for name mangling and some magic in constructors/destructors for setting up vtables.

伴我心暖 2024-09-16 02:59:39

The main difference from reading C object files is that C++ method names are mangled. You can try the -C (--demangle) option of objdump.
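If you want to see what demangling does from inside a program, GCC and Clang expose the same machinery through abi::__cxa_demangle in <cxxabi.h>. The sketch below assumes one of those toolchains (it will not build with MSVC); the helper name demangle and the sample symbols are illustrative, with _ZN4Base3getEv being the Itanium-ABI encoding of a hypothetical member function Base::get().

```cpp
#include <cassert>
#include <cstdlib>     // std::free
#include <cxxabi.h>    // abi::__cxa_demangle (GCC/Clang only)
#include <string>

// Returns the human-readable form of a mangled symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);       // the buffer is malloc-allocated, so the caller frees it
    return result;
}

void self_check() {
    // "_ZN4Base3getEv": nested name (N...E) made of "Base" and "get", taking void (v).
    assert(demangle("_ZN4Base3getEv") == "Base::get()");
}
```

This is the same transformation objdump -C applies to each symbol it prints.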

梦萦几度 2024-09-16 02:59:39

Try:

g++ -S class.cpp

That will give you an assembly file 'class.s' (text file) which you can read with a text editor.
However, your code doesn't do anything (declaring a class doesn't generate code on its own) so you won't have much in the assembly file.
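To get something worth reading in class.s, the class needs at least one function that the compiler has to emit. Here is one possible variation of the question's Base; the constructor, sum() and use_base() are additions made purely for illustration and are not part of the original code.

```cpp
#include <cassert>

class Base {
public:
    Base(int i_, float f_);
    float sum() const;
private:
    int i;
    float f;
};

// Out-of-line definitions: each of these becomes a real function in the object
// file, taking the object's address as a hidden first argument.
Base::Base(int i_, float f_) : i(i_), f(f_) {}

float Base::sum() const {
    return static_cast<float>(i) + f;   // loads i and f at offsets from "this"
}

// A caller, so the output also shows how the object is built and passed.
float use_base() {
    Base b(2, 0.5f);
    return b.sum();
}

void self_check() {
    assert(use_base() == 2.5f);         // 2 + 0.5 is exactly representable in float
}
```

Compiling this with g++ -S (ideally without optimisation at first) should produce labels for the constructor, Base::sum and use_base under their mangled names.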

驱逐舰岛风号 2024-09-16 02:59:39

Like a C struct and a set of functions with an additional parameter that is a pointer to the struct.

The easiest way to follow what the compiler did is perhaps to build without optimisation, then load the code into a debugger and step through it in mixed source/assembler mode.

However, the point of the compiler is that you don't need to know this stuff (unless perhaps you are writing a compiler).
