Typedefs, (binary) code duplication and object files
Suppose I compile a source file that contains this piece of code:
    struct Point
    {
        int x;
        int y;
    };
    struct Size
    {
        int x;
        int y;
    };
Since Point and Size are exactly the same (in terms of the memory layout of their members), would the compiler generate duplicate code (one for each struct) in the object file? That is my first question.
Now, let's remove struct Size from the source code and define it using typedef instead, like this:
    typedef Point Size;
What would the compiler do now? Would it duplicate code (since typedef isn't just renaming, rather it's more than that)?
Now suppose we have a class template like this:
    template <int UnUsed>
    class ConcreteError : public BaseError {
    public:
        ConcreteError () :BaseError(), error_msg() {}
        ConcreteError (int errorCode, int osErrorCode, const std::string& errorMessage)
            :BaseError(errorCode, osErrorCode, errorMessage){}
    };
And then we set up a few definitions, like this:
    typedef ConcreteError<0> FileError;
    typedef ConcreteError<1> NetworkError;
    typedef ConcreteError<2> DatabaseError;
Since the template parameter int UnUsed is not used in the implementation of the class (just suppose that), this situation seems to be exactly the same as multiple classes having exactly the same memory layout (similar to the case of struct Point and struct Size). Would there be duplicate code in the object file?
And what if we do it like this:
    typedef ConcreteError<0> FileError;
    typedef ConcreteError<0> NetworkError;
    typedef ConcreteError<0> DatabaseError;
Is this situation better, since now we're using the same instantiated class in the typedefs?
PS: this class template code is taken from here: How to create derived classes from a base class using template programming in C++?
Actually, I don't have any idea how the compiler generates an object file from source code, or how it handles class names, their members, other symbols and so on. How does it handle typedefs? What does it do with this:
    typedef int ArrayInt[100];
Is ArrayInt a new type here? What code does the compiler create for it in the object file? Where is 100 stored?
Comments (3)
No single line from your examples will generate any code in the object file. Or, more precisely, it won't generate any data at all, since I take "code" to mean just processor instructions.
The data in an object file is divided into three segments: code, static data and constant data.
The code is generated by actual function definitions (with a function body, not just declarations), except for inline functions. Inline functions generate code each time they are actually used. Template functions generate code when they are instantiated, but multiple instantiations are usually merged into a single instance by the compiler, the linker or both.
The static data is generated by defining global variables, static member variables (again, actual definitions and not just declarations inside a class) and static local variables. A variable must not be declared with the const modifier to go to the static data segment.
The constant data is generated by the same kinds of variable declarations as the static data, but with const modifiers, plus floating-point literals, string literals and maybe more kinds of literals depending on the hardware platform. An OS may actually disallow write access to constant data at the hardware level, so your program may crash with an access violation or segmentation fault if you try to write something there.
I'm not really an expert on such low-level things so I might have missed something, but I think I described the overall picture pretty well.
Unless I have really missed something, nothing else in a program generates any data in the object file, especially declarations and type definitions. These are used internally by the compiler. So when the compiler sees a struct definition, it remembers that it consists of two 32-bit integers. When it finds some real code that uses that struct, it knows that it must generate code that works with two 32-bit integers, must allocate at least 8 bytes to store it and so on. But all this information is used internally at compile time and doesn't really go into the object file. If C++ had something like reflection, it would be another story.
Note that while defining a lot of structs adds nothing to your object file, it may increase memory usage by the compiler itself. So you may say that defining identical things leads to data duplication at compile time but not at run time.
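To make the three categories above concrete, here is a minimal sketch of which declarations would typically end up where. The names g_call_count, k_primes, area and nth_prime are made up for illustration, and the actual section placement is up to the compiler and platform. On a GCC or Clang toolchain you could compile this with -c and list the symbols with a tool such as nm to check what your implementation really does.

```cpp
#include <cstddef>

// Type definitions: compile-time information only. On their own they
// produce neither instructions nor data in the object file.
struct Point { int x; int y; };
struct Size  { int x; int y; };

// Non-const object with static storage duration: typically placed in a
// writable data section.
int g_call_count = 0;

// Const object with static storage duration: typically placed in a
// read-only section (the compiler may also fold it away if it can).
const int k_primes[4] = {2, 3, 5, 7};

// Non-inline function definitions: these are what produce machine code.
int area(const Size& s) {
    ++g_call_count;              // writes to the writable static data
    return s.x * s.y;
}

int nth_prime(std::size_t i) {
    return k_primes[i % 4];      // reads from the constant data
}
```

Note that Point is never used here, so nothing about it needs to reach the object file (debug information aside, if you ask the compiler to emit it).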
Firstly, no code is generated for the first structure definitions you included, so it's a bit pointless to compare the two types. But in C++, type names are important, so a struct A is definitely treated distinctly from a struct B.
typedef creates type aliases, so the typedef-ed type is indeed the original type (it doesn't create a different type).
ConcreteError<0> is a different type than ConcreteError<1>. I don't think anything prevents the compiler from being funky and aliasing mangled function names to the same code when the parameters are identical in terms of data layout, the functions don't need to call other subfunctions on the data that are of different actual types, and the functions do equivalent things to both types. I didn't think this was really done in practice, but there are actually compilers which do this (see Ben's comment below).
For the last typedefs (all are aliases to ConcreteError<0>), only one "version" of ConcreteError is created (because only that one is instantiated).
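The type-identity points above can be checked at compile time. This is only a sketch: BaseError here is a hypothetical stand-in (the real one is not shown in the question), Extent and AlsoFileError are made-up alias names, and std::is_same comes from <type_traits> (C++11).

```cpp
#include <string>
#include <type_traits>

struct Point { int x; int y; };
struct Size  { int x; int y; };
typedef Point Extent;            // hypothetical alias for Point

// Hypothetical stand-in for the BaseError class from the question.
class BaseError {
public:
    BaseError() : error_code(0), os_error_code(0), error_msg() {}
    BaseError(int errorCode, int osErrorCode, const std::string& errorMessage)
        : error_code(errorCode), os_error_code(osErrorCode), error_msg(errorMessage) {}
    int code() const { return error_code; }
    int osCode() const { return os_error_code; }
    const std::string& message() const { return error_msg; }
private:
    int error_code;
    int os_error_code;
    std::string error_msg;
};

template <int UnUsed>
class ConcreteError : public BaseError {
public:
    ConcreteError() : BaseError() {}
    ConcreteError(int errorCode, int osErrorCode, const std::string& errorMessage)
        : BaseError(errorCode, osErrorCode, errorMessage) {}
};

typedef ConcreteError<0> FileError;
typedef ConcreteError<1> NetworkError;
typedef ConcreteError<0> AlsoFileError;   // second alias of the same specialization

// Same layout, but two distinct types.
static_assert(!std::is_same<Point, Size>::value, "Point and Size are distinct types");
// A typedef names the original type; it does not create a new one.
static_assert(std::is_same<Point, Extent>::value, "Extent is just Point");
// Different template arguments give different types...
static_assert(!std::is_same<FileError, NetworkError>::value, "distinct specializations");
// ...while aliases of one specialization all name that one type.
static_assert(std::is_same<FileError, AlsoFileError>::value, "same specialization");
// The unused parameter does not affect the layout.
static_assert(sizeof(FileError) == sizeof(NetworkError), "same size");
```

Whether the constructors of ConcreteError<0> and ConcreteError<1> end up sharing machine code is a separate question that the language does not answer; that part depends on the compiler and linker.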
1. No, no duplicate code with unused PODs. If you use them there will be two ints and possibly some padding allocated to them in memory. They will of course all look the same so what you want to call that is debatable, but it's no more "duplication" than using the same type in two places.
2. No, no duplicate code with aliases. No code at all actually.
3. Maybe, depends on if the compiler uses certain optimizations.
4. Maybe. Depends on if your typedefs are being used in different translation units and how good your compiler is at removing duplicate instantiations.
5. No, it's an alias for int[100].
To a great degree the question of, "How much machine code results from this construct," is entirely dependent upon the implementation.
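For the ArrayInt case specifically, a small sketch shows that the 100 is part of the type and is known at compile time, so it is not stored as a separate object anywhere. The function sum is a made-up example; where the bound shows up in the generated instructions (if at all) is up to the implementation.

```cpp
#include <cstddef>
#include <type_traits>

typedef int ArrayInt[100];

// ArrayInt is not a new type, just another name for int[100].
static_assert(std::is_same<ArrayInt, int[100]>::value, "alias for int[100]");
// The bound lives in the type and can be queried at compile time.
static_assert(std::extent<ArrayInt>::value == 100, "bound is part of the type");
static_assert(sizeof(ArrayInt) == 100 * sizeof(int), "storage for 100 ints");

// Hypothetical user of the alias. Only a function like this produces code;
// the loop bound is a compile-time constant inside that code.
int sum(const ArrayInt& values) {
    int total = 0;
    for (std::size_t i = 0; i < std::extent<ArrayInt>::value; ++i) {
        total += values[i];
    }
    return total;
}
```

The typedef line by itself still generates nothing; storage only appears once you define an object of that type, for example a global ArrayInt.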