GCC function padding value
Whenever I compile C or C++ code with optimizations enabled, GCC aligns functions to a 16-byte boundary (on IA-32). If the function is shorter than 16 bytes, GCC pads it with some bytes, which don't seem to be random at all:

    19: c3                   ret
    1a: 8d b6 00 00 00 00    lea 0x0(%esi),%esi

It always seems to be either `8d b6 00 00 00 00 ...` or `8d 74 26 00`.

Do function padding bytes have any significance?
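
For anyone who wants to reproduce a listing like the one above, a minimal sketch follows. The file name `pad.c` and the function names are made up for illustration, and the commands in the comment assume a GNU toolchain with 32-bit support installed. The exact padding bytes may differ between GCC and binutils versions.

```c
/* pad.c (hypothetical file name): two tiny functions, so that the gap
 * between them shows up in the disassembly.
 *
 * Assumed build and inspection commands:
 *   gcc -m32 -O2 -c pad.c
 *   objdump -d pad.o
 *
 * With function alignment enabled, the bytes between the `ret` of
 * first() and the start of second() are the padding being asked about.
 */
int first(void)  { return 1; }
int second(void) { return 2; }
```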
Comments (3)
The padding is created by the assembler, not by GCC. The assembler merely sees a `.align` directive (or equivalent) and doesn't know whether the space to be padded is inside a function (e.g. loop alignment) or between functions, so it must insert `NOP`s of some sort. Modern x86 assemblers use the longest possible `NOP` encodings, with the intention of spending as few cycles as possible if the padding is for loop alignment.

Personally, I'm extremely skeptical of alignment as an optimization technique. I've never seen it help much, and it can definitely hurt by tremendously increasing the total code size (and with it the cache footprint). If you use the `-Os` optimization level, it's off by default, so there's nothing to worry about. Otherwise you can disable all the alignments with the proper `-f` options.
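
To make the assembler's job concrete, here is a rough sketch in C of what handling an alignment request amounts to: work out how many bytes are missing up to the next boundary, then fill them with the longest no-op sequences that fit. The names `pad_len` and `fill_padding` are invented for this example, and the table contains only the two `lea` encodings quoted in the question plus the one-byte `NOP`. A real assembler has a fuller table and may choose differently.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of bytes needed to advance addr to the next multiple of align.
 * align must be a power of two. */
size_t pad_len(uint32_t addr, uint32_t align)
{
    return (size_t)((0u - addr) & (align - 1u));
}

/* The two multi-byte no-ops quoted in the question, plus the 1-byte NOP. */
static const uint8_t LEA6[6] = { 0x8d, 0xb6, 0x00, 0x00, 0x00, 0x00 };
static const uint8_t LEA4[4] = { 0x8d, 0x74, 0x26, 0x00 };
static const uint8_t NOP1    = 0x90;

/* Fill buf[0..n) greedily with the longest no-op that still fits.
 * Illustrative only: this is not the table any particular assembler uses. */
void fill_padding(uint8_t *buf, size_t n)
{
    size_t i = 0;
    while (n - i >= sizeof LEA6) {
        memcpy(buf + i, LEA6, sizeof LEA6);
        i += sizeof LEA6;
    }
    while (n - i >= sizeof LEA4) {
        memcpy(buf + i, LEA4, sizeof LEA4);
        i += sizeof LEA4;
    }
    while (i < n) {
        buf[i++] = NOP1;
    }
}
```

For the listing in the question, `pad_len(0x1a, 16)` is 6, which is why a single 6-byte `lea` is enough to reach the next function at offset 0x20.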

The assembler first sees an `.align` directive. Since it doesn't know whether this address is within a function body, it cannot output null (`0x00`) bytes, which would be decoded as real instructions if execution ever reached them, and must generate `NOP`s (`0x90`).

However, a single `lea 0x0(%esi),%esi` executes in fewer clock cycles than the run of one-byte `NOP`s it replaces. If this code happened to fall within a function body (for instance, loop alignment), the `lea` version would be much faster, while still "doing nothing."

The instruction `lea 0x0(%esi),%esi` just loads the value in `%esi` into `%esi`. It's a no-operation (or `NOP`), which means that executing it has no effect.

This just happens to be a single-instruction, 6-byte `NOP`. `8d 74 26 00` is just a 4-byte encoding of the same instruction.
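
The claim that both byte sequences encode the same instruction can be checked by decoding the ModRM and SIB bytes by hand. Below is a small sketch of such a decoder. The names `lea_info`, `decode_lea` and `is_lea_nop` are hypothetical, and it handles only the two addressing forms that appear here (base register plus an 8-bit or 32-bit displacement, with no index register). It is not a general x86 decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of decoding one `lea disp(base),reg` candidate. */
struct lea_info {
    int     ok;    /* 1 if the bytes decoded as a supported lea form */
    int     reg;   /* destination register number (6 = %esi) */
    int     base;  /* base register number (6 = %esi) */
    int32_t disp;  /* displacement */
    size_t  len;   /* total instruction length in bytes */
};

struct lea_info decode_lea(const uint8_t *p, size_t n)
{
    struct lea_info r = { 0, 0, 0, 0, 0 };
    size_t i = 2;                          /* opcode + ModRM consumed */

    if (n < 2 || p[0] != 0x8d)             /* 0x8d is the lea opcode */
        return r;

    int mod = p[1] >> 6;                   /* addressing mode */
    int rm  = p[1] & 7;                    /* base register, or 4 = SIB */
    r.reg   = (p[1] >> 3) & 7;

    if (mod == 0 || mod == 3)              /* forms not handled here */
        return r;

    if (rm == 4) {                         /* a SIB byte follows */
        if (n < 3 || ((p[2] >> 3) & 7) != 4)
            return r;                      /* only "no index" is handled */
        r.base = p[2] & 7;
        i = 3;
    } else {
        r.base = rm;
    }

    if (mod == 1) {                        /* 8-bit displacement */
        if (n < i + 1)
            return r;
        r.disp = (int8_t)p[i];
        i += 1;
    } else {                               /* mod == 2: 32-bit, little-endian */
        if (n < i + 4)
            return r;
        r.disp = (int32_t)((uint32_t)p[i]
                         | (uint32_t)p[i + 1] << 8
                         | (uint32_t)p[i + 2] << 16
                         | (uint32_t)p[i + 3] << 24);
        i += 4;
    }

    r.len = i;
    r.ok  = 1;
    return r;
}

/* A lea is a no-op when it writes a register with its own value. */
int is_lea_nop(const uint8_t *p, size_t n)
{
    struct lea_info r = decode_lea(p, n);
    return r.ok && r.reg == r.base && r.disp == 0;
}
```

Fed `8d b6 00 00 00 00`, the decoder reports destination and base register 6 (`%esi`), displacement 0 and length 6. Fed `8d 74 26 00`, it reports the same registers and displacement with length 4. The shorter form spends one byte on a SIB byte with no index register but uses an 8-bit displacement instead of a 32-bit one.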