Why does GCC pad functions with NOPs?

Posted on 2024-12-11 21:28:41

I've been working with C for a short while and very recently started to get into ASM. When I compile a program:

int main(void)
  {
  int a = 0;
  a += 1;
  return 0;
  }

The objdump disassembly has the code, but nops after the ret:

...
08048394 <main>:
 8048394:       55                      push   %ebp
 8048395:       89 e5                   mov    %esp,%ebp
 8048397:       83 ec 10                sub    $0x10,%esp
 804839a:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
 80483a1:       83 45 fc 01             addl   $0x1,-0x4(%ebp)
 80483a5:       b8 00 00 00 00          mov    $0x0,%eax
 80483aa:       c9                      leave  
 80483ab:       c3                      ret    
 80483ac:       90                      nop
 80483ad:       90                      nop
 80483ae:       90                      nop
 80483af:       90                      nop
...

From what I've learned, nops do nothing, and since these come after the ret, they would never even be executed.

My question is: why bother? Couldn't ELF (linux-x86) work with a .text section (+ main) of any size?

I'd appreciate any help, just trying to learn.

怀中猫帐中妖 2024-12-18 21:28:41

First of all, gcc doesn't always do this. The padding is controlled by -falign-functions, which is automatically turned on by -O2 and -O3:

-falign-functions
-falign-functions=n
Align the start of functions to the next power-of-two greater than n, skipping up to n bytes. For instance,
-falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would align to the next 32-byte boundary only
if this can be done by skipping 23 bytes or less.
-fno-align-functions and -falign-functions=1 are equivalent and mean that functions will not be aligned.
Some assemblers only support this flag when n is a power of two; in
that case, it is rounded up.
If n is not specified or is zero, use a machine-dependent default.
Enabled at levels -O2, -O3.

There could be multiple reasons for doing this, but the main one on x86 is probably this:

Most processors fetch instructions in aligned 16-byte or 32-byte blocks. It can be
advantageous to align critical loop entries and subroutine entries by 16 in order to minimize
the number of 16-byte boundaries in the code. Alternatively, make sure that there is no 16-byte boundary in the first few instructions after a critical loop entry or subroutine entry.

(Quoted from "Optimizing subroutines in assembly
language" by Agner Fog.)

edit: Here is an example that demonstrates the padding:

// align.c
int f(void) { return 0; }
int g(void) { return 0; }

When compiled using gcc 4.4.5 with default settings, I get:

align.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <f>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 00 00 00 00          mov    $0x0,%eax
   9:   c9                      leaveq 
   a:   c3                      retq   

000000000000000b <g>:
   b:   55                      push   %rbp
   c:   48 89 e5                mov    %rsp,%rbp
   f:   b8 00 00 00 00          mov    $0x0,%eax
  14:   c9                      leaveq 
  15:   c3                      retq

Specifying -falign-functions gives:

align.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <f>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 00 00 00 00          mov    $0x0,%eax
   9:   c9                      leaveq 
   a:   c3                      retq   
   b:   eb 03                   jmp    10 <g>
   d:   90                      nop
   e:   90                      nop
   f:   90                      nop

0000000000000010 <g>:
  10:   55                      push   %rbp
  11:   48 89 e5                mov    %rsp,%rbp
  14:   b8 00 00 00 00          mov    $0x0,%eax
  19:   c9                      leaveq 
  1a:   c3                      retq

_畞蕅 2024-12-18 21:28:41

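This is done to align the next function on an 8-, 16-, or 32-byte boundary.

From "Optimizing subroutines in assembly language" by A. Fog:

11.5 Alignment of code
Most microprocessors fetch code in aligned 16-byte or 32-byte blocks. If an important subroutine entry or jump label happens to be near the end of a 16-byte block, then the microprocessor will only get a few useful bytes of code when fetching that block of code. It may have to fetch the next 16 bytes too before it can decode the first instructions after the label. This can be avoided by aligning important subroutine entries and loop entries by 16.
[...]
Aligning a subroutine entry is as simple as putting as many NOPs as needed before the subroutine entry to make the address divisible by 8, 16, 32 or 64, as desired.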