为什么 GCC pad 与 NOP 一起起作用?

发布于 2024-12-11 21:28:41 字数 983 浏览 0 评论 0原文

我使用 C 语言已经有一段时间了,最​​近才开始接触 ASM。当我编译程序时:

int main(void)
  {
  int a = 0;
  a += 1;
  return 0;
  }

objdump 反汇编有代码,但在 ret 之后 nops:

...
08048394 <main>:
 8048394:       55                      push   %ebp
 8048395:       89 e5                   mov    %esp,%ebp
 8048397:       83 ec 10                sub    $0x10,%esp
 804839a:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
 80483a1:       83 45 fc 01             addl   $0x1,-0x4(%ebp)
 80483a5:       b8 00 00 00 00          mov    $0x0,%eax
 80483aa:       c9                      leave  
 80483ab:       c3                      ret    
 80483ac:       90                      nop
 80483ad:       90                      nop
 80483ae:       90                      nop
 80483af:       90                      nop
...

据我所知,nops 不执行任何操作,因为在 ret 之后甚至不会执行。

我的问题是:为什么要麻烦呢? ELF(linux-x86) 不能使用任何大小的 .text 部分(+main)吗?

我很感激任何帮助,只是想学习。

I've been working with C for a short while and very recently started to get into ASM. When I compile a program:

int main(void)
  {
  int a = 0;
  a += 1;
  return 0;
  }

The objdump disassembly has the code, but nops after the ret:

...
08048394 <main>:
 8048394:       55                      push   %ebp
 8048395:       89 e5                   mov    %esp,%ebp
 8048397:       83 ec 10                sub    $0x10,%esp
 804839a:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%ebp)
 80483a1:       83 45 fc 01             addl   $0x1,-0x4(%ebp)
 80483a5:       b8 00 00 00 00          mov    $0x0,%eax
 80483aa:       c9                      leave  
 80483ab:       c3                      ret    
 80483ac:       90                      nop
 80483ad:       90                      nop
 80483ae:       90                      nop
 80483af:       90                      nop
...

From what I learned nops do nothing, and since after ret wouldn't even be executed.

My question is: why bother? Couldn't ELF(linux-x86) work with a .text section(+main) of any size?

I'd appreciate any help, just trying to learn.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

怀中猫帐中妖 2024-12-18 21:28:41

首先,gcc并不总是这样做。填充由 -falign-functions,由-O2-O3自动开启:

-falign-functions
-falign-functions=n

将函数的开头与下一个大于 n 的二次方对齐,最多跳过 n 字节。例如,
-falign-functions=32 将函数与下一个 32 字节边界对齐,但 -falign-functions=24 仅与下一个 32 字节边界对齐
如果这可以通过跳过 23 个字节或更少来完成。

-fno-align-functions-falign-functions=1 等效,意味着函数不会对齐。

某些汇编器仅在 n 为 2 的幂时支持此标志;在
在这种情况下,它会被四舍五入。

如果 n 未指定或为零,则使用与机器相关的默认值。

在 -O2、-O3 级别启用。

这样做可能有多种原因,但 x86 上的主要原因可能是:

大多数处理器在对齐的 16 字节或 32 字节块中获取指令。它可以是
将关键循环条目和子例程条目对齐 16 是有利的,以便最大限度地减少
代码中 16 字节边界的数量。或者,确保关键循环条目或子例程条目之后的前几条指令中没有 16 字节边界。

(引自《优化汇编中的子程序
语言”由 Agner Fog 编写。)

编辑:这是一个演示填充的示例:

// align.c
int f(void) { return 0; }
int g(void) { return 0; }

当使用默认设置的 gcc 4.4.5 进行编译时,我得到:

align.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <f>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 00 00 00 00          mov    $0x0,%eax
   9:   c9                      leaveq 
   a:   c3                      retq   

000000000000000b <g>:
   b:   55                      push   %rbp
   c:   48 89 e5                mov    %rsp,%rbp
   f:   b8 00 00 00 00          mov    $0x0,%eax
  14:   c9                      leaveq 
  15:   c3                      retq   

指定 -falign-functions 给出:

align.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <f>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 00 00 00 00          mov    $0x0,%eax
   9:   c9                      leaveq 
   a:   c3                      retq   
   b:   eb 03                   jmp    10 <g>
   d:   90                      nop
   e:   90                      nop
   f:   90                      nop

0000000000000010 <g>:
  10:   55                      push   %rbp
  11:   48 89 e5                mov    %rsp,%rbp
  14:   b8 00 00 00 00          mov    $0x0,%eax
  19:   c9                      leaveq 
  1a:   c3                      retq   

First of all, gcc doesn't always do this. The padding is controlled by -falign-functions, which is automatically turned on by -O2 and -O3:

-falign-functions
-falign-functions=n

Align the start of functions to the next power-of-two greater than n, skipping up to n bytes. For instance,
-falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would align to the next 32-byte boundary only
if this can be done by skipping 23 bytes or less.

-fno-align-functions and -falign-functions=1 are equivalent and mean that functions will not be aligned.

Some assemblers only support this flag when n is a power of two; in
that case, it is rounded up.

If n is not specified or is zero, use a machine-dependent default.

Enabled at levels -O2, -O3.

There could be multiple reasons for doing this, but the main one on x86 is probably this:

Most processors fetch instructions in aligned 16-byte or 32-byte blocks. It can be
advantageous to align critical loop entries and subroutine entries by 16 in order to minimize
the number of 16-byte boundaries in the code. Alternatively, make sure that there is no 16-byte boundary in the first few instructions after a critical loop entry or subroutine entry.

(Quoted from "Optimizing subroutines in assembly
language" by Agner Fog.)

edit: Here is an example that demonstrates the padding:

// align.c
int f(void) { return 0; }
int g(void) { return 0; }

When compiled using gcc 4.4.5 with default settings, I get:

align.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <f>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 00 00 00 00          mov    $0x0,%eax
   9:   c9                      leaveq 
   a:   c3                      retq   

000000000000000b <g>:
   b:   55                      push   %rbp
   c:   48 89 e5                mov    %rsp,%rbp
   f:   b8 00 00 00 00          mov    $0x0,%eax
  14:   c9                      leaveq 
  15:   c3                      retq   

Specifying -falign-functions gives:

align.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <f>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   b8 00 00 00 00          mov    $0x0,%eax
   9:   c9                      leaveq 
   a:   c3                      retq   
   b:   eb 03                   jmp    10 <g>
   d:   90                      nop
   e:   90                      nop
   f:   90                      nop

0000000000000010 <g>:
  10:   55                      push   %rbp
  11:   48 89 e5                mov    %rsp,%rbp
  14:   b8 00 00 00 00          mov    $0x0,%eax
  19:   c9                      leaveq 
  1a:   c3                      retq   
_畞蕅 2024-12-18 21:28:41

这样做是为了按 8、16 或 32 字节边界对齐下一个函数。

摘自 A.Fog 的“用汇编语言优化子例程”:

11.5 代码对齐

大多数微处理器以对齐的 16 字节或 32 字节块获取代码。如果重要的子例程入口或跳转标签恰好位于 16 字节块的末尾附近,则微处理器在获取该代码块时将仅获得几个有用的代码字节。在解码标签后的第一条指令之前,它可能还必须获取接下来的 16 个字节。通过将重要的子例程条目和循环条目对齐 16 可以避免这种情况。

[...]

对齐子例程条目就像放置尽可能多的子程序条目一样简单
诺普
根据需要在子例程入口之前添加 ,以使地址可根据需要被 8、16、32 或 64 整除。

This is done to align the next function by 8, 16 or 32-byte boundary.

From “Optimizing subroutines in assembly language” by A.Fog:

11.5 Alignment of code

Most microprocessors fetch code in aligned 16-byte or 32-byte blocks. If an importantsubroutine entry or jump label happens to be near the end of a 16-byte block then themicroprocessor will only get a few useful bytes of code when fetching that block of code. Itmay have to fetch the next 16 bytes too before it can decode the first instructions after thelabel. This can be avoided by aligning important subroutine entries and loop entries by 16.

[...]

Aligning a subroutine entry is as simple as putting as many
NOP
's as needed before thesubroutine entry to make the address divisible by 8, 16, 32 or 64, as desired.

御弟哥哥 2024-12-18 21:28:41

据我记得,指令在CPU中进行流水线处理,不同的CPU块(加载器、解码器等)处理后续指令。当RET指令被执行时,接下来的指令很少已经加载到CPU管道中。这是一个猜测,但您可以在这里开始挖掘,如果您找到了(也许是安全的 NOP 的具体数量,请分享您的发现。

As far as I remember, instructions are pipelined in cpu and different cpu blocks (loader, decoder and such) process subsequent instructions. When RET instructions is being executed, few next instructions are already loaded into cpu pipeline. It's a guess, but you can start digging here and if you find out (maybe the specific number of NOPs that are safe, share your findings please.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文