How does C compiled to ASM know where to branch to for external functions?

Posted on 2025-01-10 14:17:27

How does C compiled to ARM ASM know where to branch to for external functions?

For example, here is a simple C program:

#include <stdio.h>

int main() {
   printf("Hello World!");
   return 0;
}

and its corresponding ARM ASM program:

    .arch armv6
    .eabi_attribute 28, 1
    .eabi_attribute 20, 1
    .eabi_attribute 21, 1
    .eabi_attribute 23, 3
    .eabi_attribute 24, 1
    .eabi_attribute 25, 1
    .eabi_attribute 26, 2
    .eabi_attribute 30, 6
    .eabi_attribute 34, 1
    .eabi_attribute 18, 4
    .file   "main.c"
    .text
    .section    .rodata
    .align  2
.LC0:
    .ascii  "Hello World!\000"
    .text
    .align  2
    .global main
    .arch armv6
    .syntax unified
    .arm
    .fpu vfp
    .type   main, %function
main:
    @ args = 0, pretend = 0, frame = 0
    @ frame_needed = 1, uses_anonymous_args = 0
    push    {fp, lr}
    add fp, sp, #4
    ldr r0, .L3
    bl  printf
    mov r3, #0
    mov r0, r3
    pop {fp, pc}
.L4:
    .align  2
.L3:
    .word   .LC0
    .size   main, .-main
    .ident  "GCC: (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110"
    .section    .note.GNU-stack,"",%progbits

I don't see a "printf" label anywhere, so I am assuming that it links to something outside of the program. But how does it know where to search? It wouldn't look everywhere, because there might be duplicate labels, yet libraries are placed (from the computer's perspective) at arbitrary locations, and I don't see anywhere that defines a library location.

So where does it link, for more than just the standard C library?
And how can I compile it so that it does not rely on those external dependencies?
Or how can I find out where the libraries are, so I know which files I can delete?

I am currently running Linux on a Raspberry Pi 400.

Comments (1)

还如梦归 2025-01-17 14:17:27

My computer has an x86_64 processor, but the principle is the same. I'm using gcc 9.3.0.

I copied your code into a file called main.c and compiled it to assembly with gcc -S main.c. It produced the file main.s with the following contents:

        .file   "main.c"
        .text
        .section        .rodata
.LC0:
        .string "Hello World!"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        endbr64
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16 
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        leaq    .LC0(%rip), %rdi
        movl    $0, %eax
        call    printf@PLT
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0"
        .section        .note.GNU-stack,"",@progbits
        .section        .note.gnu.property,"a"
        .align 8
        .long    1f - 0f
        .long    4f - 1f
        .long    5   
0:
        .string  "GNU"
1:
        .align 8
        .long    0xc0000002
        .long    3f - 2f
2:
        .long    0x3 
3:
        .align 8
4:

There are a lot of assembler directives here that can make it confusing to read, so I assembled it into an object file (gcc -c main.s) and then ran objdump -d main.o to disassemble it. Here is the output of the disassembly:

main.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <main>:
   0:   f3 0f 1e fa             endbr64 
   4:   55                      push   %rbp
   5:   48 89 e5                mov    %rsp,%rbp
   8:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # f <main+0xf>
   f:   b8 00 00 00 00          mov    $0x0,%eax
  14:   e8 00 00 00 00          callq  19 <main+0x19>
  19:   b8 00 00 00 00          mov    $0x0,%eax
  1e:   5d                      pop    %rbp
  1f:   c3                      retq   

The first three instructions here are boilerplate, so we'll ignore them. The first interesting instruction is

lea    0x0(%rip),%rdi

This is meant to load the address of the "Hello World!" string into register %rdi. Confusingly, it appears to simply be copying %rip into %rdi.

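The next instruction puts a 0 into register %eax. printf is a variadic function, and the x86-64 System V calling convention uses %al to tell such a function how many vector registers carry arguments (none here). That detail is not really relevant to this discussion.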

Then comes the actual call to printf:

callq  19 <main+0x19>

Once again, this uses an address that doesn't seem correct. You may notice that address 0x19 actually points to the next instruction.

The next 3 instructions basically perform the final return 0.

To really answer your question we need to look at more than just assembly code. At this point I would recommend taking some time to research the format of ELF files. I would consider that topic to be beyond the scope of this answer, but it will help you understand what I'm about to show you.

I first want to point out that in both your assembly and mine, the "Hello World!" string is preceded by this directive:

.section        .rodata

whereas the main function is preceded by

.text

which is shorthand for

.section        .text

These directives instruct the assembler on how to arrange the code and data in the object file. You can see this by printing the section headers of the object file:

$ readelf -S main.o
There are 14 section headers, starting at offset 0x318:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       0000000000000020  0000000000000000  AX       0     0     1
  [ 2] .rela.text        RELA             0000000000000000  00000258
       0000000000000030  0000000000000018   I      11     1     8
  [ 3] .data             PROGBITS         0000000000000000  00000060
       0000000000000000  0000000000000000  WA       0     0     1
  [ 4] .bss              NOBITS           0000000000000000  00000060
       0000000000000000  0000000000000000  WA       0     0     1
  [ 5] .rodata           PROGBITS         0000000000000000  00000060
       000000000000000d  0000000000000000   A       0     0     1
  [ 6] .comment          PROGBITS         0000000000000000  0000006d
       000000000000002b  0000000000000001  MS       0     0     1
  [ 7] .note.GNU-stack   PROGBITS         0000000000000000  00000098
       0000000000000000  0000000000000000           0     0     1
  [ 8] .note.gnu.propert NOTE             0000000000000000  00000098
       0000000000000020  0000000000000000   A       0     0     8
  [ 9] .eh_frame         PROGBITS         0000000000000000  000000b8
       0000000000000038  0000000000000000   A       0     0     8
  [10] .rela.eh_frame    RELA             0000000000000000  00000288
       0000000000000018  0000000000000018   I      11     9     8
  [11] .symtab           SYMTAB           0000000000000000  000000f0
       0000000000000138  0000000000000018          12    10     8
  [12] .strtab           STRTAB           0000000000000000  00000228
       000000000000002a  0000000000000000           0     0     1
  [13] .shstrtab         STRTAB           0000000000000000  000002a0
       0000000000000074  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

If you can figure out how to read this output, you will see that the .text section is 0x20 bytes in size (which matches the above disassembly output), and the .rodata section is 0xd (13) bytes in size (i.e. strlen("Hello World!") plus a null byte). The answer to your question, however, is in the relocation data:

$ readelf -r main.o

Relocation section '.rela.text' at offset 0x258 contains 2 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000b  000500000002 R_X86_64_PC32     0000000000000000 .rodata - 4
000000000015  000c00000004 R_X86_64_PLT32    0000000000000000 printf - 4

Relocation section '.rela.eh_frame' at offset 0x288 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000020  000200000002 R_X86_64_PC32     0000000000000000 .text + 0

This output is also very confusing to read if you don't know what it means. The first thing to understand is that the relocation sections tell the linker about places in the code that depend on symbols, either in other sections of the same file, or, more frequently, symbols that are defined in other files. The .rela.text section, for example, contains relocation information about the .text section. When this object file is linked into the final executable, the linker will overwrite part of the .text section with the missing addresses.

So, looking at the first entry under .rela.text, we see an offset of 0xb. Looking at the disassembly, we can see that offset 0xb references the fourth byte of the lea instruction's 7-byte encoding. The type, R_X86_64_PC32, tells us that that instruction is expecting a 32-bit PC-relative address, so we can expect the linker to overwrite the next 4 bytes (currently all 0). The rightmost column tells us, in human readable format, that this address needs to be populated with the address of the .rodata section minus 4 (with PC-relative addressing you have to remember that the PC will be pointing at the next instruction). It leaves out the fact, implicit for relocation type R_X86_64_PC32, that it will then subtract from that the final address of byte 0xb in the .text section, which will make that a valid PC-relative pointer to the "Hello World!" string data.

Similarly, the second entry tells the linker to copy the address of printf (minus 4) to offset 0x15 in the .text section, which would be the last 4 bytes of the callq instruction encoding. In this case, the type is R_X86_64_PLT32, which tells us that it's pointing to an entry in the procedure linkage table (PLT). A PLT is used for dynamic linking so that shared object libraries (in this case libc.so) can be loaded into physical memory once and shared by many running executables.

As a note that might answer some of your specific questions: the gcc driver automatically links the runtime libraries needed to execute a program. This includes the standard library functions, which are part of libc.so. You can avoid depending on shared libraries at run time by linking statically (for example with gcc -static), which copies the needed library code into the executable. The only way to run with no "external dependencies" at all would be to run on a bare-metal system (i.e. one without an operating system). Any operating system you use will have to do some amount of work to get your program to the start of main().
