How does C compiled to ARM ASM know where to branch to for external functions?
For example, here is a simple C program:
#include <stdio.h>
int main() {
    printf("Hello World!");
    return 0;
}
and its corresponding ARM ASM program:
.arch armv6
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 6
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "main.c"
.text
.section .rodata
.align 2
.LC0:
.ascii "Hello World!\000"
.text
.align 2
.global main
.arch armv6
.syntax unified
.arm
.fpu vfp
.type main, %function
main:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 1, uses_anonymous_args = 0
push {fp, lr}
add fp, sp, #4
ldr r0, .L3
bl printf
mov r3, #0
mov r0, r3
pop {fp, pc}
.L4:
.align 2
.L3:
.word .LC0
.size main, .-main
.ident "GCC: (Raspbian 10.2.1-6+rpi1) 10.2.1 20210110"
.section .note.GNU-stack,"",%progbits
I don't see a "printf" label anywhere, so I am assuming that it links outside of the program. But how does it know where to search? It wouldn't look everywhere, because there might be duplicate labels, yet libraries are also placed (from the computer's perspective) at arbitrary locations, and I don't see anything that defines a library location either.
So where does it link to, for more than just the standard C library?
And how can I compile it so that it doesn't rely on those external dependencies?
Or how can I find out where the libraries are, so I know which files I can delete?
I am currently running Linux on a Raspberry Pi 400.
Comments (1)
My computer has an x86_64 processor, but the principle is the same. I'm using gcc 9.3.0.
I copied your code into a file called main.c and compiled it to assembly with gcc -S main.c, which produced the file main.s. There are a lot of assembler directives in that file that can make it confusing to read, so I assembled it into an object file (gcc -c main.s) and then ran objdump -d main.o to disassemble it.
The first three instructions in the disassembly are boilerplate, so we'll ignore them. The first interesting instruction is the lea, which is meant to load the address of the "Hello World!" string into register %rdi. Confusingly, it appears to simply copy %rip into %rdi.
The next instruction puts a 0 into register %eax. This is part of the x86-64 calling convention for variadic functions such as printf (%al holds the number of vector registers used for arguments, none here), and it's not really relevant to this discussion.
Then comes the actual call to printf, a callq. Once again, this uses an address that doesn't seem correct: its target, 0x19, actually points to the next instruction.
The next 3 instructions basically perform the final return 0.
To really answer your question we need to look at more than just assembly code. At this point I would recommend taking some time to research the format of ELF files. I would consider that topic to be beyond the scope of this answer, but it will help you understand what I'm about to show you.
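As a small starting point for that research, here is a sketch in C of how the first bytes of an ELF file identify what kind of file it is. The function name describe_elf is made up for this example; the magic bytes, the offset of the e_type field and its values come from the ELF specification. It only inspects a buffer that is already in memory and assumes a little-endian file, which is the case on both x86_64 and Raspbian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: classify an in-memory ELF image by its header bytes.
 * Assumes a little-endian file. */
const char *describe_elf(const uint8_t *buf, size_t len)
{
    /* Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'. */
    if (len < 18 || memcmp(buf, "\x7f" "ELF", 4) != 0)
        return "not ELF";

    /* e_type is a 16-bit field at offset 16; assemble it as little-endian. */
    uint16_t e_type = (uint16_t)(buf[16] | (buf[17] << 8));

    switch (e_type) {
    case 1:  return "relocatable";    /* ET_REL: an object file such as main.o */
    case 2:  return "executable";     /* ET_EXEC */
    case 3:  return "shared object";  /* ET_DYN: libc.so, and PIE executables */
    default: return "other ELF";
    }
}
```

Given the first bytes of main.o, this would be expected to answer "relocatable", which is exactly why the file still carries relocation data for the linker.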
I first want to point out that in both your assembly and mine, the "Hello World!" string is preceded by the directive .section .rodata, whereas the main function is preceded by .text, which is shorthand for .section .text.
These directives instruct the assembler on how to arrange the code and data in the object file. You can see this by printing the section headers of the object file.
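The section headers are ordinary data inside the object file, so they can also be read programmatically. The following is a minimal sketch, assuming a Linux system where <elf.h> is available and a 64-bit ELF image in host byte order (as on my x86_64 machine; a 32-bit ARM object would need Elf32_Ehdr and Elf32_Shdr instead). The function name section_size is hypothetical, not part of any library.

```c
#include <elf.h>      /* Elf64_Ehdr, Elf64_Shdr, ELFMAG (Linux header) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: find a section by name in an in-memory ELF64 image
 * and store its size. Returns 0 on success, -1 if absent or malformed. */
int section_size(const uint8_t *image, size_t len,
                 const char *name, uint64_t *size_out)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shstrndx >= eh.e_shnum)
        return -1;
    /* The whole section header table must lie inside the image. */
    if (eh.e_shoff > len ||
        (len - eh.e_shoff) / sizeof(Elf64_Shdr) < eh.e_shnum)
        return -1;

    /* Section names live in a string table that is itself a section. */
    Elf64_Shdr strtab;
    memcpy(&strtab, image + eh.e_shoff + eh.e_shstrndx * sizeof strtab,
           sizeof strtab);
    if (strtab.sh_offset > len || strtab.sh_size > len - strtab.sh_offset)
        return -1;

    for (unsigned i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, image + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_name >= strtab.sh_size)
            continue;
        const char *sec = (const char *)image + strtab.sh_offset + sh.sh_name;
        size_t room = strtab.sh_size - sh.sh_name;
        /* Only compare names that are NUL-terminated inside the table. */
        if (memchr(sec, '\0', room) != NULL && strcmp(sec, name) == 0) {
            *size_out = sh.sh_size;
            return 0;
        }
    }
    return -1;
}
```

With the bytes of main.o read into memory, section_size(image, len, ".text", &size) should report the same size that readelf -S main.o prints for .text.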
If you can figure out how to read that output, you will see that the .text section is 0x20 bytes in size (which matches the disassembly output), and the .rodata section is 0xd (13) bytes in size (i.e. strlen("Hello World!") plus a null byte). The answer to your question, however, is in the relocation data.
That output is also very confusing to read if you don't know what it means. The first thing to understand is that the relocation sections tell the linker about places in the code that depend on symbols, either in other sections of the same file or, more frequently, symbols that are defined in other files. The .rela.text section, for example, contains relocation information about the .text section. When this object file is linked into the final executable, the linker will overwrite parts of the .text section with the missing addresses.
So, looking at the first entry under .rela.text, we see an offset of 0xb. Looking at the disassembly, we can see that offset 0xb refers to the fourth byte of the lea instruction's 7-byte encoding. The type, R_X86_64_PC32, tells us that the instruction is expecting a 32-bit PC-relative address, so we can expect the linker to overwrite the 4 bytes starting there (currently all 0). The rightmost column tells us, in human-readable format, that this address needs to be populated with the address of the .rodata section minus 4 (with PC-relative addressing you have to remember that the PC will be pointing at the next instruction). It leaves out the fact, implicit for relocation type R_X86_64_PC32, that the linker will then subtract from that the final address of byte 0xb in the .text section, which makes the result a valid PC-relative pointer to the "Hello World!" string data.
Similarly, the second entry tells the linker to write the address of printf (minus 4) to offset 0x15 in the .text section, which is the last 4 bytes of the callq instruction's encoding. In this case, the type is R_X86_64_PLT32, which tells us that it points to an entry in the procedure linkage table (PLT). A PLT is used for dynamic linking, so that shared object libraries (in this case libc.so) can be loaded into physical memory once and shared by many running executables.
As a note that might answer some of your specific questions: your compiler automatically links all the runtime libraries needed to execute a program. This includes any standard library functions, which would be part of libc.so. The only way to run without "external dependencies" would be to run on a bare-metal system (i.e. one without an operating system). Any operating system you use will have to do some amount of work to get your program to the start of main().
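To make the PC-relative relocation arithmetic described above concrete, here is a rough sketch in C of what the linker does for an R_X86_64_PC32 entry. The function names pc32_value and apply_pc32 are made up for illustration, and the final addresses used in the note below (0x401000 for .text, 0x402000 for .rodata) are invented; the real ones are chosen by the linker.

```c
#include <stdint.h>

/* Value stored for a 32-bit PC-relative relocation: S + A - P, where
 *   S = final address of the symbol (or section) referred to,
 *   A = addend from the relocation entry (the "minus 4"),
 *   P = final address of the bytes being patched. */
int32_t pc32_value(uint64_t s, int64_t a, uint64_t p)
{
    return (int32_t)(uint32_t)(s + (uint64_t)a - p);
}

/* Patch 4 bytes of a section image at `offset`, least significant byte
 * first, which is the order x86_64 expects. */
void apply_pc32(uint8_t *section, uint64_t section_addr, uint64_t offset,
                uint64_t s, int64_t a)
{
    uint32_t v = (uint32_t)pc32_value(s, a, section_addr + offset);
    for (int i = 0; i < 4; i++)
        section[offset + i] = (uint8_t)(v >> (8 * i));
}
```

With those invented addresses, the first relocation entry gives pc32_value(0x402000, -4, 0x40100b), which is 0xff1. The lea occupies offsets 0x8 through 0xe, so the PC after it is 0x40100f, and 0x40100f + 0xff1 lands exactly on 0x402000, the start of .rodata. The printf entry follows the same arithmetic, with the address of the PLT entry standing in for the symbol address.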