Why does the startup code of a statically linked executable issue so many system calls?

Posted 2024-12-07 17:58:30

I am experimenting by statically compiling a minimal program and examining the system calls that are issued:

$ cat hello.c
#include <unistd.h>

int main (void) {
  write(1, "Hello world!", 12);
  return 0;
}

$ gcc hello.c -static

$ objdump -f a.out
a.out:     file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00000000004003c0

$ strace ./a.out
execve("./a.out", ["./a.out"], [/* 39 vars */]) = 0
uname({sys="Linux", node="ubuntu", ...}) = 0
brk(0)                                  = 0xa20000
brk(0xa211a0)                           = 0xa211a0
arch_prctl(ARCH_SET_FS, 0xa20880)       = 0
brk(0xa421a0)                           = 0xa421a0
brk(0xa43000)                           = 0xa43000
write(1, "Hello world!", 12Hello world!)            = 12
exit_group(0)                           = ?

I know that when linked non-statically, ld emits startup code to map libc.so and ld.so into the process's address space, and ld.so would continue loading any other shared libraries.

But in this case, why are so many system calls issued, apart from execve, write and exit_group?

Why the heck uname(2)? Why so many calls to brk(2) to get and set the program break, and a call to arch_prctl(2) to set the process state, when that seems like something that should have been done in kernel-space, at execve time?

不寐倦长更 2024-12-14 17:58:30

uname is needed to check that the kernel version is not too ancient.

Two brks are needed to set up thread-local storage. Two others are needed to set up the dynamic loader path (the executable might still call dlopen, even if it's statically linked). I'm not sure why these come in pairs.

On my system, arch_prctl isn't called; set_thread_area is called in its place. This sets up TLS for the current thread.

These things could probably be done lazily (i.e., when the corresponding facility is used for the first time). But perhaps that would make no sense performance-wise (just a guess).

By the way, gdb 7.x can stop on system calls with the catch syscall command.

迷鸟归林 2024-12-14 17:58:30

Shameless plug: when built against musl libc, the strace for that program, whether statically or dynamically linked, is:

execve("./a.out", ["./a.out"], [/* 42 vars */]) = 0
write(1, "Hello world!", 12)            = 12
exit_group(0)                           = ?

It should be similarly minimal with dietlibc if you link statically, or with uClibc and static linking, as long as you built uClibc with locale and advanced stdio features disabled. (For some reason, uClibc with those features enabled runs lots of startup code to initialize them, even in programs that don't use them...) As far as I know, however, musl is the only one whose dynamic linker avoids heavy startup syscall overhead in dynamically linked programs.

As for why static linking with glibc makes all those brk calls, I really have no idea; you'd have to read the source. I suspect it's allocating space for internal data structures for malloc, stdio, and locale, and possibly the thread structure for the main thread. As n.m. said, the arch_prctl call sets the thread register to point to the main thread's thread structure. This could be deferred until the first access (which is what musl does), but doing so is a bit of a pain and mildly hurts performance. If you care more about the runtime of large programs than about the startup time of many small programs, it may make sense to always initialize the thread register at program load time. Note that the kernel cannot set it for you, because it does not know which address it should be set to.

It's possible that an extension to the ELF format could be made to allow the main thread structure to be in the .data section with an ELF header telling the kernel where it is, but the acrobatics needed between the libc, the linker, and the kernel would probably be so ugly as to make this optimization undesirable... They would also impose further constraints on the userspace implementation of threads.
