Why is the ELF execution entry point virtual address of the form 0x80xxxxx and not zero 0x0?

Posted 2024-08-20 04:58:22

When executed, the program will start running from virtual address 0x80482c0. This address doesn't point to our main() procedure, but to a procedure named _start, which is created by the linker.

My Google research so far has only led me to some (vague) historical speculation like this:

There is folklore that 0x08048000 once was STACK_TOP (that is, the stack grew downwards from near 0x08048000 towards 0) on a port of *NIX to i386 that was promulgated by a group from Santa Cruz, California. This was when 128MB of RAM was expensive, and 4GB of RAM was unthinkable.

Can anyone confirm/deny this?
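
For reference, the entry address discussed in the question is stored in the e_entry field of the ELF header, at byte offset 24 in both the 32-bit and 64-bit layouts. The following is a minimal sketch of reading it; the function name elf_entry_point is hypothetical and chosen only for illustration.

```python
import struct


def elf_entry_point(header: bytes) -> int:
    """Return the e_entry field from the leading bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    # EI_DATA: 1 = little-endian, 2 = big-endian
    endian = "<" if ei_data == 1 else ">"
    # e_entry follows e_ident (16 bytes), e_type (2), e_machine (2)
    # and e_version (4), so it starts at offset 24 in both classes.
    if ei_class == 1:      # ELFCLASS32: 4-byte address
        (entry,) = struct.unpack_from(endian + "I", header, 24)
    elif ei_class == 2:    # ELFCLASS64: 8-byte address
        (entry,) = struct.unpack_from(endian + "Q", header, 24)
    else:
        raise ValueError("unknown ELF class")
    return entry
```

Passing it the first 64 bytes of an executable (read in binary mode) returns the entry address as an integer. Note that many current Linux binaries are position-independent, so their e_entry is a small offset rather than an address of the 0x0804xxxx form, which is typical of non-PIE 32-bit i386 executables.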

Comments (2)

不如归去 2024-08-27 04:58:22

As Mads pointed out, in order to catch most accesses through null pointers, Unix-like systems tend to leave the page at address zero unmapped. Any such access then immediately triggers a CPU exception (a page fault), which the process sees as a segfault. This is much better than letting the application go rogue. The exception vector table, however, can be at any address, at least on x86 processors (there is a special register for that, loaded with the lidt instruction).

The starting point address is part of a set of conventions which describe how memory is laid out. The linker, when it produces an executable binary, must know these conventions, so they are not likely to change. Basically, for Linux, the memory layout conventions are inherited from the very first versions of Linux, in the early 1990s. A process must have access to several areas:
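
The effect of the unmapped page at address zero can be observed from a script. This is a minimal sketch, assuming a Linux or other POSIX system and CPython with ctypes available; the function name null_read_exit_code is hypothetical. It reads one byte at address 0 in a child process, so that the crash does not take down the caller.

```python
import signal
import subprocess
import sys


def null_read_exit_code() -> int:
    """Run a child Python process that reads one byte at address 0."""
    # from_address(0) builds a ctypes object located at address 0;
    # reading .value dereferences that address inside the child.
    child_code = "import ctypes; ctypes.c_ubyte.from_address(0).value"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a negative return code -N means "killed by signal N".
    return result.returncode


if __name__ == "__main__":
    code = null_read_exit_code()
    if code == -signal.SIGSEGV:
        print("child was killed by SIGSEGV")
    else:
        print("child exited with", code)
```

On other platforms the outcome may differ (Windows, for instance, reports access violations differently), so the comparison with SIGSEGV should be read as an assumption about a typical Linux setup.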

  • The code must be in a range which includes the starting point.
  • There must be a stack.
  • There must be a heap, with a limit which is increased with the brk() and sbrk() system calls.
  • There must be some room for mmap() system calls, including shared library loading.

Nowadays, the heap, where malloc() goes, is backed by mmap() calls which obtain chunks of memory at whatever address the kernel sees fit. But in older times, Linux was like previous Unix-like systems, and its heap required a big area in one uninterrupted chunk, which could grow towards increasing addresses. So whatever the convention was, it had to put code and stack at low addresses, and give all of the address space after a given point to the heap.

But there is also the stack, which is usually quite small but could grow quite dramatically on some occasions. The stack grows down, and when the stack is full, we really want the process to crash predictably rather than overwrite some data. So there had to be a wide area for the stack, with an unmapped page at the low end of that area. And lo! There is an unmapped page at address zero, to catch null pointer dereferences. Hence it was defined that the stack would get the first 128 MB of address space, except for the first page. This means that the code had to go after those 128 MB, at an address similar to 0x080xxxxx.

As Michael points out, "losing" 128 MB of address space was no big deal because the address space was very large compared with what could actually be used. At that time, the Linux kernel limited the address space for a single process to 1 GB, out of a maximum of 4 GB allowed by the hardware, and that was not considered to be a big issue.
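
On Linux, the regions listed above can be inspected through /proc/self/maps, where each line has the form "start-end perms offset dev inode pathname". The following is a rough sketch of a parser for that format; the names Region, parse_maps and find_region are hypothetical helpers introduced for illustration.

```python
from typing import List, NamedTuple, Optional


class Region(NamedTuple):
    start: int   # first address of the mapping
    end: int     # one past the last address
    perms: str   # e.g. "r-xp"
    name: str    # pathname, "[heap]", "[stack]", or "" if anonymous


def parse_maps(text: str) -> List[Region]:
    """Parse text in the /proc/<pid>/maps format into Region tuples."""
    regions = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        name = fields[5].strip() if len(fields) > 5 else ""
        regions.append(Region(start, end, fields[1], name))
    return regions


def find_region(regions: List[Region], name: str) -> Optional[Region]:
    """Return the first region with the given name, or None."""
    for region in regions:
        if region.name == name:
            return region
    return None
```

Feeding it the contents of /proc/self/maps and looking up "[heap]" and "[stack]" shows where the brk()-managed heap and the stack sit relative to the code on a given system; the exact addresses depend on the kernel, the architecture and address space randomization.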
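
The arithmetic behind the "first 128 MB" figure can be checked directly. This is a small worked example using the two addresses quoted in the question; the 4 KiB page size is an assumption (it is the usual i386 page size), and the variable names are illustrative only.

```python
MIB = 1024 * 1024
PAGE_SIZE = 4096  # assumption: 4 KiB pages, the usual i386 page size

stack_area_end = 128 * MIB   # end of the first 128 MB of address space
text_base = 0x08048000       # conventional load address from the question
entry = 0x080482C0           # entry point quoted in the question

print(hex(stack_area_end))                    # 0x8000000
print(hex(text_base - stack_area_end))        # 0x48000
print((text_base - stack_area_end) // 1024)   # 288 (KiB past the 128 MiB mark)
print(text_base % PAGE_SIZE)                  # 0, so the base is page-aligned
print(hex(entry - text_base))                 # 0x2c0
```

So 128 MB corresponds to address 0x08000000, the conventional base 0x08048000 lies 288 KiB above it, and the entry point from the question sits 0x2c0 bytes into the first loaded page.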

倦话 2024-08-27 04:58:22

Why not start at address 0x0? There are at least two reasons for this:

  • Because address zero is famously the NULL pointer, which programming languages use to sanity-check pointers. An address cannot serve as that sentinel value if code is going to execute there.
  • The actual content at address 0 is often (but not always) the exception vector table, and is hence not accessible in non-privileged modes. Consult the documentation of your specific architecture.

As for the entry point _start vs. main:
If you link against the C runtime (the C standard library), the library wraps the function named main so that it can initialize the environment before main is called. On Linux, this covers the argc and argv parameters to the application, the environment variables, and probably some synchronization primitives and locks. It also makes sure that the value returned from main is passed on as the status code, and calls the _exit function, which terminates the process.
