Computing an ELF file's entry point as a physical address (offset from byte 0)

Posted on 2025-01-12 07:56:55

I am building a RISC-V emulator which basically loads a whole ELF file into memory.

Up to now, I have used the pre-compiled test binaries provided by the RISC-V Foundation, which conveniently have their entry point exactly at the start of the .text section.

For example:

> riscv32-unknown-elf-objdump ../riscv32i-emulator/tests/simple -d

../riscv32i-emulator/tests/simple:     file format elf32-littleriscv


Disassembly of section .text.init:

80000000 <_start>:
80000000:       0480006f                j       80000048 <reset_vector>
...

Going into this project, I didn't know much about ELF files, so I just assumed that every ELF's entry point is exactly the same as the start of the .text section.

The problem arose when I compiled my own binaries: I found out that the actual entry point is not always the same as the start of the .text section; it might be anywhere inside it, like here:

> riscv32-unknown-elf-objdump a.out -d

a.out:     file format elf32-littleriscv


Disassembly of section .text:

00010074 <register_fini>:
   10074:       00000793                li      a5,0
   10078:       00078863                beqz    a5,10088 <register_fini+0x14>
   1007c:       00010537                lui     a0,0x10
   10080:       43850513                addi    a0,a0,1080 # 10438 <__libc_fini_array>
   10084:       3a00006f                j       10424 <atexit>
   10088:       00008067                ret

0001008c <_start>:
   1008c:       00002197                auipc   gp,0x2
   10090:       cec18193                addi    gp,gp,-788 # 11d78 <__global_pointer$>
...

So, after reading more about ELF files, I found out that the actual entry point address is given by the entry point field (e_entry) of the ELF header:

> riscv32-unknown-elf-readelf a.out -h | grep Entry
  Entry point address:               0x1008c

The problem now is that this address is not an actual offset into the file (counted from byte 0) but a virtual address, so obviously if I set the program counter of my emulator to this address, the emulator would crash.

Reading a bit more, I heard people talk about calculations regarding offsets from program headers and whatnot, but no one had a concrete answer.

My question is: what is the actual "formula" of how exactly you get the entry point address of the _start procedure as an offset from byte 0?

Just to be clear, my emulator doesn't support virtual memory and the binary is the only thing that is loaded into my emulator's memory, so I have no use for the abstraction of virtual memory. I just want every memory address as a physical address on disk.

Comments (1)

硬不硬你别怂 2025-01-19 07:56:55

My question is: what is the actual "formula" of how exactly you get the entry point address of the _start procedure as an offset from byte 0?

First, forget about sections. Only segments matter at runtime.

Second, use readelf -Wl to look at segments. They tell you exactly which chunk of the file ([.p_offset, .p_offset + .p_filesz)) goes into which in-memory region ([.p_vaddr, .p_vaddr + .p_memsz)).
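
If readelf is not at hand, the same mapping can be read straight from the file. The following is only a minimal sketch, assuming a little-endian ELF32 image already held in memory; the names Segment, read_segments and describe are made up for illustration and are not part of any library.

```python
import struct
from typing import List, NamedTuple

PT_LOAD = 1  # p_type value of a loadable segment


class Segment(NamedTuple):
    """The Elf32_Phdr fields that matter here (hypothetical helper type)."""
    p_type: int
    p_offset: int
    p_vaddr: int
    p_filesz: int
    p_memsz: int


def read_segments(image: bytes) -> List[Segment]:
    """Parse the program headers of a little-endian ELF32 image."""
    # e_ident[4] is the class (1 = 32-bit), e_ident[5] the byte order (1 = little-endian).
    if image[:4] != b"\x7fELF" or image[4] != 1 or image[5] != 1:
        raise ValueError("not a little-endian ELF32 file")
    # In Elf32_Ehdr, e_phoff is at byte 28, e_phentsize at 42 and e_phnum at 44.
    (e_phoff,) = struct.unpack_from("<I", image, 28)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 42)
    segments = []
    for i in range(e_phnum):
        # Elf32_Phdr starts with: p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz.
        p_type, p_offset, p_vaddr, _p_paddr, p_filesz, p_memsz = struct.unpack_from(
            "<6I", image, e_phoff + i * e_phentsize)
        segments.append(Segment(p_type, p_offset, p_vaddr, p_filesz, p_memsz))
    return segments


def describe(seg: Segment) -> str:
    """Render the file range and the memory range of one segment."""
    return (f"file [{seg.p_offset:#x}, {seg.p_offset + seg.p_filesz:#x}) -> "
            f"memory [{seg.p_vaddr:#x}, {seg.p_vaddr + seg.p_memsz:#x})")
```

Calling describe() on each entry whose p_type equals PT_LOAD should give roughly the same picture as the LOAD rows printed by readelf -Wl.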

The exact calculation of "at which offset in the file does _start reside" is:

  1. Find the Elf32_Phdr which "covers" the address contained in Elf32_Ehdr.e_entry.
  2. Using that phdr, the file offset of _start is: ehdr->e_entry - phdr->p_vaddr + phdr->p_offset.
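
As a worked instance of step 2, here is a small sketch of the arithmetic. The e_entry value is the one shown by readelf in the question; the segment values (p_vaddr = 0x10000, p_offset = 0) are purely hypothetical and have to be read from the actual program header, and entry_file_offset is just an illustrative name.

```python
def entry_file_offset(e_entry: int, p_vaddr: int, p_offset: int) -> int:
    """Translate a virtual address inside a segment into an offset in the file."""
    return e_entry - p_vaddr + p_offset


# e_entry = 0x1008c is taken from the question's readelf output.
# p_vaddr and p_offset are assumed values, not read from the real a.out.
offset = entry_file_offset(0x1008c, p_vaddr=0x10000, p_offset=0x0)
print(hex(offset))  # prints 0x8c for these assumed segment values
```

With a different covering segment, say one placed at p_offset 0x1000 in the file, the same entry address would land at a different file offset, which is why the program header has to be consulted.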

Update:

So, am I always looking for the 1st program header?

No.

Also by "covers" you mean that the 1st phdr->p_vaddr is always equal to e_entry?

No.

You are looking for the program header (describing the relationship between in-memory and on-file data) which overlaps ehdr->e_entry in memory. That is, you are looking for the segment for which phdr->p_vaddr <= ehdr->e_entry && ehdr->e_entry < phdr->p_vaddr + phdr->p_memsz. This segment is often the first, but that is in no way guaranteed. See also this answer.
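
The condition above can be sketched as a search over all program headers. This is only an illustration: Phdr and covering_segment are hypothetical names, and the two segments in the example are invented so that the covering one is not the first.

```python
from typing import NamedTuple, Optional, Sequence


class Phdr(NamedTuple):
    """Only the Elf32_Phdr fields the check needs (hypothetical helper type)."""
    p_offset: int
    p_vaddr: int
    p_memsz: int


def covering_segment(phdrs: Sequence[Phdr], e_entry: int) -> Optional[Phdr]:
    """Return the first program header whose memory range contains e_entry."""
    for ph in phdrs:
        if ph.p_vaddr <= e_entry < ph.p_vaddr + ph.p_memsz:
            return ph
    return None  # no segment covers the entry point


# Invented layout: a data-like segment comes first, the code segment second.
phdrs = [
    Phdr(p_offset=0x2000, p_vaddr=0x20000, p_memsz=0x100),
    Phdr(p_offset=0x1000, p_vaddr=0x10000, p_memsz=0x500),
]
hit = covering_segment(phdrs, 0x1008c)
```

Here hit is the second entry, so assuming the first header is the right one would give a wrong offset. A real loader would probably also restrict the search to PT_LOAD segments, since those are the ones that get mapped into memory.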
