Computing an ELF file's entry point as a physical address (offset from 0)
I am building a RISC-V emulator which basically loads a whole ELF file into memory.
Up to now, I have used the pre-compiled test binaries provided by the RISC-V Foundation, which conveniently have their entry point exactly at the start of the .text section.
For example:
> riscv32-unknown-elf-objdump ../riscv32i-emulator/tests/simple -d
../riscv32i-emulator/tests/simple: file format elf32-littleriscv
Disassembly of section .text.init:
80000000 <_start>:
80000000: 0480006f j 80000048 <reset_vector>
...
Going into this project I didn't know much about ELF files, so I just assumed that every ELF's entry point is exactly the same as the start of the .text section.
The problem arose when I compiled my own binaries. I found out that the actual entry point is not always the same as the start of the .text section; it might be anywhere inside it, like here:
> riscv32-unknown-elf-objdump a.out -d
a.out: file format elf32-littleriscv
Disassembly of section .text:
00010074 <register_fini>:
10074: 00000793 li a5,0
10078: 00078863 beqz a5,10088 <register_fini+0x14>
1007c: 00010537 lui a0,0x10
10080: 43850513 addi a0,a0,1080 # 10438 <__libc_fini_array>
10084: 3a00006f j 10424 <atexit>
10088: 00008067 ret
0001008c <_start>:
1008c: 00002197 auipc gp,0x2
10090: cec18193 addi gp,gp,-788 # 11d78 <__global_pointer$>
...
So, after reading more about ELF files, I found out that the actual entry point address is provided by the Entry entry in the ELF header:
> riscv32-unknown-elf-readelf a.out -h | grep Entry
Entry point address: 0x1008c
The problem now is that this address is not an offset into the file (counted from byte 0) but a virtual address, so obviously if I set the program counter of my emulator to this address, the emulator would crash.
Reading a bit more, I heard people talk about calculations regarding offsets from program headers and whatnot, but no one had a concrete answer.
My question is: what is the actual "formula" for getting the entry point address of the _start procedure as an offset from byte 0 of the file?
Just to be clear, my emulator doesn't support virtual memory and the binary is the only thing that is loaded into my emulator's memory, so I have no use for the abstraction of virtual memory. I just want every memory address to be a physical address, that is, an offset into the file on disk.
Comments (1)
First, forget about sections. Only segments matter at runtime.
Second, use readelf -Wl to look at segments. They tell you exactly which chunk of the file ([.p_offset, .p_offset + .p_filesz)) goes into which in-memory region ([.p_vaddr, .p_vaddr + .p_memsz)).
The exact calculation of "at which offset in the file does _start reside" is:
1. Find the Elf32_Phdr which "covers" the address contained in Elf32_Ehdr.e_entry.
2. Using that phdr, the file offset of _start is: ehdr->e_entry - phdr->p_vaddr + phdr->p_offset.
Update:
No.
No.
You are looking for the program header (describing the relationship between in-memory and on-file data) which overlaps ehdr->e_entry in memory. That is, you are looking for the segment for which phdr->p_vaddr <= ehdr->e_entry && ehdr->e_entry < phdr->p_vaddr + phdr->p_memsz. This segment is often the first, but that is in no way guaranteed. See also this answer.
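The calculation described in this answer could be sketched roughly as follows. This is only a minimal illustration in Python, assuming a 32-bit little-endian ELF image already read into a bytes object. The function name entry_file_offset and the extra check against p_filesz are my own additions, not part of the answer, and the header field offsets in the comments are those of the standard Elf32_Ehdr and Elf32_Phdr layouts.

```python
import struct

PT_LOAD = 1  # program header type of a loadable segment


def entry_file_offset(elf: bytes) -> int:
    """Return the file offset of the entry point (hypothetical helper).

    Assumes a 32-bit little-endian ELF image held entirely in `elf`.
    """
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf[4] != 1 or elf[5] != 1:  # EI_CLASS == ELFCLASS32, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch handles only 32-bit little-endian ELF")

    # Elf32_Ehdr: e_entry at byte 24, e_phoff at 28,
    # e_phentsize at 42, e_phnum at 44.
    e_entry, e_phoff = struct.unpack_from("<II", elf, 24)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 42)

    for i in range(e_phnum):
        # Elf32_Phdr begins with six 32-bit fields:
        # p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz.
        p_type, p_offset, p_vaddr, _p_paddr, p_filesz, p_memsz = struct.unpack_from(
            "<6I", elf, e_phoff + i * e_phentsize
        )
        if p_type != PT_LOAD:
            continue
        # Step 1: find the segment whose memory range covers e_entry.
        if p_vaddr <= e_entry < p_vaddr + p_memsz:
            delta = e_entry - p_vaddr
            # Added safety check (an assumption of this sketch): bytes past
            # p_filesz are zero-filled at load time and are not in the file.
            if delta >= p_filesz:
                raise ValueError("entry point is not backed by file data")
            # Step 2: e_entry - p_vaddr + p_offset.
            return p_offset + delta

    raise ValueError("no PT_LOAD segment covers e_entry")
```

It might be used as entry_file_offset(open("a.out", "rb").read()). Note that this only locates _start within the file: absolute addresses embedded in the code (such as the lui/addi pair that builds 0x10438 in the disassembly above) still refer to virtual addresses, so an emulator would probably be better served by copying each PT_LOAD segment to its p_vaddr rather than executing from raw file offsets.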