为什么 ELF 可执行文件可以有 4 个 LOAD 段？

发布于 2025-01-09 21:35:49 字数 1661 浏览 1 评论 0原文

有一个远程 64 位 *nix 服务器可以编译用户提供的代码（应该用 Rust 编写，但我认为这并不重要，因为它使用 LLVM）。我不知道它使用哪个编译器/链接器标志，但是编译后的 ELF 可执行文件看起来很奇怪 - 它有 4 个 LOAD 段：

$ readelf -e executable
...
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
...
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000004138 0x0000000000004138  R      0x1000
  LOAD           0x0000000000005000 0x0000000000005000 0x0000000000005000
                 0x00000000000305e9 0x00000000000305e9  R E    0x1000
  LOAD           0x0000000000036000 0x0000000000036000 0x0000000000036000
                 0x000000000000d808 0x000000000000d808  R      0x1000
  LOAD           0x0000000000043da0 0x0000000000044da0 0x0000000000044da0
                 0x0000000000002290 0x00000000000024a0  RW     0x1000
...

在我自己的系统上，我正在查看的所有可执行文件只有 2 个 LOAD 段：

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
...
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000003000c0 0x00000000003000c0  R E    0x200000
  LOAD           0x00000000003002b0 0x00000000005002b0 0x00000000005002b0
                 0x00000000000776c8 0x000000000009b200  RW     0x200000
...

什么情况（编译器/链接器版本、标志等），编译器可以在其下构建具有 4 个 LOAD 段的 ELF？
有 4 个 LOAD 段有什么意义？我想拥有一个具有读取但不执行权限的段可能有助于防止某些攻击，但为什么要有两个这样的段呢？

原文

There is a remote 64-bit *nix server that can compile a user-provided code (which should be written in Rust, but I don't think it matters since it uses LLVM). I don't know which compiler/linker flags it uses, but the compiled ELF executable looks weird - it has 4 LOAD segments:

$ readelf -e executable
...
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
...
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000004138 0x0000000000004138  R      0x1000
  LOAD           0x0000000000005000 0x0000000000005000 0x0000000000005000
                 0x00000000000305e9 0x00000000000305e9  R E    0x1000
  LOAD           0x0000000000036000 0x0000000000036000 0x0000000000036000
                 0x000000000000d808 0x000000000000d808  R      0x1000
  LOAD           0x0000000000043da0 0x0000000000044da0 0x0000000000044da0
                 0x0000000000002290 0x00000000000024a0  RW     0x1000
...

On my own system all executables that I was looking at only have 2 LOAD segments:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
...
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x00000000003000c0 0x00000000003000c0  R E    0x200000
  LOAD           0x00000000003002b0 0x00000000005002b0 0x00000000005002b0
                 0x00000000000776c8 0x000000000009b200  RW     0x200000
...

What are the circumstances (compiler/linker versions, flags etc) under which a compiler might build an ELF with 4 LOAD segments?
What is the point of having 4 LOAD segments? I imagine that having a segment with read but not execute permission might help against certain exploits, but why have two such segments?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

世态炎凉 2025-01-16 21:35:49

典型的 BFD-ld 或 Gold 链接的 Linux 可执行文件有 2 个可加载段，其中 ELF 标头与 .text 和 .rodata 合并到第一个段中RE 段，以及.data、.bss 等可写段合并到第二个RW 段中。

这是典型的段到段映射：

$ echo "int foo; int main() { return 0;}"  | clang -xc - -o a.out-gold -fuse-ld=gold
$ readelf -Wl a.out-gold

Elf file type is EXEC (Executable file)
Entry point 0x400420
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R   0x8
  INTERP         0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0006b0 0x0006b0 R E 0x1000
  LOAD           0x000e18 0x0000000000401e18 0x0000000000401e18 0x0001f8 0x000200 RW  0x1000
  DYNAMIC        0x000e28 0x0000000000401e28 0x0000000000401e28 0x0001b0 0x0001b0 RW  0x8
  NOTE           0x000254 0x0000000000400254 0x0000000000400254 0x000020 0x000020 R   0x4
  GNU_EH_FRAME   0x00067c 0x000000000040067c 0x000000000040067c 0x000034 0x000034 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x000e18 0x0000000000401e18 0x0000000000401e18 0x0001e8 0x0001e8 RW  0x8

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .dynsym .dynstr .gnu.hash .hash .gnu.version .gnu.version_r .rela.dyn .init .text .fini .rodata .eh_frame .eh_frame_hdr
   03     .fini_array .init_array .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .eh_frame_hdr
   07
   08     .fini_array .init_array .dynamic .got .got.plt

这优化了内核加载此类可执行文件必须执行的 mmap 数量，但会付出安全代价：.rodata 中的数据> 不应该是可执行的，但却是（因为它与 .text 合并，后者必须是可执行的）。这可能会显着增加试图劫持进程的人的攻击面。

较新的 Linux 系统，特别是使用 LLD 链接二进制文件，优先考虑安全性而不是速度，并将 ELF 标头和 .rodata 放入第一个 R -仅分段，产生 3 个负载分段并提高安全性。下面是一个典型的映射：

$ echo "int foo; int main() { return 0;}"  | clang -xc - -o a.out-lld -fuse-ld=lld
$ readelf -Wl a.out-lld

Elf file type is EXEC (Executable file)
Entry point 0x201000
There are 10 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x000230 0x000230 R   0x8
  INTERP         0x000270 0x0000000000200270 0x0000000000200270 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x000558 0x000558 R   0x1000
  LOAD           0x001000 0x0000000000201000 0x0000000000201000 0x000185 0x000185 R E 0x1000
  LOAD           0x002000 0x0000000000202000 0x0000000000202000 0x001170 0x002005 RW  0x1000
  DYNAMIC        0x003010 0x0000000000203010 0x0000000000203010 0x000150 0x000150 RW  0x8
  GNU_RELRO      0x003000 0x0000000000203000 0x0000000000203000 0x000170 0x001000 R   0x1
  GNU_EH_FRAME   0x000440 0x0000000000200440 0x0000000000200440 0x000034 0x000034 R   0x1
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x00028c 0x000000000020028c 0x000000000020028c 0x000020 0x000020 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .rodata .dynsym .gnu.version .gnu.version_r .gnu.hash .hash .dynstr .rela.dyn .eh_frame_hdr .eh_frame
   03     .text .init .fini
   04     .data .tm_clone_table .fini_array .init_array .dynamic .got .bss
   05     .dynamic
   06     .fini_array .init_array .dynamic .got
   07     .eh_frame_hdr
   08
   09     .note.ABI-tag

为了不被抛在后面，较新的 BFD-ld（我的版本是 2.31.1）还使 ELF 标头和 .rodata 变为只读，但无法将两个仅限 R 的段合并为一个，从而产生 4 个可加载段：

$ echo "int foo; int main() { return 0;}"  | clang -xc - -o a.out-bfd -fuse-ld=bfd
$ readelf -Wl a.out-bfd

Elf file type is EXEC (Executable file)
Entry point 0x401020
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x000268 0x000268 R   0x8
  INTERP         0x0002a8 0x00000000004002a8 0x00000000004002a8 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0003f8 0x0003f8 R   0x1000
  LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x00018d 0x00018d R E 0x1000
  LOAD           0x002000 0x0000000000402000 0x0000000000402000 0x000110 0x000110 R   0x1000
  LOAD           0x002e40 0x0000000000403e40 0x0000000000403e40 0x0001e8 0x0001f0 RW  0x1000
  DYNAMIC        0x002e50 0x0000000000403e50 0x0000000000403e50 0x0001a0 0x0001a0 RW  0x8
  NOTE           0x0002c4 0x00000000004002c4 0x00000000004002c4 0x000020 0x000020 R   0x4
  GNU_EH_FRAME   0x002004 0x0000000000402004 0x0000000000402004 0x000034 0x000034 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x002e40 0x0000000000403e40 0x0000000000403e40 0x0001c0 0x0001c0 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn
   03     .init .text .fini
   04     .rodata .eh_frame_hdr .eh_frame
   05     .init_array .fini_array .dynamic .got .got.plt .data .bss
   06     .dynamic
   07     .note.ABI-tag
   08     .eh_frame_hdr
   09
   10     .init_array .fini_array .dynamic .got

最后，其中一些选择受到 --(no)rosegment （或-Wl,z,noseparate-code 用于 BFD ld) 链接器选项。

A typical BFD-ld or Gold linked Linux executable has 2 loadable segments, with the ELF header merged with .text and .rodata into the first RE segment, and .data, .bss and other writable sections merged into the second RW segment.

Here is the typical section to segment mapping:

$ echo "int foo; int main() { return 0;}"  | clang -xc - -o a.out-gold -fuse-ld=gold
$ readelf -Wl a.out-gold

Elf file type is EXEC (Executable file)
Entry point 0x400420
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R   0x8
  INTERP         0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0006b0 0x0006b0 R E 0x1000
  LOAD           0x000e18 0x0000000000401e18 0x0000000000401e18 0x0001f8 0x000200 RW  0x1000
  DYNAMIC        0x000e28 0x0000000000401e28 0x0000000000401e28 0x0001b0 0x0001b0 RW  0x8
  NOTE           0x000254 0x0000000000400254 0x0000000000400254 0x000020 0x000020 R   0x4
  GNU_EH_FRAME   0x00067c 0x000000000040067c 0x000000000040067c 0x000034 0x000034 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x000e18 0x0000000000401e18 0x0000000000401e18 0x0001e8 0x0001e8 RW  0x8

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .dynsym .dynstr .gnu.hash .hash .gnu.version .gnu.version_r .rela.dyn .init .text .fini .rodata .eh_frame .eh_frame_hdr
   03     .fini_array .init_array .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .eh_frame_hdr
   07
   08     .fini_array .init_array .dynamic .got .got.plt

This optimizes the number of mmaps that the kernel must perform to load such executable, but at a security cost: the data in .rodata shouldn't be executable, but is (because it's merged with .text, which must be executable). This may significantly increase the attack surface for someone trying to hijack a process.

Newer Linux systems, in particular using LLD to link binaries, prioritize security over speed, and put ELF header and .rodata into the first R-only segment, resulting in 3 load segments and improved security. Here is a typical mapping:

$ echo "int foo; int main() { return 0;}"  | clang -xc - -o a.out-lld -fuse-ld=lld
$ readelf -Wl a.out-lld

Elf file type is EXEC (Executable file)
Entry point 0x201000
There are 10 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x000230 0x000230 R   0x8
  INTERP         0x000270 0x0000000000200270 0x0000000000200270 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x000558 0x000558 R   0x1000
  LOAD           0x001000 0x0000000000201000 0x0000000000201000 0x000185 0x000185 R E 0x1000
  LOAD           0x002000 0x0000000000202000 0x0000000000202000 0x001170 0x002005 RW  0x1000
  DYNAMIC        0x003010 0x0000000000203010 0x0000000000203010 0x000150 0x000150 RW  0x8
  GNU_RELRO      0x003000 0x0000000000203000 0x0000000000203000 0x000170 0x001000 R   0x1
  GNU_EH_FRAME   0x000440 0x0000000000200440 0x0000000000200440 0x000034 0x000034 R   0x1
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x00028c 0x000000000020028c 0x000000000020028c 0x000020 0x000020 R   0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .rodata .dynsym .gnu.version .gnu.version_r .gnu.hash .hash .dynstr .rela.dyn .eh_frame_hdr .eh_frame
   03     .text .init .fini
   04     .data .tm_clone_table .fini_array .init_array .dynamic .got .bss
   05     .dynamic
   06     .fini_array .init_array .dynamic .got
   07     .eh_frame_hdr
   08
   09     .note.ABI-tag

Not to be left behind, the newer BFD-ld (my version is 2.31.1) also makes ELF header and .rodata read-only, but fails to merge two R-only segments into one, resulting in 4 loadable segments:

$ echo "int foo; int main() { return 0;}"  | clang -xc - -o a.out-bfd -fuse-ld=bfd
$ readelf -Wl a.out-bfd

Elf file type is EXEC (Executable file)
Entry point 0x401020
There are 11 program headers, starting at offset 64

Program Headers:
  Type           Offset   VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040 0x0000000000400040 0x0000000000400040 0x000268 0x000268 R   0x8
  INTERP         0x0002a8 0x00000000004002a8 0x00000000004002a8 0x00001c 0x00001c R   0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x000000 0x0000000000400000 0x0000000000400000 0x0003f8 0x0003f8 R   0x1000
  LOAD           0x001000 0x0000000000401000 0x0000000000401000 0x00018d 0x00018d R E 0x1000
  LOAD           0x002000 0x0000000000402000 0x0000000000402000 0x000110 0x000110 R   0x1000
  LOAD           0x002e40 0x0000000000403e40 0x0000000000403e40 0x0001e8 0x0001f0 RW  0x1000
  DYNAMIC        0x002e50 0x0000000000403e50 0x0000000000403e50 0x0001a0 0x0001a0 RW  0x8
  NOTE           0x0002c4 0x00000000004002c4 0x00000000004002c4 0x000020 0x000020 R   0x4
  GNU_EH_FRAME   0x002004 0x0000000000402004 0x0000000000402004 0x000034 0x000034 R   0x4
  GNU_STACK      0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0x10
  GNU_RELRO      0x002e40 0x0000000000403e40 0x0000000000403e40 0x0001c0 0x0001c0 R   0x1

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn
   03     .init .text .fini
   04     .rodata .eh_frame_hdr .eh_frame
   05     .init_array .fini_array .dynamic .got .got.plt .data .bss
   06     .dynamic
   07     .note.ABI-tag
   08     .eh_frame_hdr
   09
   10     .init_array .fini_array .dynamic .got

Finally, some of these choices are affected by the --(no)rosegment (or -Wl,z,noseparate-code for BFD ld) linker option.

回复收藏 0 原文

~没有更多了~