What is the -fPIE option for position-independent executables in gcc and ld?

Posted on 2024-08-25 21:52:14

How will it change the code, e.g. function calls?


Comments (2)

z祗昰~ 2024-09-01 21:52:14

PIE exists to support address space layout randomization (ASLR) in executable files.

Before PIE mode was created, a program's executable could not be placed at a random address in memory; only position-independent code (PIC) dynamic libraries could be relocated to a random offset. PIE works very much like PIC does for dynamic libraries. The difference is that symbols defined in the executable itself need not go through the Procedure Linkage Table (PLT) or the GOT; PC-relative addressing is used for them instead.

Once PIE support is enabled in gcc and the linker, the body of the program is compiled and linked as position-independent code. The dynamic linker does full relocation processing on the program module, just as it does for dynamic libraries. Uses of global data that cannot be resolved at link time are converted to accesses through the Global Offset Table (GOT), and GOT relocations are added.

PIE is well described in this OpenBSD PIE presentation.

The changes to functions are shown in this slide (PIE vs PIC):

[Slide: x86 pic vs pie]

Global variables and functions defined in the executable itself are optimized in PIE.

External global variables and functions are handled the same way as in PIC.

and in this slide (PIE vs old-style linking):

[Slide: x86 pie vs no-flags (fixed)]

Global variables and functions defined in the executable itself are handled much as in a fixed-address executable.

External global variables and functions are handled the same way as in PIC.

Note that PIE may be incompatible with -static.

夏末 2024-09-01 21:52:14

Minimal runnable example: GDB the executable twice

For those that want to see some action, let's see ASLR work on the PIE executable and change addresses across runs:

main.c

#include <stdio.h>

int main(void) {
    puts("hello");
}

main.sh

#!/usr/bin/env bash
echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
for pie in no-pie pie; do
  exe="${pie}.out"
  gcc -O0 -std=c99 "-${pie}" "-f${pie}" -ggdb3 -o "$exe" main.c
  gdb -batch -nh \
    -ex 'set disable-randomization off' \
    -ex 'break main' \
    -ex 'run' \
    -ex 'printf "pc = 0x%llx\n", (long  long unsigned)$pc' \
    -ex 'run' \
    -ex 'printf "pc = 0x%llx\n", (long  long unsigned)$pc' \
    "./$exe" \
  ;
  echo
  echo
done

For the one with -no-pie, everything is boring:

Breakpoint 1 at 0x401126: file main.c, line 4.

Breakpoint 1, main () at main.c:4
4           puts("hello");
pc = 0x401126

Breakpoint 1, main () at main.c:4
4           puts("hello");
pc = 0x401126

Before starting execution, break main sets a breakpoint at 0x401126.

Then, during both executions, run stops at address 0x401126.

The one with -pie however is much more interesting:

Breakpoint 1 at 0x1139: file main.c, line 4.

Breakpoint 1, main () at main.c:4
4           puts("hello");
pc = 0x5630df2d6139

Breakpoint 1, main () at main.c:4
4           puts("hello");
pc = 0x55763ab2e139

Before starting execution, GDB just takes a "dummy" address that is present in the executable: 0x1139.

After it starts however, GDB intelligently notices that the dynamic loader placed the program in a different location, and the first break stopped at 0x5630df2d6139.

Then, the second run also intelligently noticed that the executable moved again, and ended up breaking at 0x55763ab2e139.

echo 2 | sudo tee /proc/sys/kernel/randomize_va_space ensures that ASLR is on (the default in Ubuntu 17.10). See: How can I temporarily disable ASLR (Address space layout randomization)? | Ask Ubuntu.

set disable-randomization off is needed because otherwise GDB, as the name suggests, turns off ASLR for the process by default, giving fixed addresses across runs to improve the debugging experience. See: Difference between gdb addresses and "real" addresses? | Stack Overflow.

readelf analysis

Furthermore, we can also observe that:

readelf -s ./no-pie.out | grep main

gives the actual runtime load address (the pc printed above points 4 bytes further, at the first instruction after the function prologue):

64: 0000000000401122    21 FUNC    GLOBAL DEFAULT   13 main

while:

readelf -s ./pie.out | grep main

gives just an offset:

65: 0000000000001135    23 FUNC    GLOBAL DEFAULT   14 main

With ASLR turned off (either through randomize_va_space or by leaving GDB's default set disable-randomization on), GDB always gives main the address 0x5555555547a9, so we deduce that the -pie address is composed of:

0x555555554000 + random offset + symbol offset (0x7a9)

TODO where is 0x555555554000 hard coded in the Linux kernel / glibc loader / wherever? How is the address of the text section of a PIE executable determined in Linux?

Minimal assembly example

Another cool thing we can do is to play around with some assembly code to understand more concretely what PIE means.

We can do that with a Linux x86_64 freestanding assembly hello world:

main.S

.text
.global _start
_start:
asm_main_after_prologue:
    /* write */
    mov $1, %rax   /* syscall number */
    mov $1, %rdi   /* stdout */
    mov $msg, %rsi  /* buffer */
    mov $len, %rdx /* len */
    syscall

    /* exit */
    mov $60, %rax   /* syscall number */
    mov $0, %rdi    /* exit status */
    syscall
msg:
    .ascii "hello\n"
len = . - msg

GitHub upstream

and it assembles and runs fine with:

as -o main.o main.S
ld -o main.out main.o
./main.out

However, if we try to link it as PIE with (--no-dynamic-linker is required as explained at: How to create a statically linked position independent executable ELF in Linux?):

ld --no-dynamic-linker -pie -o main.out main.o

then the link fails with:

ld: main.o: relocation R_X86_64_32S against `.text' can not be used when making a PIE object; recompile with -fPIC
ld: final link failed: nonrepresentable section on output

This is because the line:

mov $msg, %rsi  /* buffer */

hardcodes the message address in the mov operand, and is therefore not position independent.

If we instead write it in a position independent way:

lea msg(%rip), %rsi

then the PIE link works fine, and GDB shows us that the executable does get loaded at a different location in memory every time.

The difference here is that, because of the (%rip) syntax, lea encodes the address of msg relative to the current PC. See also: How to use RIP Relative Addressing in a 64-bit assembly program?

We can also figure that out by disassembling both versions with:

objdump -S main.o

which gives, respectively:

e:   48 c7 c6 00 00 00 00    mov    $0x0,%rsi
e:   48 8d 35 19 00 00 00    lea    0x19(%rip),%rsi        # 2e <msg>

000000000000002e <msg>:
  2e:   68 65 6c 6c 6f          pushq  $0x6f6c6c65

So we see clearly that lea already has the full correct address of msg encoded relative to RIP: the address of the next instruction (0xe + 7 = 0x15) plus 0x19 gives 0x2e.

The mov version, however, leaves the address as 00 00 00 00, which means that a relocation will be performed there: What do linkers do? The cryptic R_X86_64_32S in the ld error message is the type of relocation that was required: it stores an absolute address as a sign-extended 32-bit value, and an absolute address cannot be known at link time in a PIE executable.

Another fun thing that we can do is to put the msg in the data section instead of .text with:

.data
msg:
    .ascii "hello\n"
len = . - msg

Now the .o assembles to:

e:   48 8d 35 00 00 00 00    lea    0x0(%rip),%rsi        # 15 <_start+0x15>

so the RIP offset is now 0, and we guess that a relocation has been requested by the assembler. We confirm that with:

readelf -r main.o

which gives:

Relocation section '.rela.text' at offset 0x160 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000011  000200000002 R_X86_64_PC32     0000000000000000 .data - 4

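So R_X86_64_PC32 is a PC-relative relocation, which ld can resolve at link time for PIE executables. The addend of -4 accounts for the 4-byte displacement field: the CPU computes the address relative to the end of the instruction, while the relocation is applied at offset 0x11, 4 bytes before it.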

This experiment taught us that the linker itself checks whether the program can be PIE and marks it as such.

When compiling with GCC, -fpie / -fPIE tells the compiler to generate position-independent assembly, and -pie tells the linker to produce a PIE executable.

But if we write assembly ourselves, we must manually ensure that we have achieved position independence.

In ARMv8 aarch64, the position independent hello world can be achieved with the ADR instruction.

How to determine if an ELF is position independent?

Besides just running it through GDB, some static methods are mentioned at:

Tested in Ubuntu 18.10.
