Is it possible to manually define the global offset table?

Posted on 2025-01-28 16:43:42

I'm attempting to build a flat 32-bit PIC binary with the following C++ code:

extern "C" {
void print(const char *){}

void entry_func() {
  print("abcd\n");
}
}

The assembly produced for the print("abcd\n") bit is:

        calll   .L1$pb
.L1$pb:
        popl    %ebx
.Ltmp3:
        addl    $_GLOBAL_OFFSET_TABLE_+(.Ltmp3-.L1$pb), %ebx
        leal    .L.str@GOTOFF(%ebx), %eax
        movl    %eax, (%esp)
        calll   print@PLT

If I use GNU ld to link a flat binary using this linker script:

SECTIONS {
  . = 16M;
  .text : ALIGN(4K) {
    *(.text)
  }
}

I get the following link error:

undefined reference to `_GLOBAL_OFFSET_TABLE_'

First Issue

Given the assembly I showed earlier, should I still expect the linker to produce a GOT even for a flat binary?

In the corresponding object file, I see these two relocations:

 Offset     Info    Type                Sym. Value  Symbol's Name
0000001d  00000a0a R_386_GOTPC            00000000   _GLOBAL_OFFSET_TABLE_
00000023  00000309 R_386_GOTOFF           00000000   .L.str

Now according to this documentation I found, I would think the linker should emit a GOT:

R_386_GOTOFF
Computes the difference between a symbol's value and the address of the global offset table. It also instructs the link-editor to create the global offset table.
R_386_GOTPC
Resembles R_386_PC32, except that it uses the address of the global offset table in its calculation. The symbol referenced in this relocation normally is _GLOBAL_OFFSET_TABLE_, which also instructs the link-editor to create the global offset table.

Is this an issue with ld, or is ld simply not required to emit a GOT because I'm producing a flat binary rather than an ELF binary?

Second Issue

Now I can patch this error by also compiling and linking in a .S file that actually defines this symbol:

  .globl _GLOBAL_OFFSET_TABLE_
  .section .got,"wa",@progbits
_GLOBAL_OFFSET_TABLE_:
  .word 0xabcd  // Filler data so it's easier to find in the objdump

This links successfully, but my binary seems to be incorrect when I objdump it:

00000000 <.data>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   8b 45 08                mov    0x8(%ebp),%eax
   6:   5d                      pop    %ebp
   7:   c3                      ret    
   8:   90                      nop
...
   f:   90                      nop
  10:   55                      push   %ebp
  11:   89 e5                   mov    %esp,%ebp
  13:   53                      push   %ebx
  14:   50                      push   %eax
  15:   e8 00 00 00 00          call   0x1a
  1a:   5b                      pop    %ebx
  1b:   81 c3 1b 00 00 00       add    $0x1b,%ebx
  21:   8d 83 38 00 00 01       lea    0x1000038(%ebx),%eax
  27:   89 04 24                mov    %eax,(%esp)
  2a:   e8 d1 ff ff ff          call   0x0
  2f:   83 c4 04                add    $0x4,%esp
  32:   5b                      pop    %ebx
  33:   5d                      pop    %ebp
  34:   c3                      ret    
  35:   cd ab                   int    $0xab
  37:   00 61 62                add    %ah,0x62(%ecx)
  3a:   63 64 0a 00             arpl   %sp,0x0(%edx,%ecx,1)

The value for $_GLOBAL_OFFSET_TABLE_+(.Ltmp3-.L1$pb) seems to have expanded correctly: _GLOBAL_OFFSET_TABLE_ has the relocation R_386_GOTPC and is calculated as the offset between the GOT (0x35) and the current PC (0x1b), and (.Ltmp3-.L1$pb) is just 1 byte (so 0x35-0x1b+0x1 = 0x1b).
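As a cross-check of that arithmetic: the i386 psABI defines R_386_GOTPC as GOT + A - P, where P is the address of the relocated field (0x1d here) rather than the address of the instruction. The sketch below plugs in the offsets from the objdump. The addend of 3 is an assumption on my part (the 1 byte of .Ltmp3-.L1$pb plus the 2 bytes between the start of the addl at 0x1b and its imm32 field at 0x1d); it is the value that reproduces the 0x1b seen in the disassembly. The function and constant names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// R_386_GOTPC as given by the i386 psABI: GOT + A - P.
// Unsigned 32-bit arithmetic wraps the same way the linker's does.
constexpr std::uint32_t reloc_gotpc(std::uint32_t got, std::uint32_t addend,
                                    std::uint32_t place) {
    return got + addend - place;
}

// Offsets relative to the start of the flat binary, read off the objdump.
constexpr std::uint32_t kGot = 0x35;    // where the 0xabcd filler landed
constexpr std::uint32_t kPlace = 0x1d;  // imm32 field of the addl at 0x1b
constexpr std::uint32_t kAddend = 1 + 2;  // ASSUMPTION: (.Ltmp3-.L1$pb) + 2
constexpr std::uint32_t kEbxAfterPop = 0x1a;  // address of .L1$pb

// The immediate the linker wrote into the addl instruction.
static_assert(reloc_gotpc(kGot, kAddend, kPlace) == 0x1b,
              "matches 'add $0x1b,%ebx' in the disassembly");

// After the addl, %ebx holds the address of the hand-written GOT symbol.
static_assert(kEbxAfterPop + reloc_gotpc(kGot, kAddend, kPlace) == kGot,
              "ebx ends up pointing at _GLOBAL_OFFSET_TABLE_");
```

Because the value is PC-relative, the 16 MB base cancels out, which is why the offsets within the file are enough for this check.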

My second issue is that the value for .L.str@GOTOFF seems to assume the GOT is at address zero. Its corresponding relocation is R_386_GOTOFF, which is calculated as the offset between the symbol (.L.str) and the GOT. If my binary starts at 16 MB (from the linker script) and .L.str sits at offset 0x38 into the binary, then the address of the symbol is 0x1000038. Since the displacement that was emitted is also 0x1000038, this implies the linker treated the GOT as being at zero.
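To make the mismatch concrete: the psABI formula for R_386_GOTOFF is S + A - GOT. The sketch below evaluates it twice, once with the GOT at zero (which reproduces the displacement in the disassembly) and once with the GOT at the address of my hand-written symbol (which is what the surrounding code needs). The helper names and the example load address 0x2000000 are hypothetical, chosen only to show what the lea would compute at run time.

```cpp
#include <cassert>
#include <cstdint>

// R_386_GOTOFF as given by the i386 psABI: S + A - GOT.
constexpr std::uint32_t reloc_gotoff(std::uint32_t sym, std::uint32_t addend,
                                     std::uint32_t got) {
    return sym + addend - got;
}

constexpr std::uint32_t kBase = 0x1000000;    // ". = 16M" in the linker script
constexpr std::uint32_t kStr = kBase + 0x38;  // link-time address of .L.str
constexpr std::uint32_t kGot = kBase + 0x35;  // hand-written _GLOBAL_OFFSET_TABLE_

// What the disassembly shows: only consistent with a GOT base of zero.
static_assert(reloc_gotoff(kStr, 0, 0) == 0x1000038,
              "matches 'lea 0x1000038(%ebx),%eax'");

// What would be consistent with the GOTPC value: a 3-byte displacement.
static_assert(reloc_gotoff(kStr, 0, kGot) == 0x3,
              ".L.str sits 3 bytes past the GOT symbol");

// Address produced by the lea when the binary is loaded at load_base:
// %ebx holds the run-time address of the GOT symbol (load_base + 0x35).
constexpr std::uint32_t lea_result(std::uint32_t load_base, std::uint32_t disp) {
    return (load_base + 0x35) + disp;
}

// Hypothetical load address, for illustration only.
static_assert(lea_result(0x2000000, 0x3) == 0x2000038,
              "consistent displacement reaches the string");
static_assert(lea_result(0x2000000, 0x1000038) == 0x300006d,
              "emitted displacement lands far outside the binary");
```

The two relocations only work as a pair when they agree on the GOT base, which is exactly what the linked output does not do.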

My second question: is there a way to manually tell the linker where the GOT is? I'm guessing my _GLOBAL_OFFSET_TABLE_ trick didn't work here because _GLOBAL_OFFSET_TABLE_ probably acts more as a symbol that's emitted to indicate where the GOT actually is rather than the other way around (the linker looking up wherever _GLOBAL_OFFSET_TABLE_ is and placing the GOT there).

My overall goal is to see if I can write a flat PIC binary in pure C/C++ (to a certain extent). I know at least for this small code example that I could circumvent the GOT in pure assembly with something like:

  call .L$pb
.L$pb:
  pop %ebx
  addl $(.L.str - .L$pb), %ebx
  movl %ebx, (%esp)
  calll print@PLT

Here, rather than adding the offset between the PC and the GOT and then the offset between the GOT and .L.str, I just take the offset between the PC and .L.str directly. This emits an R_386_PC32 for $(.L.str - .L$pb), which can be resolved statically. The result is still PIC, but without the GOT. In the same way that the linker can relax PLT relocations to relative calls when the call and the function definition are in the same binary, I wonder if there's a way to relax these two GOT relocations so that they just take the relative reference to my binary-local data correctly.
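The reason I expect such a relaxation to be possible is that the GOT address cancels out of the two-step computation: (GOT - PC) + (.L.str - GOT) = .L.str - PC. A minimal sketch of that identity, using the link-time addresses from my example and a few arbitrary GOT positions (the function names are made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Two-step form used by the compiler: PC -> GOT, then GOT -> symbol.
constexpr std::uint32_t via_got(std::uint32_t pc, std::uint32_t got,
                                std::uint32_t sym) {
    return (got - pc) + (sym - got);
}

// One-step form from the hand-written assembly: PC -> symbol.
constexpr std::uint32_t direct(std::uint32_t pc, std::uint32_t sym) {
    return sym - pc;
}

constexpr std::uint32_t kPc = 0x100001a;   // address of the pic-base label
constexpr std::uint32_t kStr = 0x1000038;  // address of .L.str

// The distance from the pic-base label to the string is 0x1e bytes.
static_assert(direct(kPc, kStr) == 0x1e, "0x38 - 0x1a");

// Any GOT position gives the same total, as long as both steps agree on it.
static_assert(via_got(kPc, 0x1000035, kStr) == direct(kPc, kStr), "GOT at 0x35");
static_assert(via_got(kPc, 0x0, kStr) == direct(kPc, kStr), "GOT at zero");
static_assert(via_got(kPc, 0xdeadbeef, kStr) == direct(kPc, kStr), "arbitrary GOT");
```

So for data local to the binary the GOT is only acting as an anchor, and any anchor works provided both relocations use the same one.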
