How does a linker relocate branch instructions in MIPS?
Background
I'm working on a 2015 CS61C (Berkeley) course project on writing a linker to link object files generated from the following subset of the MIPS instruction set.
Add Unsigned: addu $rd, $rs, $rt
Or: or $rd, $rs, $rt
Set Less Than: slt $rd, $rs, $rt
Set Less Than Unsigned: sltu $rd, $rs, $rt
Jump Register: jr $rs
Shift Left Logical: sll $rd, $rt, shamt
Add Immediate Unsigned: addiu $rt, $rs, immediate
Or Immediate: ori $rt, $rs, immediate
Load Upper Immediate: lui $rt, immediate
Load Byte: lb $rt, offset($rs)
Load Byte Unsigned: lbu $rt, offset($rs)
Load Word: lw $rt, offset($rs)
Store Byte: sb $rt, offset($rs)
Store Word: sw $rt, offset($rs)
Branch on Equal: beq $rs, $rt, label
Branch on Not Equal: bne $rs, $rt, label
Jump: j label
Jump and Link: jal label
Load Immediate: li $rt, immediate
Branch on Less Than: blt $rs, $rt, label
From this subset of instructions, I think the ones that need relocation are the j, bne, and beq instructions (blt is a pseudo-instruction), the latter two needing to be relocated if the label is not present in the same file.
The comment for the MIPS function that relocates an instruction reads:
#------------------------------------------------------------------------------
# function relocate_inst()
#------------------------------------------------------------------------------
# Given an instruction that needs relocation, relocates the instruction based
# on the given symbol and relocation table.
#
# You should return error if 1) the addr is not in the relocation table or
# 2) the symbol name is not in the symbol table. You may assume otherwise the
# relocation will happen successfully.
#
# Arguments:
# $a0 = an instruction that needs relocating
# $a1 = the byte offset of the instruction in the current file
# $a2 = the symbol table
# $a3 = the relocation table
#
# Returns: the relocated instruction, or -1 if error
Note that the relocation table contains addresses relative to the start of the object file being linked, while the symbol table is an aggregate of the symbol tables of all the object files being linked and contains absolute addresses.
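The two-table lookup described by the function comment can be sketched in Python as follows. This is only an illustration under assumptions: the tables are modelled as plain dictionaries, and the names `lookup_target`, `relocation_table`, and `symbol_table` are hypothetical rather than taken from the project's MIPS code.

```python
# Hypothetical model of the two tables (not the project's actual layout):
#   relocation_table: byte offset within the current file -> symbol name
#   symbol_table:     symbol name -> absolute address (all files combined)

def lookup_target(byte_offset, symbol_table, relocation_table):
    """Return the absolute address the instruction at byte_offset refers to,
    or -1 if either lookup fails (mirroring the error rule in the comment)."""
    name = relocation_table.get(byte_offset)
    if name is None:
        return -1  # the addr is not in the relocation table
    address = symbol_table.get(name)
    if address is None:
        return -1  # the symbol name is not in the symbol table
    return address


# Made-up example data for illustration only.
example_relocations = {0x08: "helper"}
example_symbols = {"helper": 0x00400020, "main": 0x00400000}
```

The file-relative offset is only used as a key into the relocation table; the address that ends up in the instruction comes from the symbol table.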
Problem
If the instruction to be relocated is a j instruction, since $a1 contains the relative address of the instruction, we find the label that needs to be relocated in the relocation table, and then find the absolute address for that label in the symbol table. We can then add (absolute address >> 2) as the low 26 bits of the instruction.
If the instruction to be relocated is bne or beq, however, I am not sure what to do, since the low-order bits are supposed to be relative to PC+4, but we don't know what the absolute address of the instruction being relocated is, so we don't know what PC+4 is.
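The j case described above can be sketched as a small bit manipulation. This is a minimal Python sketch, assuming the 26-bit target field of the unresolved instruction is simply overwritten (the text says "add", which gives the same result when the field starts as zero); `relocate_jump` is a hypothetical name, not the project's function.

```python
J_TARGET_MASK = 0x03FFFFFF  # low 26 bits of a J-type instruction


def relocate_jump(inst, abs_addr):
    """Place (abs_addr >> 2) into the low 26 bits of a j/jal instruction,
    keeping the 6-bit opcode in the top bits unchanged."""
    opcode_bits = inst & ~J_TARGET_MASK & 0xFFFFFFFF
    target_field = (abs_addr >> 2) & J_TARGET_MASK
    return opcode_bits | target_field


# 0x08000000 is a j instruction (opcode 2) with an empty target field.
patched = relocate_jump(0x08000000, 0x00400020)
print(hex(patched))  # 0x8100008
```

Because the jump target is an absolute (region-relative) address, only the label's final address is needed here, not the address of the instruction itself.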
Looking at various solutions online, it seems that only j relocations are handled.
Am I missing something?
EDIT: We are only considering the text segment.
Comments (1)
在 DLL 中,编译器输出仍然可能看起来相同。My guess is that this linker does not handle branch instructions (
bne
orbeq
) to external labels.This will preclude using
beq label
where label is external (global and in another object file), but this is only really possible to do in assembly.Compiler output, for example, will have both the branch instruction and target location all within a single function, which goes into a single code chunk. (modulo certain tail call optimization).
With that limitation, all bne and beq instructions are already fixed up by the compiler or assembler using pc-relative addressing, so there would be no need for an entry in the relocation table for them. Further, the range of the branch (beq/bne) instructions (+/-128k) is shorter than for j, so if the linker really intended to support branching to external labels, it might also have to provide the capability to introduce branch islands to handle the branches whose targets are too far away.
To expand on your example:
would be
Some compilers don't know which function is in which DLL, so even if printf was in a DLL, the compiler output could still look the same.
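Returning to the earlier point that branches within one file are already fixed up with pc-relative addressing, here is a rough Python sketch of why no relocation entry is needed. The helper `branch_offset` is hypothetical, and the addresses are made up for illustration.

```python
def branch_offset(branch_addr, target_addr):
    """Return the 16-bit immediate for beq/bne: the signed word offset
    from PC+4 (the instruction after the branch) to the target."""
    delta = target_addr - (branch_addr + 4)
    if delta % 4 != 0:
        raise ValueError("branch target is not word-aligned")
    words = delta // 4
    # A signed 16-bit word offset covers roughly +/-128 KiB of code.
    if not -(1 << 15) <= words < (1 << 15):
        raise ValueError("branch target out of range")
    return words & 0xFFFF  # two's-complement encoding of the offset


# Offsets within the object file, before the linker places it anywhere.
file_relative = branch_offset(0x10, 0x20)

# The same branch and label after the file is placed at a made-up base.
base = 0x00400000
absolute = branch_offset(base + 0x10, base + 0x20)

print(file_relative == absolute)  # True
```

Moving the whole file by a base address shifts the branch and its target by the same amount, so the encoded offset does not change. The range check is also where a linker that supported external branch targets would have to fall back to something like a branch island.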