如何在汇编器中实现相对 JMP (x86)？

发布于 2024-09-01 12:39:08 字数 1518 浏览 10 评论 0原文

在为 x86 平台构建汇编器时，我在编码 JMP 指令时遇到了一些问题：（

OPCODE   INSTRUCTION   SIZE
 EB cb     JMP rel8     2
 E9 cw     JMP rel16    4 (because of 0x66 16-bit prefix)
 E9 cd     JMP rel32    5
 ...

来自我最喜欢的 x86 指令网站，http://siyobik.info/index.php?module=x86&id=147)

全部都是 相对跳转，其中每个编码（操作+操作数）的大小在第三列中。

现在我原来的（因此是错误的）设计为每条指令保留了最大（5 字节）空间。操作数尚不清楚，因为它跳转到未知位置。因此，我实现了一种“重写”机制，如果跳转的位置已知，则将操作数重写到内存中的正确位置，并用 NOP 填充其余部分。这在紧循环中是一个比较严重的问题。

现在我的问题是以下情况：

b: XXX
c: JMP a
e: XXX
   ...
   XXX
d: JMP b
a: XXX      (where XXX is any instruction, depending
             on the to-be assembled program)

问题是我想要 JMP 指令的尽可能最小的编码（并且没有 NOP 填充）。

我必须先知道 c 处指令的大小，然后才能计算 a 处操作数的 a 和 b 之间的相对距离d。这同样适用于 c 处的 JMP：它需要知道 d 的大小，然后才能计算 e 之间的相对距离 和 a。

现有的汇编器如何解决这个问题，或者你会如何做到这一点？

这就是我正在思考的解决问题的方法：

首先将JMP与其目标之间的所有指令编码为操作码，如果该区域包含可变大小的操作码，则使用最大大小，例如5 JMP。然后，通过选择可能的最小编码大小（3、4 或 5），将相对 JMP 编码到目标，并计算距离。如果对任何可变大小的操作码进行编码，请更改之前的所有绝对操作数，以及跳过此编码指令的所有相对指令：当它们的操作数更改为选择最小可能的大小时，它们将被重新编码。此方法保证结束，因为可变大小的操作码仅可能缩小（因为它使用它们的最大大小）。

我想知道，也许这是一个过度设计的解决方案，这就是我问这个问题的原因。

原文

While building my assembler for the x86 platform I encountered some problems with encoding the JMP instruction:

OPCODE   INSTRUCTION   SIZE
 EB cb     JMP rel8     2
 E9 cw     JMP rel16    4 (because of 0x66 16-bit prefix)
 E9 cd     JMP rel32    5
 ...

(from my favourite x86 instruction website, http://siyobik.info/index.php?module=x86&id=147)

All are relative jumps, where the size of each encoding (operation + operand) is in the third column.

Now my original (and thus fault because of this) design reserved the maximum (5 bytes) space for each instruction. The operand is not yet known, because it's a jump to a yet unknown location. So I've implemented a "rewrite" mechanism, that rewrites the operands in the correct location in memory, if the location of the jump is known, and fills the rest with NOPs. This is a somewhat serious concern in tight-loops.

Now my problem is with the following situation:

b: XXX
c: JMP a
e: XXX
   ...
   XXX
d: JMP b
a: XXX      (where XXX is any instruction, depending
             on the to-be assembled program)

The problem is that I want the smallest possible encoding for a JMP instruction (and no NOP filling).

I have to know the size of the instruction at c before I can calculate the relative distance between a and b for the operand at d. The same applies for the JMP at c: it needs to know the size of d before it can calculate the relative distance between e and a.

How do existing assemblers solve this problem, or how would you do this?

This is what I am thinking which solves the problem:

First encode all the instructions to opcodes between the JMP and it's target, if this region contains a variable-sized opcode, use the maximum size, e.g. 5 for a JMP. Then encode the relative JMP to it's target, by choosing the smallest possible encoding size (3, 4 or 5) and calculate the distance. If any variable-sized opcode is encoded, change all absolute operands before, and all relative instructions that skips over this encoded instruction: they are re-encoded when their operand changes to choose the smallest possible size. This method is guaranteed to end, as variable-sized opcodes only may shrink (because it uses the maximum size of them).

I wonder, perhaps this is an over-engineered solution, that's why I ask this question.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

谈场末日恋爱 2024-09-08 12:39:08

在第一遍中，您将通过对所有跳转指令使用悲观字节计数来非常接近要使用的 jmp 代码。

在第二遍中，您可以使用所选的悲观操作码填充跳转。然后，很少有跳转可以被重写以使用更少的一个或两个字节，只有那些非常接近最初的 8/16 位或 16/32 字节跳转阈值的跳转。由于候选者都是许多字节的跳转，因此它们不太可能处于关键循环情况，因此您可能会发现进一步的传递比两遍解决方案提供的好处很少或没有。

回复收藏 0 原文

执手闯天涯 2024-09-08 12:39:08

这是我使用过的一种方法，它可能看起来效率低下，但事实证明并不适合大多数现实生活中的代码（伪代码）：

IP := 0;
do
{
  done = true;
  while (IP < length)
  {
    if Instr[IP] is jump
      if backwards
      { Target known
          Encode short/long as needed }
      else
      {  Target unknown
          if (!marked as needing long encoding) // see below
            Encode short
          Record location for fixup }
    IP++;
  }
  foreach Fixup do
    if Jump > short
      Mark Jump location as requiring long encoding
      PC := FixupLocation; // restart at instruction that needs size change
      done = false; 
      break; // out of foreach fixup
    else
      encode jump
} while (!done);

Here's one approach I've used that may seem inefficient but turns out not to be for most real-life code (pseudo-code):

IP := 0;
do
{
  done = true;
  while (IP < length)
  {
    if Instr[IP] is jump
      if backwards
      { Target known
          Encode short/long as needed }
      else
      {  Target unknown
          if (!marked as needing long encoding) // see below
            Encode short
          Record location for fixup }
    IP++;
  }
  foreach Fixup do
    if Jump > short
      Mark Jump location as requiring long encoding
      PC := FixupLocation; // restart at instruction that needs size change
      done = false; 
      break; // out of foreach fixup
    else
      encode jump
} while (!done);

回复收藏 0 原文

~没有更多了~