How to guarantee 16-byte code alignment of Delphi routines?

Posted on 2024-08-13 04:22:48

Background:

I have a unit of optimised Delphi/BASM routines, mostly for heavy computations. Some of these routines contain inner loops for which I can achieve a significant speed-up if the loop start is aligned to a DQWORD (16-byte) boundary. I can ensure that the loops in question are aligned as desired IF I know the alignment at the routine entry point.

As far as I can see, the Delphi compiler aligns procedures/functions to DWORD boundaries, and e.g. adding functions to the unit may change the alignment of subsequent ones. However, as long as I pad the end of routines to multiples of 16, I can ensure that subsequent routines are likewise aligned -- or misaligned, depending on the alignment of the first routine. I therefore tried to place the critical routines at the beginning of the unit's implementation section, and put a bit of padding code before them so that the first procedure would be DQWORD aligned.

This looks something like below:

interface

procedure FirstProcInUnit;

implementation

procedure __PadFirstProcTo16;
asm
    // variable number of NOP instructions here to get the desired code length
end;

procedure FirstProcInUnit;
asm //should start at DQWORD boundary
    //do something
    //padding to align the following label to DQWORD boundary
    @Some16BAlignedLabel:
        //code, looping back to @Some16BAlignedLabel
    //do something else
    ret #params
    //padding to get code length to multiple of 16
end;

initialization

__PadFirstProcTo16; //call this here so that it isn't optimised out
ASSERT ((NativeUInt(Pointer(@FirstProcInUnit)) AND $0F) = 0, 'FirstProcInUnit not DQWORD aligned');

end.

This is a bit of a pain in the neck, but I can get this sort of thing to work when necessary. The problem is that when I use such a unit in a different project, or change other units in the same project, the alignment of __PadFirstProcTo16 itself may still break. Likewise, recompiling the same project with a different compiler version (e.g. D2009 vs. D2010) typically also breaks the alignment. So the only approach I have found is to adjust the padding by hand, as pretty much the last step once the rest of the project is in its final form.
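For reference, the amount of padding needed at any point follows from the low four bits of the address. The helper below is only a sketch of that arithmetic; the name PadTo16 is hypothetical and not part of the unit above.

```pascal
// Hypothetical helper: number of padding bytes needed to advance Addr to the
// next 16-byte boundary. Returns 0 if Addr is already aligned.
function PadTo16(Addr: NativeUInt): NativeUInt;
begin
  Result := (16 - (Addr and $0F)) and $0F;
end;
```

Calling PadTo16(NativeUInt(@FirstProcInUnit)) at startup in a test build would report how many bytes the padding procedure is currently off by, which at least takes the guesswork out of the manual adjustment.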

Question 1:

Is there any other way to achieve the desired effect of ensuring that (at least some specific) routines are DQWORD-aligned?

Question 2:

Which exact factors affect the compiler's alignment of code, and how (if at all) could I use that knowledge to overcome the problem outlined here?

Assume that for the sake of this question "don't worry about code alignment/the associated presumably small speed benefits" is not a permissible answer.

忱杏 2024-08-20 04:22:48

As of Delphi XE, the problem of code alignment is now easily solved using the $CODEALIGN compiler directive (see this Delphi documentation page):

{$CODEALIGN 16}
procedure MyAlignedProc;
begin
..
end;
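To tie this back to the question, the directive can be combined with the same startup assertion the question uses. This is a minimal sketch that assumes Delphi XE or later; the unit name AlignedRoutines is made up for illustration, and the asm body is a placeholder.

```pascal
unit AlignedRoutines; // hypothetical unit name

interface

procedure MyAlignedProc;

implementation

{$CODEALIGN 16}
procedure MyAlignedProc;
asm
  // hot loop goes here
end;

initialization
  // Same check as in the question: fails at startup if the entry point
  // is not on a 16-byte boundary (requires assertions to be enabled).
  Assert((NativeUInt(@MyAlignedProc) and $0F) = 0,
    'MyAlignedProc not 16-byte aligned');

end.
```

As far as I can tell, the directive only controls where the routine starts. A label inside the routine still has to be padded relative to the entry point, as described in the question.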

水溶 2024-08-20 04:22:48

One thing you could do is add a 'magic' signature at the end of each routine, after an explicit ret instruction:

asm
  ...
  ret
  db <magic signature bytes>
end;

Now you could create an array containing pointers to each routine, scan the routines at run-time once for the magic signature to find the end of each routine and therefore its length. Then, you can copy them to a new block of memory that you allocate with VirtualAlloc using PAGE_EXECUTE_READWRITE, ensuring this time that each routine starts on a 16-byte boundary.
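A rough sketch of the scan-and-copy step for a single routine might look like the fragment below, meant for a unit's implementation section. The marker bytes and the names MagicSig, FindRoutineLength and CopyToAlignedBlock are all hypothetical. It also assumes the routine is position-independent: a relative call or jump to an address outside the copied bytes would no longer point at the right target after the move.

```pascal
uses
  Windows, SysUtils; // Winapi.Windows, System.SysUtils with unit scope names

const
  // Hypothetical marker; it must match the bytes emitted with "db" after RET.
  MagicSig: array[0..7] of Byte = ($DE, $AD, $BE, $EF, $FE, $ED, $FA, $CE);

// Scans forward from Code for the marker, looking at no more than MaxLen bytes.
// On success, Len receives the number of code bytes that precede the marker.
function FindRoutineLength(Code: Pointer; MaxLen: NativeUInt;
  out Len: NativeUInt): Boolean;
var
  Offset: NativeUInt;
begin
  Result := False;
  Len := 0;
  Offset := 0;
  while Offset + NativeUInt(SizeOf(MagicSig)) <= MaxLen do
  begin
    if CompareMem(Pointer(NativeUInt(Code) + Offset), @MagicSig,
      SizeOf(MagicSig)) then
    begin
      Len := Offset;
      Result := True;
      Exit;
    end;
    Inc(Offset);
  end;
end;

// Copies Len bytes of code into freshly allocated executable memory.
// VirtualAlloc returns page-aligned memory, so the copy starts on a
// 16-byte boundary as well.
function CopyToAlignedBlock(Src: Pointer; Len: NativeUInt): Pointer;
begin
  Result := VirtualAlloc(nil, Len, MEM_COMMIT or MEM_RESERVE,
    PAGE_EXECUTE_READWRITE);
  if Result = nil then
    RaiseLastOSError;
  Move(Src^, Result^, Len);
  FlushInstructionCache(GetCurrentProcess, Result, Len);
end;
```

To pack several routines into one block instead of one allocation each, each routine's offset within the block could be rounded up with (Offset + 15) and not 15 before copying. The marker should be long and unusual enough that it is unlikely to occur inside the routine's own code or data.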
