Why does Windows64 use a different calling convention from all other OSes on x86-64?
AMD has an ABI specification that describes the calling convention to use on x86-64. All OSes follow it, except for Windows, which has its own x86-64 calling convention. Why?
Does anyone know the technical, historical, or political reasons for this difference, or is it purely a matter of NIH syndrome?
I understand that different OSes may have different needs for higher-level things, but that doesn't explain why, for example, the register parameter passing order on Windows is rcx - rdx - r8 - r9 - rest on stack while everyone else uses rdi - rsi - rdx - rcx - r8 - r9 - rest on stack.
P.S. I am aware of how these calling conventions differ generally and I know where to find details if I need to. What I want to know is why.
Edit: for the how, see e.g. the Wikipedia entry and links from there.
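For readers who want the two orders side by side, here is a minimal C sketch of the integer/pointer argument mapping described in the question. The names (abi, int_arg_register and the two tables) are made up for illustration and are not part of any ABI document; floating-point and aggregate arguments are ignored.

```c
#include <stddef.h>

/* The two x86-64 calling conventions discussed in the question. */
enum abi { ABI_WIN64, ABI_SYSV };

/* Integer/pointer argument registers, in argument order. */
static const char *const win64_int_args[] = { "rcx", "rdx", "r8", "r9" };
static const char *const sysv_int_args[]  = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* Returns the register that carries integer argument `index` (0-based),
 * or NULL when that argument is passed on the stack instead. */
const char *int_arg_register(enum abi abi, size_t index)
{
    if (abi == ABI_WIN64)
        return index < 4 ? win64_int_args[index] : NULL;
    return index < 6 ? sysv_int_args[index] : NULL;
}
```

With this table, int_arg_register(ABI_WIN64, 0) gives "rcx" while int_arg_register(ABI_SYSV, 0) gives "rdi", which is exactly the difference the question asks about.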
Comments (4)
Choosing four argument registers on x64 - common to UN*X / Win64
One of the things to keep in mind about x86 is that the register name to "reg number" encoding is not obvious; in terms of instruction encoding (the MOD R/M byte, see http://www.c-jump.com/CIS77/CPU/x86/X77_0060_mod_reg_r_m_byte.htm), register numbers 0...7 are, in that order, ?AX, ?CX, ?DX, ?BX, ?SP, ?BP, ?SI, ?DI.
Hence choosing A/C/D (regs 0..2) for the return value and the first two arguments (which is the "classical" 32-bit __fastcall convention) is a logical choice. As far as going to 64-bit is concerned, the "higher" regs are ordered, and both Microsoft and UN*X/Linux went for R8 / R9 as the first ones.
Keeping that in mind, Microsoft's choice of RAX (return value) and RCX, RDX, R8, R9 (arg[0..3]) is an understandable selection if you choose four registers for arguments.
I don't know why the AMD64 UN*X ABI chose RDX before RCX.
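To make the register numbering above concrete, here is a small C sketch that maps a legacy register name to its 3-bit hardware number. The function and table names are hypothetical; only the ordering AX, CX, DX, BX, SP, BP, SI, DI comes from the instruction encoding.

```c
#include <stddef.h>
#include <string.h>

/* Legacy register names in hardware-encoding order (register numbers 0..7). */
static const char *const legacy_reg_names[8] = {
    "ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
};

/* Returns the 3-bit register number used in the ModR/M byte for a legacy
 * register name such as "cx", or -1 if the name is not one of the eight. */
int legacy_reg_number(const char *name)
{
    for (size_t i = 0; i < 8; i++)
        if (strcmp(legacy_reg_names[i], name) == 0)
            return (int)i;
    return -1;
}
```

Seen this way, A/C/D are simply registers 0, 1 and 2, which is why they look like a natural starting point for a register-based convention.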
Choosing six argument registers on x64 - UN*X specific
UN*X, on RISC architectures, has traditionally done argument passing in registers - specifically, for the first six arguments (that's so on PPC, SPARC, MIPS at least). That might be one of the major reasons why the AMD64 (UN*X) ABI designers chose to use six registers on that architecture as well.
So if you want six registers to pass arguments in, and it's logical to choose RCX, RDX, R8 and R9 for four of them, which other two should you pick?
The "higher" regs require an additional instruction prefix byte to select them and therefore have a bigger instruction size footprint, so you wouldn't want to choose any of those if you have options. Of the classical registers, RBP and RSP aren't available due to their implicit meaning, and RBX traditionally has a special use on UN*X (global offset table) which the AMD64 ABI designers seemingly didn't want to needlessly become incompatible with.
Ergo, the only choice was RSI / RDI.
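The extra prefix byte for the "higher" regs can be seen in something as simple as push. Below is a rough C sketch that encodes push r64 for register numbers 0..15; the function names are invented for this example, while the bytes themselves (opcode 0x50 plus the low three bits, with a 0x41 REX prefix for R8..R15) are the standard encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes `push r64` for register number reg (0..15) into out[], which
 * must have room for 2 bytes. Returns the number of bytes written, or 0
 * for an invalid register number. */
size_t encode_push_r64(unsigned reg, uint8_t out[2])
{
    size_t n = 0;
    if (reg > 15)
        return 0;
    if (reg >= 8)
        out[n++] = 0x41;                    /* REX prefix with the B bit set */
    out[n++] = (uint8_t)(0x50 + (reg & 7)); /* opcode 0x50 + low 3 bits */
    return n;
}

/* Convenience wrapper: only the encoded length of `push r64`. */
size_t push_r64_length(unsigned reg)
{
    uint8_t scratch[2];
    return encode_push_r64(reg, scratch);
}
```

So push_r64_length(6) (RSI) is one byte while push_r64_length(8) (R8) is two, which is the size cost being avoided by preferring RSI / RDI.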
So if you have to take RSI / RDI as argument registers, which arguments should they be?
Making them arg[0] and arg[1] has some advantages. See cHao's comment.
?SI and ?DI are string instruction source / destination operands, and as cHao mentioned, their use as argument registers means that with the AMD64 UN*X calling conventions, the simplest possible memcpy()-style block copy, for example, needs little more than moving the byte count into RCX followed by rep movsb; ret, because the source/target addresses have already been put into the correct registers by the caller. There is, particularly in low-level and compiler-generated "glue" code (think, for example, some C++ heap allocators zero-filling objects on construction, or the kernel zero-filling heap pages on sbrk(), or copy-on-write pagefaults), an enormous amount of block copy/fill, hence it is useful for such frequently used code to save the two or three CPU instructions that would otherwise load the source/target address arguments into the "correct" registers.
So in a way, UN*X and Win64 are only different in that UN*X "prepends" two additional arguments, in purposefully chosen RSI / RDI registers, to the natural choice of four arguments in RCX, RDX, R8 and R9.
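As an illustration of the string-instruction point, here is a hedged C sketch of a byte copy built on rep movsb. copy_bytes is a made-up name; the inline assembly uses GCC/Clang extended asm syntax and is only compiled on x86-64, with a plain loop as fallback elsewhere. It assumes the direction flag is clear, as both ABIs require at function boundaries.

```c
#include <stddef.h>

/* Copies n bytes from src to dst and returns dst. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__GNUC__) && defined(__x86_64__)
    /* rep movsb copies RCX bytes from [RSI] to [RDI]; the constraints pin
     * dst to RDI ("D"), src to RSI ("S") and n to RCX ("c"). */
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    /* Portable fallback for other compilers or architectures. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
#endif
    return ret;
}
```

Under the SysV convention dst and src already arrive in RDI and RSI, so the compiler only has to move the count into RCX; under Win64 both pointers have to be shuffled first.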
Beyond that ...
There are more differences between the UN*X and Windows x64 ABIs than just the mapping of arguments to specific registers. For the overview on Win64, check:
http://msdn.microsoft.com/en-us/library/7kcdt6fy.aspx
Win64 and AMD64 UN*X also strikingly differ in the way stack space is used; on Win64, for example, the caller must allocate stack space for function arguments (32 bytes of "home" or "shadow" space) even though args 0...3 are passed in registers. On UN*X on the other hand, a leaf function (i.e. one that doesn't call other functions) is not even required to allocate stack space at all if it needs no more than 128 bytes of it (the "red zone"; yes, you own and can use a certain amount of stack without allocating it ... well, unless you're kernel code, a source of nifty bugs). All these are particular optimization choices; most of the rationale for them is explained in the full ABI references that the original poster's Wikipedia reference points to.
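A rough way to see the stack-usage difference is to compute how many bytes a caller sets aside for outgoing integer arguments. The sketch below uses invented function names and deliberately ignores 16-byte alignment padding, floating-point arguments and aggregates; it only models the 8-byte slots.

```c
#include <stddef.h>

/* Bytes the caller reserves for outgoing integer arguments on Win64:
 * 8 bytes per argument, never less than the 32-byte home space. */
size_t win64_outgoing_arg_bytes(size_t nargs)
{
    size_t slots = nargs < 4 ? 4 : nargs;
    return slots * 8;
}

/* Bytes of stack-passed integer arguments under the SysV AMD64 ABI:
 * only arguments beyond the sixth go on the stack. */
size_t sysv_outgoing_arg_bytes(size_t nargs)
{
    return nargs > 6 ? (nargs - 6) * 8 : 0;
}

/* Size of the SysV red zone below RSP that a leaf function may use
 * without adjusting the stack pointer. */
enum { SYSV_RED_ZONE_BYTES = 128 };
```

For a two-argument call this gives 32 bytes on Win64 and 0 bytes on SysV, which matches the description above.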
IDK why Windows did what they did. See the end of this answer for a guess. I was curious about how the SysV calling convention was decided on, so I dug into the mailing list archive and found some neat stuff.
It's interesting reading some of those old threads on the AMD64 mailing list, since AMD architects were active on it. E.g. choosing register names was one of the hard parts: AMD considered renaming the original 8 registers r0-r7, or calling the new registers UAX etc.
Also, feedback from kernel devs identified things that made the original design of syscall and swapgs unusable. That's how AMD updated the instruction to get this sorted out before releasing any actual chips. It's also interesting that in late 2000, the assumption was that Intel probably wouldn't adopt AMD64.
The SysV (Linux) calling convention, and the decision on how many registers should be callee-preserved vs. caller-save, was made initially in Nov 2000, by Jan Hubicka (a gcc developer). He compiled SPEC2000 and looked at code size and number of instructions. That discussion thread bounces around some of the same ideas as answers and comments on this SO question. In a 2nd thread, he proposed the current sequence as optimal and hopefully final, generating smaller code than some alternatives.
He's using the term "global" to mean call-preserved registers, that have to be push/popped if used.
The choice of rdi, rsi, rdx as the first three args was motivated by:
- a minor code-size saving in functions that call memset or other C string functions on their args (where gcc inlines a rep string operation?)
- rbx is call-preserved because having two call-preserved regs accessible without REX prefixes (rbx and rbp) is a win. Presumably chosen because they're the only "legacy" registers that aren't implicitly used by any common instruction. (rep string, shift count, and mul/div outputs/inputs touch everything else.) cmpxchg16b and cpuid need RBX, but are rarely used so not a big factor. (cmpxchg16b wasn't part of original AMD64, but RBX would still have been the obvious choice. cmpxchg8b exists but was obsoleted by qword cmpxchg.)
- "We are trying to avoid RCX early in the sequence, since it is a register commonly used for special purposes, like EAX, so it is missing from the sequence for the same reason. Also, it can't be used for syscalls, and we would like to make the syscall sequence match the function call sequence as much as possible."
(background: syscall / sysret unavoidably destroy rcx (with rip) and r11 (with RFLAGS), so the kernel can't see what was originally in rcx when syscall ran.)
The kernel system-call ABI was chosen to match the function call ABI, except for r10 instead of rcx, so libc wrapper functions like mmap(2) can just mov %rcx, %r10 / mov $0x9, %eax / syscall.
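To show how close the two register sequences are, here is a small C sketch comparing the SysV function-call registers with the Linux x86-64 system-call registers. The table and function names are invented for the example; the register lists are the ones described above (r10 standing in for rcx).

```c
#include <stddef.h>
#include <string.h>

/* Integer argument registers for an ordinary SysV function call. */
static const char *const sysv_function_args[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Argument registers for a Linux x86-64 system call (number goes in rax). */
static const char *const linux_syscall_args[6] = {
    "rdi", "rsi", "rdx", "r10", "r8", "r9"
};

/* Register carrying system-call argument `index`, or NULL past the sixth. */
const char *syscall_arg_register(size_t index)
{
    return index < 6 ? linux_syscall_args[index] : NULL;
}

/* Counts the argument positions whose register differs between a
 * function call and a system call. */
size_t count_differing_arg_registers(void)
{
    size_t differing = 0;
    for (size_t i = 0; i < 6; i++)
        if (strcmp(sysv_function_args[i], linux_syscall_args[i]) != 0)
            differing++;
    return differing;
}
```

Only position 3 differs, which is why a wrapper needs just the single mov %rcx, %r10 before loading the call number and executing syscall.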
Note that the SysV calling convention used by i386 Linux sucks compared to Windows' 32-bit __vectorcall. It passes everything on the stack, and only returns in edx:eax for int64, not for small structs. It's no surprise little effort was made to maintain compatibility with it. When there's no reason not to, they did things like keeping rbx call-preserved, since they decided that having another in the original 8 (that don't need a REX prefix) was good.
Making the ABI optimal is much more important long-term than any other consideration. I think they did a pretty good job. I'm not totally sure about returning structs packed into registers, instead of different fields in different regs. I guess code that passes them around by value without actually operating on the fields wins this way, but the extra work of unpacking seems silly. They could have had more integer return registers, more than just rdx:rax, so returning a struct with 4 members could return them in rdi, rsi, rdx, rax or something.
They considered passing integers in vector regs, because SSE2 can operate on integers. Fortunately they didn't do that. Integers are used as pointer offsets very often, and a round-trip to stack memory is pretty cheap. Also SSE2 instructions take more code bytes than integer instructions.
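For anyone unsure what "structs packed into registers" means in practice, here is a minimal C sketch of a struct with two 32-bit members squeezed into one 64-bit value, the way the SysV ABI returns such a struct in RAX. The struct and function names are hypothetical, and the shifts model the little-endian layout explicitly rather than relying on the compiler.

```c
#include <stdint.h>

/* An 8-byte struct that the SysV AMD64 ABI returns in a single register. */
struct pair {
    int32_t first;
    int32_t second;
};

/* Models the packed register image: first in bits 0..31, second in 32..63. */
uint64_t pack_pair(struct pair p)
{
    return (uint64_t)(uint32_t)p.first |
           ((uint64_t)(uint32_t)p.second << 32);
}

/* The "extra work of unpacking": shift and truncate to get fields back. */
struct pair unpack_pair(uint64_t reg)
{
    struct pair p;
    p.first  = (int32_t)(uint32_t)reg;
    p.second = (int32_t)(uint32_t)(reg >> 32);
    return p;
}
```

Code that merely forwards the struct never pays for unpack_pair, while code that reads second needs the shift, which is the trade-off being questioned above.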
I suspect Windows ABI designers might have been aiming to minimize differences between 32 and 64-bit for the benefit of people that have to port asm from one to the other, or that can use a couple of #ifdefs in some ASM so the same source can more easily build a 32 or 64-bit version of a function.
Minimizing changes in the toolchain seems unlikely. An x86-64 compiler needs a separate table of which register is used for what, and what the calling convention is. Having a small overlap with 32-bit is unlikely to produce significant savings in toolchain code size / complexity.
Remember that Microsoft was initially "officially noncommittal toward the early AMD64 effort" (from "A History of Modern 64-bit Computing" by Matthew Kerner and Neil Padgett) because they were strong partners with Intel on the IA64 architecture. I think that this meant that even if they would have otherwise been open to working with GCC engineers on an ABI to use both on Unix and Windows, they wouldn't have done so as it would mean publicly supporting the AMD64 effort when they hadn't yet officially done so (and would have probably upset Intel).
On top of that, back in those days Microsoft had absolutely no leanings toward being friendly with open source projects. Certainly not Linux or GCC.
So why would they have cooperated on an ABI? I'd guess that the ABIs are different simply because they were designed at more or less the same time and in isolation.
Another quote from "A History of Modern 64-bit Computing":
This indicates that even AMD didn't feel that cooperation was necessarily the most important thing between MS and Unix, but that having Unix/Linux support was very important. Maybe even trying to convince one or both sides to compromise or cooperate wasn't worth the effort or risk(?) of irritating either of them? Perhaps AMD thought that even suggesting a common ABI might delay or derail the more important objective of simply having software support ready when the chip was ready.
Speculation on my part, but I think the major reason the ABIs are different was the political reason that MS and the Unix/Linux sides just didn't work together on it, and AMD didn't see that as a problem.
Win32 has its own uses for ESI and EDI, and requires that they not be modified (or at least that they be restored before calling into the API). I'd imagine 64-bit code does the same with RSI and RDI, which would explain why they're not used to pass function arguments around.
I couldn't tell you why RCX and RDX are switched, though.
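The guess about RSI and RDI can be checked against the published register-preservation rules: on Win64 both are nonvolatile (callee-saved), while the SysV ABI treats them as scratch. Here is a small C sketch of the general-purpose callee-saved sets; the function and table names are made up, and vector registers are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* General-purpose registers a callee must preserve on Win64. */
static const char *const win64_callee_saved[] = {
    "rbx", "rbp", "rdi", "rsi", "rsp", "r12", "r13", "r14", "r15"
};

/* General-purpose registers a callee must preserve under SysV AMD64. */
static const char *const sysv_callee_saved[] = {
    "rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"
};

/* Linear search for a register name in a list of names. */
static bool name_in_list(const char *name, const char *const *list, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(list[i], name) == 0)
            return true;
    return false;
}

bool win64_is_callee_saved(const char *reg)
{
    return name_in_list(reg, win64_callee_saved,
                        sizeof win64_callee_saved / sizeof win64_callee_saved[0]);
}

bool sysv_is_callee_saved(const char *reg)
{
    return name_in_list(reg, sysv_callee_saved,
                        sizeof sysv_callee_saved / sizeof sysv_callee_saved[0]);
}
```

A register that must survive a call is a poor fit for argument passing, which is consistent with Win64 leaving RSI / RDI out of its argument list.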