Why doesn't the Windows x64 calling convention use XMM registers to pass more than 4 integer arguments?
The (Microsoft) x64 calling convention states:
The arguments are passed in registers RCX, RDX, R8, and R9. If the arguments are float/double, they are passed in XMM0L, XMM1L, XMM2L, and XMM3L.
That's great, but why just floats/doubles? Why aren't integers (and maybe pointers) also passed via XMM registers?
Seems a little like a waste of available space, doesn't it?
Comments (2)
Because most operations on non-FP values (i.e. integers and addresses) are designed to use general-purpose registers.
There are integer SSE instructions, but they only do arithmetic on packed elements.
So if the calling convention passed integers and addresses in SSE registers, the callee would almost always have to copy each value to a general-purpose register before using it.
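To make that extra copy concrete, here is a minimal sketch in C with SSE2 intrinsics, assuming an x86-64 compiler that provides emmintrin.h. The function load_element and its parameter idx_in_xmm are hypothetical names used only for illustration; they model an integer index that arrived in an XMM register.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics (x86-64 assumed) */
#include <stdint.h>

/* Hypothetical callee: 'idx_in_xmm' stands in for an integer argument
   that was passed in the low 64 bits of an XMM register. */
int load_element(const int *table, __m128i idx_in_xmm, int64_t len)
{
    /* Ordinary addressing modes take only GP registers, so the index
       has to be copied out first (a movq from XMM to a 64-bit GP reg). */
    int64_t idx = _mm_cvtsi128_si64(idx_in_xmm);

    assert(idx >= 0 && idx < len);
    return table[idx];
}
```

A caller would have to do the reverse move, for example load_element(table, _mm_cvtsi64_si128(2), 4), so the value makes a round trip through the XMM register without gaining anything.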
Functions often want to use integer args together with pointers (as indices, or to calculate an end-pointer as a loop bound), with other integer args in GP registers, or with other integers loaded from memory that they want to work with in GP registers.
You can't efficiently use an integer in an XMM reg as a loop counter or bound, because there's no packed-integer compare that sets integer flags for branch instructions. (pcmpgtd creates a mask of 0/-1 elements.) See also "Why not store function parameters in XMM vector registers?" and the other answer here for more.
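As a rough illustration of the compare problem, the sketch below does a signed "greater than" in an XMM register using SSE2 intrinsics (x86-64 with emmintrin.h assumed). The function name xmm_greater_than is made up for this example. The point is that the compare yields a mask, which still has to be moved to a GP register before anything can branch on it.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics (x86-64 assumed) */
#include <stdint.h>

/* Hypothetical helper: returns 1 if a > b, doing the compare in XMM regs. */
int xmm_greater_than(int32_t a, int32_t b)
{
    __m128i va = _mm_cvtsi32_si128(a);       /* movd: GP -> XMM */
    __m128i vb = _mm_cvtsi32_si128(b);

    /* pcmpgtd: each 32-bit lane becomes 0 or -1; no flags are set. */
    __m128i mask = _mm_cmpgt_epi32(va, vb);

    /* To branch on the result, the mask must go back to a GP register. */
    int32_t low = _mm_cvtsi128_si32(mask);   /* movd: XMM -> GP */
    assert(low == 0 || low == -1);
    return low != 0;
}
```

With both values already in GP registers, the same test is a single cmp followed by a conditional jump.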
But even beyond that, this design idea is not even an option for Windows x64 fastcall / vectorcall.
Windows x64 chooses to waste space on purpose to simplify variadic functions. The register args can be dumped into the 32-byte "shadow space" / "home space" above the return address, to form an array of args.
This is why (for example) Windows x64 passes the 3rd arg in R8 or XMM2, regardless of the types of the earlier args. And why calls to variadic functions require FP args to also be copied to the corresponding integer register, so the function prologue can dump the arg regs without figuring out which variadic args were FP and which were integer.
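The homing idea can be sketched with a toy model in portable C. This is not real ABI code: toy_frame and the toy_* functions are invented for illustration, with gp[] standing in for RCX, RDX, R8, R9 and home[] for the 32-byte shadow space. It assumes double is 8 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: one 8-byte slot per argument position. */
typedef struct {
    uint64_t gp[4];    /* stands in for RCX, RDX, R8, R9 */
    uint64_t home[4];  /* stands in for the 32-byte shadow space */
} toy_frame;

/* Caller side: integer args go in the GP slot for their position. */
void toy_pass_int(toy_frame *f, int pos, int64_t value)
{
    assert(pos >= 0 && pos < 4);
    memcpy(&f->gp[pos], &value, sizeof value);
}

/* Caller side for a variadic callee: the bits of a double are also
   copied into the GP slot for its position. */
void toy_pass_double(toy_frame *f, int pos, double value)
{
    assert(pos >= 0 && pos < 4);
    assert(sizeof value == sizeof f->gp[pos]);
    memcpy(&f->gp[pos], &value, sizeof value);
}

/* Callee prologue: dump all four GP slots without knowing any types. */
void toy_home_args(toy_frame *f)
{
    memcpy(f->home, f->gp, sizeof f->home);
}

/* Later reads index the homed array by position, va_arg style. */
int64_t toy_read_int(const toy_frame *f, int pos)
{
    int64_t v;
    memcpy(&v, &f->home[pos], sizeof v);
    return v;
}

double toy_read_double(const toy_frame *f, int pos)
{
    double d;
    memcpy(&d, &f->home[pos], sizeof d);
    return d;
}
```

Because slot N always belongs to argument N, toy_home_args never needs to know which positions held FP values; the type only matters when a slot is read back.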
To make the arg-array thing work, only 4 total args can be passed in registers, regardless of whether you have a mix of integer and FP args. There are enough GP integer regs to hold the max number of register args already, even if they're all integer.
(Unlike x86-64 System V, where the first up-to-8 FP args are passed in xmm0..7 regardless of how many integer/pointer arg-passing registers are used.)
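The difference between the two schemes can be sketched as a small lookup in C. This is a simplified model that only covers scalar integer/pointer and float/double args (no structs, no long double, no varargs rules); the names arg_kind, win64_location and sysv_location are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_INT, ARG_FP } arg_kind;

static const char *const WIN_GP[4]  = {"RCX", "RDX", "R8", "R9"};
static const char *const WIN_FP[4]  = {"XMM0", "XMM1", "XMM2", "XMM3"};
static const char *const SYSV_GP[6] = {"RDI", "RSI", "RDX", "RCX", "R8", "R9"};
static const char *const SYSV_FP[8] = {"XMM0", "XMM1", "XMM2", "XMM3",
                                       "XMM4", "XMM5", "XMM6", "XMM7"};

/* Windows x64: the argument's position alone picks the slot; its kind
   only selects between the GP and XMM register for that slot. */
const char *win64_location(const arg_kind *kinds, size_t pos)
{
    assert(kinds != NULL);
    if (pos >= 4)
        return "stack";
    return kinds[pos] == ARG_FP ? WIN_FP[pos] : WIN_GP[pos];
}

/* x86-64 System V: integer and FP args are counted separately, so the
   register depends on how many earlier args had the same kind. */
const char *sysv_location(const arg_kind *kinds, size_t pos)
{
    size_t gp_used = 0, fp_used = 0;

    assert(kinds != NULL);
    for (size_t i = 0; i < pos; i++) {
        if (kinds[i] == ARG_FP)
            fp_used++;
        else
            gp_used++;
    }
    if (kinds[pos] == ARG_FP)
        return fp_used < 8 ? SYSV_FP[fp_used] : "stack";
    return gp_used < 6 ? SYSV_GP[gp_used] : "stack";
}
```

For a signature like (int, double, int, double, int), this model puts the args in RCX, XMM1, R8, XMM3 and the stack on Windows x64, and in RDI, XMM0, RSI, XMM1 and RDX on System V.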
See also: "Why does Windows64 use a different calling convention from all other OSes on x86-64?"