Is fastcall really faster?

Posted on 2024-08-20 09:36:00

Is the fastcall calling convention really faster than other calling conventions, such as cdecl?
Are there any benchmarks out there that show how performance is affected by calling convention?

Comments (4)

羁客 2024-08-27 09:36:00

It depends on the platform. For a Xenon PowerPC, for example, it can be an order of magnitude difference due to a load-hit-store issue with passing data on the stack. I empirically timed the overhead of a cdecl function at about 45 cycles compared to ~4 for a fastcall.

For an out-of-order x86 (Intel and AMD), the impact may be much less, because the registers are all shadowed and renamed anyway.

The answer really is that you need to benchmark it yourself on the particular platform you care about.
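A minimal C sketch of such a benchmark follows. The macro names `CC_CDECL` and `CC_FASTCALL` and all function names are hypothetical, chosen for illustration. The two conventions only differ on 32-bit x86, so on any other target the macros are assumed to expand to nothing (or to keywords the compiler ignores) and the two timings should be roughly equal. Calls go through a `volatile` function pointer so the optimizer cannot inline them or drop them.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical portability macros: only 32-bit x86 has distinct conventions. */
#if defined(_MSC_VER)
#  define CC_CDECL    __cdecl
#  define CC_FASTCALL __fastcall
#elif defined(__GNUC__) && defined(__i386__)
#  define CC_CDECL    __attribute__((cdecl))
#  define CC_FASTCALL __attribute__((fastcall))
#else
#  define CC_CDECL
#  define CC_FASTCALL
#endif

/* Two identical functions that differ only in calling convention. */
int CC_CDECL    add_cdecl(int a, int b)    { return a + b; }
int CC_FASTCALL add_fastcall(int a, int b) { return a + b; }

typedef int (CC_CDECL    *cdecl_fn)(int, int);
typedef int (CC_FASTCALL *fastcall_fn)(int, int);

/* Times `iters` calls through a volatile pointer; returns elapsed CPU seconds. */
static double seconds_cdecl(long iters, unsigned *sum)
{
    cdecl_fn volatile fn = add_cdecl;   /* volatile: forces a real call */
    unsigned acc = 0;
    clock_t start = clock();
    for (long i = 0; i < iters; ++i)
        acc += (unsigned)fn((int)(i & 0xFF), 1);
    clock_t end = clock();
    *sum = acc;                         /* keep the result observable */
    return (double)(end - start) / CLOCKS_PER_SEC;
}

static double seconds_fastcall(long iters, unsigned *sum)
{
    fastcall_fn volatile fn = add_fastcall;
    unsigned acc = 0;
    clock_t start = clock();
    for (long i = 0; i < iters; ++i)
        acc += (unsigned)fn((int)(i & 0xFF), 1);
    clock_t end = clock();
    *sum = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Runs both loops and checks that they computed the same sum. */
void compare_conventions(long iters, double *cdecl_s, double *fastcall_s)
{
    unsigned sum_c = 0, sum_f = 0;
    *cdecl_s    = seconds_cdecl(iters, &sum_c);
    *fastcall_s = seconds_fastcall(iters, &sum_f);
    assert(sum_c == sum_f);
}
```

To use it, call `compare_conventions` from `main` with a large iteration count, print both timings, and repeat the run several times. With GCC the conventions only diverge in a 32-bit build (for example `-m32 -O2`). A loop like this measures call overhead in isolation, so it shows an upper bound on the effect rather than what a real program would see.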

老街孤人 2024-08-27 09:36:00

Is the fastcall calling convention really faster than other calling conventions, such as cdecl?

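I believe that Microsoft's implementation of fastcall on x86 passes the first two parameters (those of 32 bits or less) in registers (ECX and EDX) instead of on the stack. On x64 the keyword is accepted but ignored, because the standard x64 convention already passes the first four arguments in registers.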

Since it typically saves at least four memory accesses (two stores in the caller and two loads in the callee), yes, it is generally faster. However, if the function involved is register-starved and is thus likely to write the arguments to locals on the stack anyway, there's not likely to be a significant speedup.
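As an illustration of where the arguments go, here is a small C sketch with the same function declared under both conventions. The macro and function names are hypothetical. The comments describe argument placement on 32-bit x86 only; on other targets the macros are assumed to have no effect.

```c
#include <assert.h>

/* Hypothetical portability macros: only 32-bit x86 has distinct conventions. */
#if defined(_MSC_VER)
#  define CC_CDECL    __cdecl
#  define CC_FASTCALL __fastcall
#elif defined(__GNUC__) && defined(__i386__)
#  define CC_CDECL    __attribute__((cdecl))
#  define CC_FASTCALL __attribute__((fastcall))
#else
#  define CC_CDECL
#  define CC_FASTCALL
#endif

/* cdecl on 32-bit x86: a, b and c are all pushed on the stack, right to
   left, and the caller removes them after the call returns. */
int CC_CDECL weighted_cdecl(int a, int b, int c)
{
    return a + 2 * b + 3 * c;
}

/* fastcall on 32-bit x86: a arrives in ECX and b in EDX; c, the third
   argument, still goes on the stack and the callee removes it. */
int CC_FASTCALL weighted_fastcall(int a, int b, int c)
{
    return a + 2 * b + 3 * c;
}

/* The convention changes how arguments travel, never the result. */
void check_same_result(int a, int b, int c)
{
    assert(weighted_cdecl(a, b, c) == weighted_fastcall(a, b, c));
}
```

Compiling a 32-bit build to assembly (`-S` with GCC, `/FA` with MSVC) and comparing the two call sites is a quick way to confirm what a particular compiler actually emits.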

兔小萌 2024-08-27 09:36:00

Calling convention (at least on x86) doesn't really make much of a difference in speed. In Windows, _stdcall was made the default for the Win32 API because it produces tangible results for nontrivial programs: it usually results in smaller code size than _cdecl, since the callee removes the arguments once instead of every call site doing so. _fastcall is not the default because the difference it makes is far less tangible. What you gain by passing arguments in registers you lose in less efficient function bodies (as previously mentioned by Anon.). You don't gain anything by passing in registers if the called function immediately needs to spill everything out into memory for its own calculations.

However, we can spout theoretical ideas all day long -- benchmark your code for the right answer. _fastcall will be faster in some cases, and slower in others.

烟凡古楼 2024-08-27 09:36:00

On modern x86, no. Between the L1 cache and inlining, there's no place for fastcall.
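The inlining point can be sketched in C as follows; the function names are hypothetical. When the compiler inlines a small helper, no call is emitted at all, so there are no arguments to pass under any convention. Whether a given compiler actually inlines the helper depends on its optimization settings, which is an assumption here.

```c
#include <assert.h>

/* Small helper the compiler is free to inline at typical optimization
   levels. Once inlined, the calling convention no longer matters. */
static inline int square(int x)
{
    return x * x;
}

/* Sums 1*1 + 2*2 + ... + n*n. With square() inlined, the loop body is
   plain arithmetic with no call instruction. */
int sum_of_squares(int n)
{
    assert(n >= 0);
    int total = 0;
    for (int i = 1; i <= n; ++i)
        total += square(i);
    return total;
}
```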
