When is sizeof(myPOD) too big for pass by value on x64?

Posted on 2025-01-07 19:39:15

I expect no difference when it comes to structures that are up to 8 bytes in size, but what about bigger POD types? Does pass by value become more expensive the moment the type's size exceeds machine word size or is there something else (like cache line size) that can affect the performance?

I'm mainly interested in x64, but feel free to include some numbers for x86 as well.

Clarifications:

  • I'm probably thinking too narrowly because I'm not aware of everything that plays a role in this (registers, calling conventions, compiler optimizations). I'm mainly interested in Microsoft's C++ compiler and it only uses __fastcall.
  • I'm interested if there is any kind of general recommendation when it comes to parameter passing knowing the architecture, type size, cache size, etc. Something like: "Prefer passing the type by value when it's smaller than N bytes." where N is something that can be derived from the things we know.

Comments (2)

如梦 2025-01-14 19:39:15

You're confusing two separate issues. You can pass any object by value (as long as it is copyable).

Whether or not it will be passed in a register or on the stack depends on the implementation and specifically, the calling convention used.

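Under some calling conventions, parameters larger than 8 bytes (the general-purpose register size) are not passed in registers: the Microsoft x64 convention, for example, passes a struct in a register only if it is exactly 1, 2, 4, or 8 bytes, and otherwise passes a pointer to a copy made by the caller. Under other calling conventions, they may simply be split across several registers: the System V AMD64 convention used on Linux can split a struct of up to 16 bytes across two registers.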

Under some, it is possible that objects are never passed in registers, regardless of their size.

Similarly, SIMD values (SSE/AVX) may be passed in registers in some calling conventions, but will always be put on the stack in others. And the same may be true for scalar floating-point values.
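
To make the size rules above concrete, here is a minimal sketch of two compile-time checks. The helper names `ms_x64_in_register` and `sysv_x64_in_registers` and the three example structs are hypothetical, and both checks are deliberate simplifications: the real ABI rules also look at alignment, member types, and whether the type has non-trivial copy or destroy operations.

```cpp
#include <cstddef>
#include <type_traits>

// Simplified model of the Microsoft x64 convention: a trivially copyable
// struct travels in a register only if it is exactly 1, 2, 4, or 8 bytes.
// Anything else is copied by the caller and passed as a hidden pointer.
template <typename T>
constexpr bool ms_x64_in_register() {
    return std::is_trivially_copyable<T>::value &&
           (sizeof(T) == 1 || sizeof(T) == 2 || sizeof(T) == 4 || sizeof(T) == 8);
}

// Simplified model of the System V AMD64 convention: a trivially copyable
// struct of up to 16 bytes can be split across two registers; larger ones
// are copied into the caller's outgoing argument area on the stack.
template <typename T>
constexpr bool sysv_x64_in_registers() {
    return std::is_trivially_copyable<T>::value && sizeof(T) <= 16;
}

// Hypothetical example types (sizes assume a typical x64 target).
struct Small  { int a; int b; };        // 8 bytes
struct Medium { double x; double y; };  // 16 bytes
struct Large  { double v[4]; };         // 32 bytes

static_assert(ms_x64_in_register<Small>(), "8 bytes fits one register");
static_assert(!ms_x64_in_register<Medium>(), "16 bytes goes by hidden pointer");
static_assert(sysv_x64_in_registers<Medium>(), "16 bytes fits two registers");
static_assert(!sysv_x64_in_registers<Large>(), "32 bytes goes through memory");
```

The checks only predict where the argument lives at the call boundary. Once the callee is inlined, the convention no longer applies and the compiler is free to do something cheaper.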

But what you're asking can't really be meaningfully answered. The speed of copying an object is affected by the object's size, yes. If the object is a POD type, and it fits in a register, then it can be copied with a simple mov instruction. Whether or not the compiler will do that is up to the compiler.

And obviously, the larger the object is, the more cache space it takes up, which means you'll get more cache misses.

But this is all so vague that it is next to useless. We don't know what your object looks like, and we don't know what your code does with it. If you have a specific type in mind, then write a benchmark to see how it is handled by the compiler.
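
A benchmark of the kind suggested above could start from something like the following sketch (C++14 or later). The 64-byte `Payload` type, the two `sum_*` functions, and the `nanoseconds_per_call` helper are all hypothetical names invented for the illustration. An optimizer may inline both calls and remove the copy entirely, so treat the numbers as a starting point and check the generated assembly before drawing conclusions.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical 64-byte POD used only for this illustration.
struct Payload {
    std::uint64_t v[8];
};

// Same work behind two signatures: one takes a copy, one takes a reference.
constexpr std::uint64_t sum_by_value(Payload p) {
    std::uint64_t s = 0;
    for (std::uint64_t x : p.v) s += x;
    return s;
}

constexpr std::uint64_t sum_by_ref(const Payload& p) {
    std::uint64_t s = 0;
    for (std::uint64_t x : p.v) s += x;
    return s;
}

// Calls f repeatedly and returns the average wall-clock time per call in
// nanoseconds. The argument changes on every iteration and the result is
// written to a volatile, so the loop cannot be folded away completely.
template <typename F>
double nanoseconds_per_call(F f, int iterations) {
    Payload p{};
    for (int i = 0; i < 8; ++i) p.v[i] = static_cast<std::uint64_t>(i) + 1;
    volatile std::uint64_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        p.v[0] = static_cast<std::uint64_t>(i);
        sink = sink + f(p);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / iterations;
}
```

Calling `nanoseconds_per_call(sum_by_value, 10000000)` and `nanoseconds_per_call(sum_by_ref, 10000000)` in an optimized build, several times each, gives two numbers to compare for this particular type. Changing the array length in `Payload` repeats the experiment for other sizes.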

In response to your edit:

"I'm interested if there is any kind of general recommendation when it comes to parameter passing knowing the architecture, type size, cache size, etc. Something like: "Prefer passing the type by value when it's smaller than N bytes.""

First, trust your compiler. It will aggressively optimize copies away in many situations, so even if you do pass a large object by value, it's unlikely to be a measurable problem.

Second, you're looking at a micro-optimization which is unlikely to make a noticeable difference either way. For small objects, passing by value avoids a pointer indirection, so it's probably slightly faster. At some point, this is outweighed by the cost of copying (assuming the object is actually copied, see above). For very large objects (for the sake of argument, let's say 500 bytes or above, a size that objects normally don't reach), you should definitely pass by reference.

But for objects of 8, 16, 24, 40 bytes? Who knows? Who cares? It's unlikely to make a measurable difference in real code.

Which leads me to the two rules of thumb:

  1. Do what seems natural: if passing by copy makes your code simpler or cleaner, do that.
  2. If performance matters, first make sure that what you're looking at has any noticeable impact on your performance at all. Measure it. If it affects performance, then it can be measured. If it can't be measured, then the difference in performance, by definition, cannot be noticeable.

So, in short:

  • for primitive types, pass by value.
  • for very large types, pass by reference.
  • for everything else, stop worrying and spend your time on something productive.
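
If a code base still wants to write the "smaller than N bytes" rule down in one place, a sketch along these lines is one way to do it. The alias `param_t`, the constant `kByValueLimit`, and the example types are hypothetical, and the two-machine-word cutoff is an arbitrary assumption, not a value derived from any ABI or measurement.

```cpp
#include <cstddef>
#include <type_traits>

// Assumed cutoff: two machine words. Pick your own after measuring.
constexpr std::size_t kByValueLimit = 2 * sizeof(void*);

// Chooses T for small trivially copyable types and const T& otherwise.
template <typename T>
using param_t = typename std::conditional<
    std::is_trivially_copyable<T>::value && sizeof(T) <= kByValueLimit,
    T,
    const T&>::type;

// Hypothetical example types.
struct Point  { float x; float y; };  // 8 bytes
struct Matrix { double m[16]; };      // 128 bytes

// Point arrives as a copy, Matrix arrives as a const reference.
constexpr float length_squared(param_t<Point> p) {
    return p.x * p.x + p.y * p.y;
}

constexpr double trace(param_t<Matrix> a) {
    return a.m[0] + a.m[5] + a.m[10] + a.m[15];
}

static_assert(std::is_same<param_t<Point>, Point>::value, "small type: by value");
static_assert(std::is_same<param_t<Matrix>, const Matrix&>::value, "large type: by reference");
```

Call sites look the same either way, for example `length_squared(Point{3.0f, 4.0f})`. One design caveat: `param_t<T>` cannot be deduced from the argument when `T` itself is a template parameter of the function, so it suits functions with concrete parameter types best.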

梦魇绽荼蘼 2025-01-14 19:39:15

You should be concerned about two things: data copying and stack usage.

Data copying takes time. The larger the structure, the longer it takes to copy. Whether that is a performance problem depends on how often you do it and on the performance requirements of your code.

The stack is large, but isn't infinite. Passing large structures by value, especially if combined with recursion, can easily cause it to overflow.
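
The arithmetic behind the overflow risk can be sketched as follows. The `Big` struct, the `min_stack_bytes` helper, and the 1 MiB budget are assumptions chosen for illustration; 1 MiB matches the default stack reserve for programs linked with the Microsoft linker, but other platforms and thread settings differ. The result is a lower bound, since real frames also hold a return address, saved registers, and padding.

```cpp
#include <cstddef>

// Hypothetical 1000-byte POD that a recursive function takes by value.
struct Big {
    char data[1000];
};

// Lower bound on the stack consumed when every recursion level keeps one
// by-value copy of T alive.
template <typename T>
constexpr std::size_t min_stack_bytes(std::size_t depth) {
    return depth * sizeof(T);
}

// Assumed stack budget: 1 MiB.
constexpr std::size_t kAssumedStackSize = 1024 * 1024;

// 1100 levels of Big copies already exceed the assumed budget...
static_assert(min_stack_bytes<Big>(1100) > kAssumedStackSize, "by value overflows");
// ...while 1100 levels that only hold a pointer to one Big stay tiny.
static_assert(min_stack_bytes<const Big*>(1100) < kAssumedStackSize, "by pointer fits");
```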

With x86_64 (using the WIN64 or Linux conventions), there's the smaller point of passing data in registers. If parameters are up to 8 bytes each, the first 4 (WIN64) or the first 6 integer or pointer parameters (Linux) are passed in registers, which is faster. With x86, most conventions don't do this (the Linux kernel, however, uses 3 registers for parameters).
Using registers is somewhat faster. But the difference between passing 8 bytes on the stack and passing them in a register is small compared to the difference between copying 8 bytes and copying 1000 bytes.
