Performance when accessing class members
I'm writing something performance-critical and want to know whether it makes a difference if I use:
int test( int a, int b, int c )
{
    // Do millions of calculations with a, b, c
}
or
class myStorage
{
public:
    int a, b, c;
};
int test( myStorage values )
{
    // Do millions of calculations with values.a, values.b, values.c
}
Does this basically result in similar code? Is there any extra overhead in accessing the class members?
I'm sure this is clear to a C++ expert, so I won't try to write an unrealistic benchmark for it right now.
The compiler will probably equalize them. If it has any brains at all, it will copy values.a, values.b, and values.c into local variables or registers, which is also what happens in the simple case.
The relevant maxims:
Premature optimization is the root of much evil.
Write it so you can read it at 1am six months from now and still understand what you were trying to do.
Most of the time significant optimization comes from restructuring your algorithm, not small changes in how variables are accessed. Yes, I know there are exceptions, but this probably isn't one of them.
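To make the comparison concrete, here is a minimal sketch of the two variants from the question with a stand-in loop body. The loop itself, the iteration count n, and the names test_args, test_struct and check_same_result are assumptions made for illustration; they are not from the question. With optimization enabled, a compiler would typically keep a, b and c in registers in both versions, but that depends on the compiler and flags.

```cpp
#include <cassert>

// Same layout as the class in the question.
class myStorage
{
public:
    int a, b, c;
};

// Hypothetical stand-in for "millions of calculations".
// Unsigned arithmetic is used so that wraparound is well defined.
unsigned test_args(int a, int b, int c, unsigned n)
{
    unsigned acc = 0;
    for (unsigned i = 0; i < n; ++i)
        acc += static_cast<unsigned>(a) * i
             + static_cast<unsigned>(b)
             + static_cast<unsigned>(c);
    return acc;
}

// Same calculation, reading the operands through the class members.
unsigned test_struct(myStorage values, unsigned n)
{
    unsigned acc = 0;
    for (unsigned i = 0; i < n; ++i)
        acc += static_cast<unsigned>(values.a) * i
             + static_cast<unsigned>(values.b)
             + static_cast<unsigned>(values.c);
    return acc;
}

// Quick sanity check: both variants compute the same result.
void check_same_result()
{
    assert(test_args(1, 2, 3, 1000) == test_struct(myStorage{1, 2, 3}, 1000));
}
```

Compiling a file like this with something like g++ -O2 -S and comparing the code generated for the two functions is a quick way to see whether your compiler treats them differently.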
This sounds like premature optimization.
That being said, there are some differences and opportunities, but they affect the cost of each call to the function rather than performance inside the function.
First of all, in the second option you may want to pass myStorage as a constant reference.
As a result, your compiled code will likely pass a single value (the object's address, which lets you access its members) rather than three separate values. If the class has additional fields (besides a, b and c), passing myStorage by value might actually cost you more, because you will be invoking the copy constructor and essentially copying all the additional fields as well. All of these are per-call costs, not costs within the function.
If you are doing tons of calculations with a, b and c within the function, then it really doesn't matter how you transfer or access them. If you pass by reference, the initial cost might be slightly higher (since the object, if passed by reference, could be on the heap rather than the stack), but once it has been accessed for the first time, caching and registers on your machine will probably mean low-cost access. If you pass the object by value, it really doesn't matter, since even initially the values will be nearby on the stack.
For the code you provided, if these are the only fields, there will likely be no difference. "values.variable" is merely interpreted as an offset in the stack, not as "look up one object, then access another address".
Of course, if you don't buy these arguments, just define local variables as the first step in your function, copy the values from the object, and then use these variables. If you really use them multiple times, the initial cost of this copy won't matter :)
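As a rough illustration of the per-call cost described above, here is a sketch with a class that carries extra fields. The type bigStorage, its extra array, and the function names are hypothetical and exist only to make the difference visible; whether the copy actually happens at run time depends on inlining and optimization.

```cpp
#include <cassert>

// Hypothetical larger class: a, b, c plus payload the function never reads.
struct bigStorage
{
    int a, b, c;
    int extra[256];
};

// By value: the whole object, including extra, is copied for each call
// (unless the compiler inlines the call and elides the copy).
int sum_by_value(bigStorage values)
{
    return values.a + values.b + values.c;
}

// By const reference: only an address is passed; members are read through it.
int sum_by_ref(const bigStorage& values)
{
    return values.a + values.b + values.c;
}

// Quick sanity check: both calling conventions give the same answer.
void check_sums()
{
    bigStorage s{};
    s.a = 1; s.b = 2; s.c = 3;
    assert(sum_by_value(s) == sum_by_ref(s));
}
```

Both functions return the same result; the only difference is how much data has to be moved to make the call.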
No, your CPU would cache the variables you use over and over again.
I think there is some overhead, but maybe not much, because the memory address of the object is stored on the stack and points to the object in heap memory, and only then do you access the instance variable.
If you store the int variables on the stack, it would be faster, because the values are already on the stack and the machine just reads them from there to do the calculation :).
It also depends on whether you store the class's instance variable values on the stack or not. If inside test() you do something like:
I think it would be almost the same performance.
If you're really writing performance-critical code and you think one version should be faster than the other, write both versions and time them (with the code compiled using the right optimization switches). You may even want to look at the generated assembly code. Many of the things that affect the speed of a code snippet are quite subtle, such as register spilling.
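A very small timing harness along these lines might look like the sketch below. The helper time_ms and the workload work are hypothetical names, and a single run like this is only a rough measurement; a real comparison would repeat the runs and make sure the compiler cannot discard the result.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs f() once and returns the elapsed
// wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical workload standing in for the real calculation.
unsigned work(unsigned n)
{
    unsigned acc = 0;
    for (unsigned i = 0; i < n; ++i)
        acc += i * 3u + 1u;
    return acc;
}

// Times one run of the workload; the volatile sink keeps the result "used".
double time_work(unsigned n)
{
    volatile unsigned sink = 0;
    const double ms = time_ms([&] { sink = work(n); });
    assert(ms >= 0.0);
    return ms;
}
```

You would call time_work (or time_ms with each of your two versions) from an optimized build and compare the numbers over several runs.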
You can also start your function with
although the compiler should be smart enough to do that for you behind the scenes. In general, I prefer to pass around structures or classes: this often makes it clearer what the function is meant to do, and you don't have to change the signature every time you want to take another parameter into account.
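One way to read this suggestion, assuming it means copying the members into locals at the top of the function, is sketched below. This is an illustration under that assumption, not the answer's own snippet, and the loop body is a made-up stand-in for the real calculation.

```cpp
#include <cassert>

class myStorage
{
public:
    int a, b, c;
};

int test(const myStorage& values)
{
    // Copy the members into locals once, then work only with the locals.
    const int a = values.a;
    const int b = values.b;
    const int c = values.c;

    // Hypothetical stand-in for the real calculation.
    int acc = 0;
    for (int i = 0; i < 10; ++i)
        acc += a * i + b - c;
    return acc;
}

// Quick sanity check of the stand-in calculation.
void check_test()
{
    assert(test(myStorage{1, 2, 3}) == 35);
}
```

An optimizing compiler will often do this hoisting on its own, so the manual copy is mostly a readability choice.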
As with your previous, similar question: it depends on the compiler and platform. If there is any difference at all, it will be very small.
Both values on the stack and values in an object are commonly accessed using a pointer (the stack pointer, or the this pointer) and some offset (the location in the function's stack frame, or the location inside the class).
Here are some cases where it might make a difference:
Depending on your platform, the stack pointer might be held in a CPU register, whereas the this pointer might not. If this is the case, accessing this (which is presumably on the stack) would require an extra memory lookup.
Memory locality might be different. If the object in memory is larger than one cache line, the fields are spread out over multiple cache lines. Bringing only the relevant values together in a stack frame might improve cache efficiency.
Do note, however, how often I used the word "might" here. The only way to be sure is to measure it.
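The "pointer plus offset" idea can be made visible with offsetof. The sketch below reads a member by hand the way generated code conceptually does; the function name read_b_via_offset is hypothetical, and real compilers are free to keep the value in a register instead of going through memory at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class myStorage
{
public:
    int a, b, c;
};

// Reads member b as "base address of the object plus a fixed offset".
int read_b_via_offset(const myStorage& values)
{
    const char* base = reinterpret_cast<const char*>(&values);
    int result;
    std::memcpy(&result, base + offsetof(myStorage, b), sizeof result);
    return result;
}

// Quick sanity check: the manual read matches normal member access.
void check_offset_read()
{
    const myStorage s{10, 20, 30};
    assert(read_b_via_offset(s) == s.b);
}
```

Since the offset is a compile-time constant, values.b costs about the same as reading a local at a fixed position in the stack frame, which fits the point that any difference should be very small.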
If you can't profile the program, print out the assembly language for the code fragments.
In general, less assembly code means fewer instructions to execute, which tends to improve performance. This is a technique for getting a rough estimate of performance when a profiler is not available.
An assembly language listing will allow you to see differences, if any, between implementations.