i++ less efficient than ++i, how to show this?
I am trying to show by example that the prefix increment is more efficient than the postfix increment.
In theory this makes sense: i++ needs to be able to return the unincremented original value and therefore store it, whereas ++i can return the incremented value without storing the previous value.
But is there a good example to show this in practice?
I tried the following code:
    int array[100];

    int main()
    {
        for(int i = 0; i < sizeof(array)/sizeof(*array); i++)
            array[i] = 1;
    }
I compiled it using gcc 4.4.0 like this:
gcc -Wa,-adhls -O0 myfile.cpp
I did this again, with the postfix increment changed to a prefix increment:
for(int i = 0; i < sizeof(array)/sizeof(*array); ++i)
The result is identical assembly code in both cases.
This was somewhat unexpected. It seemed that by turning off optimizations (with -O0) I should see a difference that shows the concept. What am I missing? Is there a better example to show this?
Comments (9)
This code and its comments should demonstrate the differences between the two.
You can see that the postfix has an extra step (step 1), which involves creating a copy of the object. This has implications for both memory consumption and runtime. That is why prefix is more efficient than postfix for non-basic types.
Depending on some_ridiculously_big_type, and also on whatever you do with the result of the increment, you'll be able to see the difference either with or without optimizations.
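The extra copy step described for the postfix operator can be sketched as follows. This is only an illustration under assumptions: the type name some_ridiculously_big_type is taken from the text, while its members (counter, payload) and the step numbering in the comments are hypothetical.

```cpp
// Hypothetical stand-in for the "some_ridiculously_big_type" named in the text.
// The members are assumptions chosen to make a copy visibly expensive.
struct some_ridiculously_big_type {
    int counter = 0;
    char payload[4096] = {};   // bulk that every copy has to duplicate

    // Prefix: increment in place and return a reference. No copy is made.
    some_ridiculously_big_type& operator++() {
        ++counter;
        return *this;
    }

    // Postfix: the unused int parameter selects this overload.
    some_ridiculously_big_type operator++(int) {
        some_ridiculously_big_type old(*this);  // step 1: copy the whole object
        ++(*this);                              // step 2: increment the original
        return old;                             // step 3: hand back the old value
    }
};
```

Writing `x++` on such an object pays for step 1 even when the returned value is thrown away, unless the compiler can see through the operator and remove the copy.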
In response to Mihail, this is a somewhat more portable version of his code:
The outer loops are there to allow me to fiddle with the timings to get something suitable on my platform.
I don't use VC++ any more, so I compiled it (on Windows) with:
I then ran it by alternating:
and
My timing results were approximately the same for both cases. Sometimes one version would be faster by up to 20%, and sometimes the other. I would guess this is due to other processes running on my system.
Try using a while loop, or doing something with the returned value, e.g.:
Compiled with VS 2005 using /O2 or /Ox, tried on my desktop and on my laptop.
On the laptop I consistently get something around the following; on the desktop the numbers are a bit different (but the ratio is about the same):
xx means that the numbers vary, e.g. 813 vs 640, which is still around a 20% speed-up.
And one more point: if you replace "d +=" with "d =" you will see a nice optimization trick:
However, it's quite specific. After all, I don't see any reason to change my mind and think there is no difference :)
Perhaps you could just show the theoretical difference by writing out both versions with x86 assembly instructions? As many people have pointed out before, the compiler will always make its own decisions on how best to compile/assemble the program.
If the example is meant for students not familiar with the x86 instruction set, you might consider using the MIPS32 instruction set; for some odd reason many people seem to find it easier to comprehend than x86 assembly.
Ok, all this prefix/postfix "optimization" is just... a big misunderstanding.
The main idea is that i++ returns a copy of the original value and thus requires copying it.
This may be correct for some inefficient implementations of iterators. However, in 99% of cases, even with STL iterators, there is no difference, because the compiler knows how to optimize it and the actual iterators are just pointers that look like classes. And of course there is no difference for primitive types like integers or pointers.
So... forget about it.
EDIT: Clarification
As I mentioned, most STL iterator classes are just pointers wrapped in classes, with all member functions inlined, which allows such an irrelevant copy to be optimized out.
And yes, if you have your own iterators without inlined member functions, then it may work slower. But you should just understand what the compiler does and what it does not.
As a small proof, take this code:
Compile it to assembly and compare sum1 and sum2, sum3 and sum4...
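A comparison along these lines could be set up as in the sketch below. The names sum1 to sum4 come from the text; the function bodies are an assumption (a raw-pointer loop and a std::vector iterator loop, each in a postfix and a prefix variant), not the listing originally posted.

```cpp
#include <cstddef>
#include <vector>

// Assumed pairing: sum1/sum2 walk a raw pointer, sum3/sum4 walk a vector iterator.

// Raw pointer, postfix increment.
int sum1(const int* p, std::size_t n) {
    int s = 0;
    for (const int* end = p + n; p != end; p++) s += *p;
    return s;
}

// Raw pointer, prefix increment.
int sum2(const int* p, std::size_t n) {
    int s = 0;
    for (const int* end = p + n; p != end; ++p) s += *p;
    return s;
}

// Vector iterator, postfix increment.
int sum3(const std::vector<int>& v) {
    int s = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); it++) s += *it;
    return s;
}

// Vector iterator, prefix increment.
int sum4(const std::vector<int>& v) {
    int s = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it) s += *it;
    return s;
}
```

Compiling this file with something like `g++ -O2 -S` and reading the generated assembly for each pair is one way to check the claim on a given compiler.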
I can just tell you... gcc gives exactly the same code with -O2.
In the general case, the post-increment will result in a copy where a pre-increment will not. Of course, this will be optimized away in a large number of cases, and in the cases where it isn't, the copy operation will be negligible (i.e., for built-in types).
Here's a small example that shows the potential inefficiency of post-increment.
The results from an optimized build (which actually removes a second copy operation in the post-increment case due to RVO):
In general, if you don't need the semantics of the post-increment, why take the chance that an unnecessary copy will occur?
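One way to make the potential copy visible is to count copy-constructor calls. The sketch below is an assumption for illustration only: the type Counted and the helpers copies_for_prefix and copies_for_postfix are hypothetical names, not the code from the original post.

```cpp
// Hypothetical instrumented type: counts how often its copy constructor runs.
struct Counted {
    static int copies;
    int value;

    Counted() : value(0) {}
    Counted(const Counted& other) : value(other.value) { ++copies; }

    // Prefix: modifies in place, returns a reference, never copies.
    Counted& operator++() { ++value; return *this; }

    // Postfix: must preserve the old value, so it copies at least once.
    Counted operator++(int) {
        Counted old(*this);
        ++value;
        return old;  // a second copy here may be elided by the compiler
    }
};

int Counted::copies = 0;

// Number of copy-constructor calls caused by n prefix increments.
int copies_for_prefix(int n) {
    Counted c;
    Counted::copies = 0;
    for (int i = 0; i < n; ++i) ++c;
    return Counted::copies;
}

// Number of copy-constructor calls caused by n postfix increments.
int copies_for_postfix(int n) {
    Counted c;
    Counted::copies = 0;
    for (int i = 0; i < n; ++i) c++;
    return Counted::copies;
}
```

Because the copy constructor has a visible side effect, the copy inside the postfix operator cannot simply be dropped, so the postfix count is at least n. Whether it is exactly n or higher depends on whether the compiler elides the copy on return.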
Of course, it's good to keep in mind that a custom operator++(), either the pre or post variant, is free to return whatever it wants (or even do whatever it wants), and I'd imagine that there are quite a few that don't follow the usual rules. Occasionally I've come across implementations that return "void", which makes the usual semantic difference go away.
You won't see any difference with integers. You need to use iterators or something where post and prefix really do something different. And you need to turn all optimisations on, not off!
I like to follow the rule of "say what you mean".
++i simply increments. i++ increments and has a special, non-intuitive result of evaluation. I only use i++ if I explicitly want that behavior, and use ++i in all other cases. If you follow this practice, when you do see i++ in code, it's obvious that post-increment behavior really was intended.
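The "say what you mean" rule can be illustrated with a small sketch. The function append_all is a hypothetical example, not taken from the thread: the loop counter uses the prefix form because only the increment matters, while the output index uses the postfix form because its old value is actually needed.

```cpp
#include <cstddef>

// Hypothetical helper: copies n ints from src to dst and returns how many were written.
std::size_t append_all(const int* src, std::size_t n, int* dst) {
    std::size_t written = 0;
    for (std::size_t i = 0; i < n; ++i) {   // plain increment: prefix
        dst[written++] = src[i];            // old value of `written` is the index: postfix intended
    }
    return written;
}
```

A reader who knows the convention can tell at a glance that `written++` relies on the pre-increment value, while `++i` does not.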
Several points:
If you want to show the difference, the simplest option is to implement both operators and point out that one requires an extra copy while the other does not.