i++ is less efficient than ++i: how can I show this?

Posted on 2024-07-27 05:36:32

I am trying to show by example that the prefix increment is more efficient than the postfix increment.

In theory this makes sense: i++ needs to be able to return the unincremented original value and therefore store it, whereas ++i can return the incremented value without storing the previous value.

But is there a good example to show this in practice?

I tried the following code:

int array[100];

int main()
{
  for(int i = 0; i < sizeof(array)/sizeof(*array); i++)
    array[i] = 1;
}

I compiled it using gcc 4.4.0 like this:

gcc -Wa,-adhls -O0 myfile.cpp

I did this again, with the postfix increment changed to a prefix increment:

for(int i = 0; i < sizeof(array)/sizeof(*array); ++i)

The result is identical assembly code in both cases.

This was somewhat unexpected. It seemed that by turning off optimizations (with -O0) I should see a difference that demonstrates the concept. What am I missing? Is there a better example to show this?

Comments (9)

眼眸里的快感 2024-08-03 05:36:33

This code and its comments should demonstrate the differences between the two.

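// stand-in for any type that is expensive to copy
struct some_ridiculously_big_type { char data[1 << 16]; };

class a {
public:
    int index;
    some_ridiculously_big_type big;

    //etc...

};

// prefix ++a: increments in place and returns a reference, so no copy is made
a& operator++ (a& _a) {
    ++_a.index;
    return _a;
}

// postfix a++: has to return the old state, so it copies the whole object first
a operator++ (a& _a, int) {
    a old = _a;
    ++_a.index;
    return old;
}

// now the program
int main (void) {
    a my_a = a();   // value-initialized, so index starts at 0

    // prefix:
    // 1. updates my_a.index
    // 2. copies my_a.index to b
    int b = (++my_a).index; 

    // postfix
    // 1. creates a copy of my_a, including the *big* member.
    // 2. updates my_a.index
    // 3. copies index out of the **copy** of my_a that was created in step 1
    int c = (my_a++).index; 
}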

You can see that the postfix has an extra step (step 1) which involves creating a copy of the object. This has implications for both memory consumption and runtime. That is why prefix is more efficient than postfix for non-basic types.

Depending on some_ridiculously_big_type and also on whatever you do with the result of the increment, you'll be able to see the difference either with or without optimizations.

遮云壑 2024-08-03 05:36:33

In response to Mihail, this is a somewhat more portable version of his code:

#include <cstdio>
#include <ctime>
using namespace std;

#define SOME_BIG_CONSTANT 100000000
#define OUTER 40
int main( int argc, char * argv[] ) {

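    unsigned int d = 0;   // unsigned, so the sum wraps around instead of overflowing a signed int (undefined behavior)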
    time_t now = time(0);
    if ( argc == 1 ) {
        for ( int n = 0; n < OUTER; n++ ) {
            int i = 0;
            while(i < SOME_BIG_CONSTANT) {
                d += i++;
            }
        }
    }
    else {
        for ( int n = 0; n < OUTER; n++ ) {
            int i = 0;
            while(i < SOME_BIG_CONSTANT) {
                d += ++i;
            }
        }
    }
    int t = time(0) - now;  
    printf( "%d\n", t );
    return d % 2;
}

The outer loops are there to allow me to fiddle the timings to get something suitable on my platform.

I don't use VC++ any more, so I compiled it (on Windows) with:

g++ -O3 t.cpp

I then ran it by alternating:

a.exe   

and

a.exe 1

My timing results were approximately the same for both cases. Sometimes one version would be faster by up to 20% and sometimes the other. This I would guess is due to other processes running on my system.
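One caveat with the program above: time() only has one-second resolution, so small differences are easy to miss. A minimal sketch of a finer-grained measurement, assuming a C++11 compiler and std::chrono::steady_clock; the helper names elapsed_ms, sum_postfix and sum_prefix are hypothetical and not part of the original code:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: runs `body` once and returns the elapsed wall time in milliseconds.
template <typename F>
long long elapsed_ms(F body) {
    auto start = std::chrono::steady_clock::now();
    body();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Sums 0 .. limit-1 using the postfix increment.
// A 64-bit unsigned accumulator avoids signed overflow.
std::uint64_t sum_postfix(std::uint32_t limit) {
    std::uint64_t d = 0;
    std::uint32_t i = 0;
    while (i < limit) d += i++;
    return d;
}

// Sums 1 .. limit using the prefix increment.
std::uint64_t sum_prefix(std::uint32_t limit) {
    std::uint64_t d = 0;
    std::uint32_t i = 0;
    while (i < limit) d += ++i;
    return d;
}
```

You would call something like elapsed_ms([&]{ result = sum_postfix(100000000); }) for each variant, several times, and make sure the result is actually used afterwards (for example by printing it) so the optimizer cannot drop the loop. For plain integers I would still expect the two timings to be within noise of each other.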

三生池水覆流年 2024-08-03 05:36:33

Try using a while loop, or do something with the returned value, e.g.:

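#include <windows.h>  // GetTickCount, DWORD
#include <tchar.h>    // _tmain, _TCHAR
#include <stdio.h>    // printf

#define SOME_BIG_CONSTANT 1000000000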

int _tmain(int argc, _TCHAR* argv[])
{
    int i = 1;
    int d = 0;

    DWORD d1 = GetTickCount();
    while(i < SOME_BIG_CONSTANT + 1)
    {
        d += i++;
    }
    DWORD t1 = GetTickCount() - d1;

    printf("%d", d);
    printf("\ni++ > %d <\n", t1);

    i = 0;
    d = 0;

    d1 = GetTickCount();
    while(i < SOME_BIG_CONSTANT)
    {
        d += ++i;

    }
    t1 = GetTickCount() - d1;

    printf("%d", d);
    printf("\n++i > %d <\n", t1);

    return 0;
}

Compiled with VS 2005 using /O2 or /Ox, tried on my desktop and on laptop.

On the laptop I consistently get something like the following; on the desktop the numbers are a bit different (but the ratio is about the same):

i++ > 8xx < 
++i > 6xx <

xx means that the digits vary from run to run, e.g. 813 vs 640; that is still around a 20% speedup.

And one more point: if you replace "d +=" with "d = " you will see a nice optimization trick:

i++ > 935 <
++i > 0 <

However, it's quite specific. But after all, I don't see any reasons to change my mind and think there is no difference :)

眼泪淡了忧伤 2024-08-03 05:36:33

Perhaps you could just show the theoretical difference by writing out both versions with x86 assembly instructions? As many people have pointed out before, the compiler will always make its own decisions on how best to compile/assemble the program.

If the example is meant for students not familiar with the x86 instruction set, you might consider using the MIPS32 instruction set -- for some odd reason many people seem to find it to be easier to comprehend than x86 assembly.

鹿! 2024-08-03 05:36:33

Ok, all this prefix/postfix "optimization" is just... some big misunderstanding.

The main idea is that i++ returns a copy of the original value and thus requires copying it.

This may be correct for some inefficient implementations of iterators. However, in 99% of cases, even with STL iterators, there is no difference, because the compiler knows how to optimize it and the actual iterators are just pointers that look like a class. And of course there is no difference for primitive types like integers or pointers.

So... forget about it.

EDIT: Clarification

As I mentioned, most STL iterator classes are just pointers wrapped in classes, with all member functions inlined, which allows the compiler to optimize away such an irrelevant copy.

And yes, if you have your own iterators without inlined member functions, then it may work slower. But you should simply understand what the compiler does and what it does not do.

As a small proof, take this code:

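#include <set>
#include <vector>
using namespace std;

// vector, postfix increment
int sum1(vector<int> const &v)
{
    int n = 0;   // must be initialized before accumulating
    for(auto x=v.begin();x!=v.end();x++)
            n+=*x;
    return n;
}

// vector, prefix increment
int sum2(vector<int> const &v)
{
    int n = 0;
    for(auto x=v.begin();x!=v.end();++x)
            n+=*x;
    return n;
}

// set, postfix increment
int sum3(set<int> const &v)
{
    int n = 0;
    for(auto x=v.begin();x!=v.end();x++)
            n+=*x;
    return n;
}

// set, prefix increment
int sum4(set<int> const &v)
{
    int n = 0;
    for(auto x=v.begin();x!=v.end();++x)
            n+=*x;
    return n;
}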

Compile it to assembly and compare sum1 and sum2, sum3 and sum4...

I can just tell you... gcc gives exactly the same code with -O2.

衣神在巴黎 2024-08-03 05:36:32

In the general case, the post-increment will result in a copy where a pre-increment will not. Of course this will be optimized away in a large number of cases, and in the cases where it isn't, the copy operation will be negligible (i.e., for built-in types).

Here's a small example that shows the potential inefficiency of post-increment.

#include <stdio.h>

class foo 
{

public:
    int x;

    foo() : x(0) { 
        printf( "construct foo()\n"); 
    };

    foo( foo const& other) { 
        printf( "copy foo()\n"); 
        x = other.x; 
    };

    foo& operator=( foo const& rhs) { 
        printf( "assign foo()\n"); 
        x = rhs.x;
        return *this; 
    };

    foo& operator++() { 
        printf( "preincrement foo\n"); 
        ++x; 
        return *this; 
    };

    foo operator++( int) { 
        printf( "postincrement foo\n"); 
        foo temp( *this);
        ++x;
        return temp; 
    };

};


int main()
{
    foo bar;

    printf( "\n" "preinc example: \n");
    ++bar;

    printf( "\n" "postinc example: \n");
    bar++;
}

The results from an optimized build (which actually removes a second copy operation in the post-increment case due to RVO):

construct foo()

preinc example: 
preincrement foo

postinc example: 
postincrement foo
copy foo()

In general, if you don't need the semantics of the post-increment, why take the chance that an unnecessary copy will occur?

Of course, it's good to keep in mind that a custom operator++() - either the pre or post variant - is free to return whatever it wants (or even do whatever it wants), and I'd imagine that there are quite a few that don't follow the usual rules. Occasionally I've come across implementations that return "void", which makes the usual semantic difference go away.
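To illustrate that last point, here is a minimal sketch of a type whose postfix operator returns void. The names tick_counter and bump_twice are made up for this example and do not come from any real library:

```cpp
// Hypothetical counter whose postfix increment deliberately returns nothing.
struct tick_counter {
    int ticks;

    tick_counter() : ticks(0) {}

    // Prefix: the usual form, increments and returns a reference to itself.
    tick_counter& operator++() { ++ticks; return *this; }

    // Postfix: no old value is returned, so no temporary copy is needed.
    void operator++(int) { ++ticks; }
};

// Uses both forms as plain statements; with this type they behave identically.
int bump_twice(tick_counter& c) {
    ++c;
    c++;
    return c.ticks;
}
```

With a type like this, an expression such as (c++).ticks does not compile at all, so the only way to use the postfix form is as a statement, where it does the same work as the prefix form.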

与往事干杯 2024-08-03 05:36:32

You won't see any difference with integers. You need to use iterators or something where post and prefix really do something different. And you need to turn all optimisations on, not off!

清眉祭 2024-08-03 05:36:32

I like to follow the rule of "say what you mean".

++i simply increments. i++ increments and has a special, non-intuitive result of evaluation. I only use i++ if I explicitly want that behavior, and use ++i in all other cases. If you follow this practice, when you do see i++ in code, it's obvious that post-increment behavior really was intended.
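A short sketch of what that looks like in practice. The function names append_char and count_spaces are hypothetical, chosen only to contrast the two cases:

```cpp
#include <cstddef>
#include <string>

// Postfix on purpose: store at the current position, then advance.
// The OLD value of len is the index that gets written.
void append_char(char* buf, std::size_t& len, char c) {
    buf[len++] = c;
}

// Prefix everywhere else: the result of the increment is never used,
// so "just increment" is what the code says.
std::size_t count_spaces(const std::string& s) {
    std::size_t n = 0;
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
        if (*it == ' ') ++n;
    }
    return n;
}
```

Reading append_char, the len++ stands out and tells you the old value matters; in count_spaces nothing depends on the value before the increment.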

甜心小果奶 2024-08-03 05:36:32

Several points:

  • First, you're unlikely to see a major performance difference either way.
  • Second, your benchmarking is useless if you have optimizations disabled. What we want to know is if this change gives us more or less efficient code, which means that we have to use it with the most efficient code the compiler is able to produce. We don't care whether it is faster in unoptimized builds, we need to know if it is faster in optimized ones.
  • For built-in datatypes like integers, the compiler is generally able to optimize the difference away. The problem mainly occurs for more complex types with overloaded increment operators, where the compiler can't trivially see that the two operations would be equivalent in the context.
  • You should use the code that most clearly expresses your intent. Do you want to "add one to the value", or "add one to the value, but keep working on the original value a bit longer"? Usually, the former is the case, and then a pre-increment better expresses your intent.

If you want to show the difference, the simplest option is simply to implement both operators, and point out that one requires an extra copy while the other does not.
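If you would rather have a number than an argument, one option is to count the copies. This is only a sketch: the type counted and the two helper functions are hypothetical names introduced for this illustration.

```cpp
// Hypothetical type that records how many times it has been copy-constructed.
struct counted {
    int value;
    static int copies;   // total copy constructions so far

    counted() : value(0) {}
    counted(const counted& other) : value(other.value) { ++copies; }

    // Prefix: increments in place, returns a reference, makes no copy.
    counted& operator++() { ++value; return *this; }

    // Postfix: has to keep the old state around in order to return it.
    counted operator++(int) {
        counted old(*this);
        ++value;
        return old;
    }
};
int counted::copies = 0;

// Number of copies made by n prefix increments.
int copies_for_prefix(int n) {
    counted::copies = 0;
    counted c;
    for (int i = 0; i < n; ++i) ++c;
    return counted::copies;
}

// Number of copies made by n postfix increments.
int copies_for_postfix(int n) {
    counted::copies = 0;
    counted c;
    for (int i = 0; i < n; ++i) c++;
    return counted::copies;
}
```

The prefix loop should report zero copies, while the postfix loop reports at least one per increment. Whether it is exactly one depends on whether the compiler elides the copy on return. Because the copy constructor has a visible side effect here, the optimizer is not allowed to remove the copy inside operator++(int), which is also why this does not tell you much about types whose copies are trivial.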
