Unrolling small loops with Visual Studio 2005

Posted 2024-08-03 05:16:05

How do you tell the compiler to unroll loops based on the number of iterations or some other attribute? Or, how do you turn on loop unrolling optimization in Visual Studio 2005?

EDIT: E.g.

//Code Snippet 1
    vector<int> b;
    for(int i=0;i<3;++i) b.push_back(i);

As opposed to

//Code Snippet 2
    vector<int> b;
    b.push_back(0);
    b.push_back(1);
    b.push_back(2);

push_back() is just an example; I could replace it with any operation that takes a long time.

But I read somewhere that I can write Code Snippet 1 and the compiler can unroll it into Code Snippet 2 if the loop satisfies some criteria. So my question is: how do you do that? There's already a discussion on SO as to which one is more efficient, but any comments on that are appreciated anyway.

無處可尋 2024-08-10 05:16:05

Usually you just let the compiler do its job. If the number of iterations is known at compile time and compiler optimizations are turned on, the compiler will balance code size against branch reduction and unroll the loops it judges worth unrolling.

If that's really not what you want, you can also do it yourself with Duff's Device (from Wikipedia):

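// Duff's device, rewritten from the K&R-style original with ANSI declarations
// so that it also compiles as C++.
// `to` is intentionally not incremented: in the original it points at a
// memory-mapped output register. The routine assumes count > 0.
void send(short *to, const short *from, int count)
{
    int n = (count + 7) / 8;        // number of passes through the do-while
    switch (count % 8) {            // jump into the loop body to handle the remainder
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}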

This gives you unrolling with iteration counts that are only known at run time.
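To see the run-time remainder handling at work, here is a minimal sketch of a memory-to-memory variant. It is not part of the original answer: `duff_copy` and `duff_copy_demo` are hypothetical names, the destination pointer is incremented (unlike the register-writing original), and a guard for `count == 0` is added because the device itself assumes a positive count.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: copies `count` shorts from `from` to `to`,
// unrolled eight times with Duff's device.
void duff_copy(short* to, const short* from, std::size_t count)
{
    if (count == 0) return;              // the device assumes count > 0
    std::size_t n = (count + 7) / 8;     // number of passes through the do-while
    switch (count % 8) {                 // jump into the body: remainder is copied first
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}

// Copies 11 elements: a partial pass of 3 followed by one full pass of 8.
void duff_copy_demo()
{
    short src[11];
    short dst[11];
    for (int i = 0; i < 11; ++i) {
        src[i] = static_cast<short>(i * 3);
        dst[i] = -1;
    }
    duff_copy(dst, src, 11);
    for (int i = 0; i < 11; ++i)
        assert(dst[i] == src[i]);
}
```

In practice `std::copy` or `memcpy` is usually the better choice for plain copies; the sketch only illustrates how the `switch` absorbs a count that is not a multiple of eight.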

If it's still compile-time unrolling you want, and the built-in optimizations don't give you the fine-grained control you need, you could write a C++ template to do it. This is a pretty trivial template application, and since the expansion happens entirely at compile time, you don't lose any function inlining or other optimizations that the compiler might apply on top.
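One possible shape for such a template is sketched below. The answer does not show its own version, so `Unroll`, `PushBack`, and `fill_three` are hypothetical names chosen for illustration. The code sticks to C++03 because Visual Studio 2005 predates lambdas, and whether the generated calls are actually inlined is still up to the optimizer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: Unroll<N>::run(f) expands at compile time
// into the call sequence f(0); f(1); ... f(N-1);
template <int N>
struct Unroll {
    template <typename F>
    static void run(F& f)
    {
        Unroll<N - 1>::run(f);   // calls for indices 0 .. N-2 come first
        f(N - 1);                // then the call for the last index
    }
};

// Base case: zero iterations, which ends the recursion.
template <>
struct Unroll<0> {
    template <typename F>
    static void run(F&) {}
};

// Function object standing in for the loop body.
struct PushBack {
    std::vector<int>& v;
    explicit PushBack(std::vector<int>& vec) : v(vec) {}
    void operator()(int i) { v.push_back(i); }
};

// Same effect as Code Snippet 2 from the question.
std::vector<int> fill_three()
{
    std::vector<int> b;
    PushBack body(b);
    Unroll<3>::run(body);   // b.push_back(0); b.push_back(1); b.push_back(2);
    return b;
}
```

The trip count must be a compile-time constant for this to work, which is the trade-off against the run-time flexibility of Duff's Device.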

爱给你人给你 2024-08-10 05:16:05

It's usually quite simple: "you enable optimizations".

If you tell the compiler to optimize your code, loop unrolling is one of the many optimizations it will try to apply.

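Keep in mind, though, that unrolling does not always produce faster code. It increases code size, which can lead to more cache misses, particularly in the instruction cache. And with the advanced branch prediction in modern CPUs, the cost of the branches that make up a loop is often negligible.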

Sometimes the compiler will determine that unrolling would produce slower code, and then it won't do it.
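As a rough sketch of what "enabling optimizations" looks like with the MSVC command-line compiler, the file below can be compiled with `/O2` and an assembly listing to check whether a loop was unrolled. The file name `unroll.cpp` and the function names are assumptions made for this example; the compiler remains free to leave the loop as it is.

```cpp
// unroll.cpp (file name is an assumption for this example)
//
// Compile only, optimizing for speed, and write an assembly listing:
//     cl /c /O2 /FAs unroll.cpp
// /O2 enables the speed optimizations, /FAs writes unroll.asm with the
// source lines interleaved. In the IDE the matching setting lives under
// Project Properties > Configuration Properties > C/C++ > Optimization.
#include <cassert>

// A tight loop with a trip count known at compile time: a typical
// candidate for unrolling, though the compiler makes the final call.
int sum_first_four(const int* data)
{
    int total = 0;
    for (int i = 0; i < 4; ++i)
        total += data[i];
    return total;
}

// Small self-check so the function can be exercised from a test or main().
void check_sum_first_four()
{
    const int data[4] = {1, 2, 3, 4};
    assert(sum_first_four(data) == 10);
}
```

Reading the generated listing is the most direct way to confirm what the optimizer actually did for a particular loop.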

江心雾 2024-08-10 05:16:05

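Loop unrolling won't magically make the code inside the loop run faster. All it does is save the few CPU cycles spent comparing and updating the loop variable. It therefore only makes sense for very tight loops whose body does next to nothing.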

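Regarding your example: while push_back() takes amortized constant time, that includes the occasional allocate-copy-deallocate cycle as well as copying the actual objects. I very much doubt that the comparison in the loop plays any significant role next to that. And if you replace it with something else that takes a long time, the same applies.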
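The allocate-copy-deallocate cycles can be made visible with a small sketch that counts how often a vector's capacity changes while elements are pushed. `count_reallocations` is a hypothetical helper written for this illustration; the exact count without `reserve()` depends on the standard library's growth policy, so no particular number is claimed for that case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pushes n ints and counts how often capacity() changes,
// i.e. how many times the vector had to move to a new buffer.
// If reserve_first is true, the storage is allocated once up front.
std::size_t count_reallocations(std::size_t n, bool reserve_first)
{
    std::vector<int> b;
    if (reserve_first)
        b.reserve(n);

    std::size_t reallocations = 0;
    std::size_t last_capacity = b.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        b.push_back(static_cast<int>(i));
        if (b.capacity() != last_capacity) {   // the buffer was replaced
            ++reallocations;
            last_capacity = b.capacity();
        }
    }
    return reallocations;
}

// With reserve() there are no reallocations during the pushes;
// without it there is at least one for a non-trivial element count.
void check_reallocations()
{
    assert(count_reallocations(1000, true) == 0);
    assert(count_reallocations(1000, false) >= 1);
}
```

Each reallocation copies every element already stored, which is far more work than the loop's compare-and-branch, so calling `reserve()` is likely to matter more here than unrolling.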

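Of course, this might be wrong on one specific CPU and right on another. Given the idiosyncrasies of modern CPU architectures, with their caches, instruction pipelines, and branch prediction schemes, it has become hard to beat the compiler at optimizing code. That you are trying to optimize a loop with a "heavy" body by unrolling it suggests that you don't yet know enough about this to achieve much. (I'm trying hard to say this in a way that won't offend you. I'm the first to admit that I'm a loser at this game myself.)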

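If you're having performance problems, then in my experience, in 9 out of 10 cases what you should look at is eliminating silly mistakes (like copying complex objects) and optimizing algorithms and data structures.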
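As an illustration of the "copying complex objects" kind of mistake, the sketch below counts copies for a by-value and a by-reference parameter. `Heavy`, `size_by_value`, `size_by_reference`, and `copies_made` are hypothetical names invented for this example and do not come from the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type that counts how often it is copied.
struct Heavy {
    static int copies;
    std::vector<int> payload;
    Heavy() : payload(1000, 0) {}
    Heavy(const Heavy& other) : payload(other.payload) { ++copies; }
};
int Heavy::copies = 0;

// Takes its argument by value: every call copies the whole payload.
std::size_t size_by_value(Heavy h) { return h.payload.size(); }

// Takes its argument by const reference: no copy is made.
std::size_t size_by_reference(const Heavy& h) { return h.payload.size(); }

// Returns how many copies `calls` invocations of the chosen variant made.
int copies_made(bool by_value, int calls)
{
    Heavy h;
    Heavy::copies = 0;
    std::size_t total = 0;
    for (int i = 0; i < calls; ++i)
        total += by_value ? size_by_value(h) : size_by_reference(h);
    assert(total == static_cast<std::size_t>(calls) * 1000);
    return Heavy::copies;
}
```

Calling `copies_made(true, 10)` performs ten full copies of the payload while `copies_made(false, 10)` performs none, a difference that dwarfs anything gained by removing a loop counter.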

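(If you still believe your problem falls into the 1-out-of-10 category, try Intel's compiler. The last time I looked, you could download a trial version for free. It plugs into VS, is very easy to set up, and gave a speed gain of about 0.5% in the application I tested it with.)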
