Unrolling small loops with Visual Studio 2005

Posted 2024-08-03 05:16:05

How do you tell the compiler to unroll loops based on the number of iterations or some other attribute? Or, how do you turn on loop unrolling optimization in Visual Studio 2005?

EDIT: E.g.

//Code Snippet 1
    vector<int> b;
    for(int i=0;i<3;++i) b.push_back(i);

As opposed to

//Code Snippet 2
    vector<int> b;
    b.push_back(0);
    b.push_back(1);
    b.push_back(2);

push_back() is just an example; I could replace it with anything that might take a long time.

But I read somewhere that I can write Code 1 and the compiler can unroll it into Code 2 if the loop satisfies certain criteria. So my question is: how do you do that? There's already a discussion on SO as to which one is more efficient, but any comments on that are appreciated anyway.

Comments (5)

無處可尋 2024-08-10 05:16:05

Usually you just let the compiler do its job. If the number of iterations is known at compile time and compiler optimizations are turned on, the compiler will balance code size against branch reduction and unroll the loops it judges worth unrolling.

If that's really not what you want, there's also the possibility of doing it yourself with Duff's Device (from Wikipedia):

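/* Duff's Device: copies 'count' shorts from 'from' to the location 'to'.
   'to' is a memory-mapped output register, so it is deliberately not
   incremented. Assumes count > 0. */
void send(volatile short *to, const short *from, int count)
{
    int n = (count + 7) / 8;          /* number of passes through the do-while */
    switch (count % 8) {              /* jump into the loop body to handle the remainder */
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}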

This gives you unrolling with runtime-determined iteration counts.

If it's still compile-time unrolling you want, and the built-in optimizations aren't giving you what you want (say, you want finer-grained control), you could create a C++ template to do it. This is a pretty trivial template application, and since it is all done at compile time, you don't lose any function inlining or other optimizations that the compiler might apply in addition.
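
A minimal sketch of the template approach described above, assuming a hypothetical helper named Unroll and a hypothetical functor named PushBack (neither is part of any library). It is written in C++03 style, so it should also be acceptable to a compiler of the Visual Studio 2005 generation. Whether the calls really end up as straight-line code still depends on the optimizer inlining them.

```cpp
#include <vector>

// Hypothetical helper: Unroll<N>::run(f) expands at compile time into
// the call sequence f(0); f(1); ... f(N-1);
template <int N>
struct Unroll {
    template <typename F>
    static void run(F& f) {
        Unroll<N - 1>::run(f);  // first emit the calls for 0 .. N-2
        f(N - 1);               // then the call for the last index
    }
};

// Base case: zero iterations, nothing to emit.
template <>
struct Unroll<0> {
    template <typename F>
    static void run(F&) {}
};

// Hypothetical functor standing in for the loop body of the question.
struct PushBack {
    std::vector<int>& v;
    explicit PushBack(std::vector<int>& vec) : v(vec) {}
    void operator()(int i) { v.push_back(i); }
};

// Equivalent to: for (int i = 0; i < 3; ++i) b.push_back(i);
std::vector<int> make_vector() {
    std::vector<int> b;
    PushBack push(b);
    Unroll<3>::run(push);
    return b;
}
```

Calling make_vector() performs the same three push_back calls as Code Snippet 2 in the question, without a loop counter in the source. The recursion depth is a compile-time constant, so this only suits small, fixed iteration counts.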

爱给你人给你 2024-08-10 05:16:05

It's generally fairly simple: "You enable optimizations".

If you tell the compiler to optimize your code, then loop unrolling is one of the many optimizations it tries to apply.

Keep in mind, though, that unrolling does not always produce faster code. It makes the code larger, which can cause additional cache misses (the author mentions both the data and the instruction cache; the instruction cache is the more direct concern). And with the advanced branch prediction found in modern CPUs, the cost of the branches that make up a loop is often negligible.

Sometimes, the compiler may determine that unrolling would produce slower code, and then it won't do it.

江心雾 2024-08-10 05:16:05

Loop unrolling will not magically make the code executed in the loop run faster. All it does is save a few CPU cycles spent on updating and comparing the loop variable. So it only makes sense in very tight loops where the loop body itself does next to nothing.
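
To make that saving concrete, here is a rough sketch of what unrolling by a factor of four looks like when written by hand. The function names sum_simple and sum_unrolled are made up for illustration, and an optimizing compiler may well perform this transformation on its own.

```cpp
#include <cstddef>

// Straightforward loop: one counter update and one comparison per element.
int sum_simple(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// Manually unrolled by four: one counter update and one comparison
// per four elements, followed by a cleanup loop for the leftovers.
int sum_unrolled(const int* data, std::size_t n) {
    int total = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }
    for (; i < n; ++i) {  // handles the remaining n % 4 elements
        total += data[i];
    }
    return total;
}
```

Both functions return the same result. The body here is a single addition, which is the kind of "next to nothing" workload where the loop overhead is a noticeable fraction of the total. With a body like push_back() the overhead saved would be lost in the noise.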

Regarding your example: while push_back() takes amortized constant time, this does include the occasional allocate-copy-deallocate cycle plus the copying of the actual objects. I very much doubt that the comparisons in the loop play a significant role compared to that. And if you replace it with anything else that takes a long time, the same applies.

Of course, this could be wrong on any specific CPU and right on any other. With the idiosyncrasies of modern CPU architectures, with their caches, instruction pipelines and branch prediction schemes, it has become very hard to outsmart the compiler at optimizing code. That you would attempt to optimize a loop with a "heavy" body by unrolling it seems to be a hint that you don't know enough to achieve much here. (I'm trying hard to say this in a way that won't offend you. I'm the first to admit that I'm a loser in this game myself.)

If you're having problems with performance, then in my experience, in 9 out of 10 cases eliminating silly errors (like copying complex objects) and optimizing algorithms and data structures is what you should look at.

(If you still believe your problem falls into the 1-out-of-10 category, then try Intel's compiler. The last time I looked at it you could download a test version for free; it plugged into VS, was very easy to set up, and brought a speed gain of about 0.5% in the application I tested it on.)

夜吻♂芭芘 2024-08-10 05:16:05

Note that you say:

push_back() is an example, I could replace this with anything which can take a long time.

In fact, if push_back() (or whatever you replace it with) takes a long time, that is exactly the situation where loop unrolling would be a waste of effort. Looping itself generally isn't particularly slow; loop unrolling makes sense when the work done inside the loop is very small, because then the looping constructs might start to dominate the processing time of that stretch of execution.

As I'm sure many other answers will tell you: don't worry about this type of thing unless you actually find that it's a bottleneck. 99% of the time, it won't be.

随波逐流 2024-08-10 05:16:05

Right-click on the project, select Properties and navigate:
Screenshot: http://img200.imageshack.us/img200/8685/propsm.jpg

With regard to loop unrolling, note that it is widely held that with MS Visual Studio, optimizing for size rather than for speed can actually produce faster code, because smaller code tends to cause fewer cache misses.
