Vectorization with unrolled loops

Posted 2024-11-18 07:16:50

I'm using intel-cc to compile some C++ code, and with the -Wall option its remarks suggest that it is vectorizing a lot of my loops. For now I'm working under the assumption that this is good for performance.

My question is this: if, instead of writing a for loop, I have unrolled it by hand, so that we have for example

a[0] = b[0] + 1;
a[1] = b[1] + 1;
a[2] = b[2] + 1;

instead of

for(int i=0;i<3;++i) a[i] = b[i] + 1;

can the compiler still vectorize this code?
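To make the comparison concrete, here is a minimal sketch of the two forms as separate functions, so that their generated assembly can be compared side by side. The function names add_one_loop and add_one_unrolled are made up for illustration and are not from the original code. In compilers such as GCC and Clang, loop vectorization and straight-line (SLP) vectorization are separate passes, so whether the unrolled form gets vectorized depends on the compiler, its version and the optimization level; the vectorization report or the assembly is the only reliable check.

```cpp
#include <cstddef>

// Loop form: the trip count of 3 is known at compile time.
void add_one_loop(int* a, const int* b) {
    for (std::size_t i = 0; i < 3; ++i) a[i] = b[i] + 1;
}

// Hand-unrolled form: three statements executed in the same order,
// so the observable effect matches the loop above.
void add_one_unrolled(int* a, const int* b) {
    a[0] = b[0] + 1;
    a[1] = b[1] + 1;
    a[2] = b[2] + 1;
}
```

Both functions produce the same results for the same inputs, so any difference in the generated code comes from the optimizer rather than from the semantics.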

Further, if I access the elements through references instead, does the compiler have any hope of recognising that the two are equivalent? E.g.

int &x = a[0], &y = a[1], &z = a[2];  // references must be bound when declared

and then using x, y and z in place of a[0], a[1] and a[2].
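As far as the language is concerned, a reference is just another name for the object it is bound to, so the reference version has the same meaning as the indexed version. Whether the optimizer then treats the two identically is a separate question that only the generated code can answer. A minimal sketch, with add_one_refs as a hypothetical function name:

```cpp
// Hypothetical helper: writes b[i] + 1 into a[i] through named references.
void add_one_refs(int* a, const int* b) {
    // Each reference is bound once, at its declaration; afterwards
    // x, y and z are simply other names for a[0], a[1] and a[2].
    int& x = a[0];
    int& y = a[1];
    int& z = a[2];
    x = b[0] + 1;
    y = b[1] + 1;
    z = b[2] + 1;
}
```

Writing through x, y and z modifies the array elements themselves; no copies are involved.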

Any answers greatly appreciated! Thanks in advance.

扎心 2024-11-25 07:16:50

So I had a delve into the assembly generated for the three simple cases below:

for(int i=0;i<3;++i) a[i] = 1.0; // case 1
a[0] = a[1] = a[2] = 1.0;        // case 2 
a.x = a.y = a.z = 1.0;           // case 3

The assembly generated for cases 2 and 3 was identical. This is good: in case 2 the compiler gave a "remark" about copying a reference to a temporary (operator[] is overloaded for my class), so the identical output implies (correct me if I'm wrong) that the compiler is correctly utilizing Return Value Optimisation (RVO).
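The post does not show the class itself, so the following is only a guess at its shape: a hypothetical Vec3 with three named members and an operator[] that returns a reference to one of them. The names Vec3, fill_loop, fill_indexed and fill_members are all assumptions made for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for the class in the post: three named members
// plus an operator[] that returns a reference to the selected member.
struct Vec3 {
    double x, y, z;
    double& operator[](std::size_t i) {
        return i == 0 ? x : (i == 1 ? y : z);
    }
};

// case 1: loop over the index
void fill_loop(Vec3& a) {
    for (std::size_t i = 0; i < 3; ++i) a[i] = 1.0;
}

// case 2: chained assignment through operator[]
void fill_indexed(Vec3& a) { a[0] = a[1] = a[2] = 1.0; }

// case 3: chained assignment to the named members
void fill_members(Vec3& a) { a.x = a.y = a.z = 1.0; }
```

With a class like this, all three functions leave the object in the same state, which makes them a convenient test bed for comparing the compiler's output.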

However, in case 1 the compiler output a remark saying that it had vectorised the loop. The assembly was also slightly different. Specifically, it contained this extra data:

       .section .rodata, "a"
       .align 16
       .align 16
 _2il0floatpacket.1:
       .long   0x00000000,0x3ff00000,0x00000000,0x3ff00000
       .type   _2il0floatpacket.1,@object
       .size   _2il0floatpacket.1,16
 _2il0floatpacket.2:
       .long   0x00000000,0x3ff00000
       .type   _2il0floatpacket.2,@object
       .size   _2il0floatpacket.2,8

I have never worked with assembly, so I am not entirely sure what this extra data means, but it seems to me to imply that the compiler does not vectorize the unrolled form or the version that accesses the elements through references. The lack of a vectorization remark for those cases at compile time hints at the same thing.

If anyone could confirm this it would be great.
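One thing that can be said about the listing: it is read-only data, not instructions. Each pair of .long values 0x00000000, 0x3ff00000 is the IEEE 754 bit pattern of the double 1.0 with the low word listed first, so the 16-byte packet holds two copies of 1.0 and the 8-byte packet holds one. That would be consistent with a packed store of two elements followed by a scalar store of the third, although the post does not show the instructions that use these constants. A minimal sketch for checking the decoding, where double_from_longs is a hypothetical helper and a 64-bit IEEE 754 double is assumed:

```cpp
#include <cstdint>
#include <cstring>

// Reassemble a double from the two 32-bit words of a .long pair.
// Assumes the low word is listed first (little-endian x86 layout)
// and that double is a 64-bit IEEE 754 type.
double double_from_longs(std::uint32_t low, std::uint32_t high) {
    std::uint64_t bits = (static_cast<std::uint64_t>(high) << 32) | low;
    double value;
    std::memcpy(&value, &bits, sizeof value);  // copy the bit pattern without aliasing issues
    return value;
}
```

Calling double_from_longs(0x00000000, 0x3ff00000) yields 1.0, which matches the constant assigned in all three cases.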
