Auto-vectorization has never worked out well for me. As far as I can tell, it currently only works for very trivial loops.
I use the pragma/intrinsic approach and take a look at the assembly. If the compiler generates bad code (like spilling SSE registers onto the stack or adding redundant moves), I use inline assembler for the whole loop body.
Portability is not a problem, by the way. Often you start with a C/C++ loop and optimize it using intrinsics. Just keep the old loop and use it as a unit test and fallback for your SIMD implementation. It's also always wise to be able to remove all SIMD code from a project via a compile-time define. Debugging an application is much easier that way, and the same define can be used for cross-compilation.
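A minimal sketch of that pattern in C, assuming an x86 target with SSE. The macro MYPROJ_NO_SIMD and the function names are made up for illustration and do not come from any real project. The plain loop stays around as the reference, the intrinsics version sits behind the define, and a small self-test compares the two.

```c
#include <stddef.h>

/* Hypothetical project-wide switch: build with -DMYPROJ_NO_SIMD to
   compile out every SIMD code path (debugging, cross-compilation). */
#if !defined(MYPROJ_NO_SIMD) && defined(__SSE__)
#  include <xmmintrin.h>
#  define MYPROJ_USE_SSE 1
#else
#  define MYPROJ_USE_SSE 0
#endif

/* Reference implementation: the original plain C loop. */
void add_arrays_scalar(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

/* Public entry point: SSE when enabled, otherwise the plain loop. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
#if MYPROJ_USE_SSE
    size_t i = 0;
    /* Four floats per iteration; unaligned loads/stores keep the
       sketch free of alignment requirements. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
    /* Leftover elements when n is not a multiple of four. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
#else
    add_arrays_scalar(dst, a, b, n);
#endif
}

/* Unit-test helper: returns 1 if the optimized path agrees with the
   reference loop on a small input whose length is not a multiple of 4. */
int add_arrays_selftest(void)
{
    float a[11], b[11], want[11], got[11];
    for (size_t i = 0; i < 11; ++i) {
        a[i] = (float)i;
        b[i] = 0.5f * (float)i;
    }
    add_arrays_scalar(want, a, b, 11);
    add_arrays(got, a, b, 11);
    for (size_t i = 0; i < 11; ++i)
        if (want[i] != got[i])
            return 0;
    return 1;
}
```

Building with -DMYPROJ_NO_SIMD, or for a target without SSE, routes add_arrays to the plain loop, so the same source still compiles and can be stepped through without any intrinsics in the way.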
I would never rely on automatic vectorization from any compiler. With gcc I would be doubly wary because the effects of gcc's optimizations always vary from version to version. Almost everyone I know who relies on special optimizations or gcc extensions has to deal with breakage when a new gcc version is released.
You can usually trust pragmas and intrinsics, but you should keep a sharp eye on release notes for new gcc versions, and you should tell your own users what gcc version is needed to compile your code.
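One way to make the version requirement hard to miss is to check it at compile time. This is only a sketch: the minimum of 4.8 and the MYPROJ_ names are placeholders, not a recommendation, so put in whatever release you actually tested. __GNUC__ and __GNUC_MINOR__ are gcc's predefined version macros; clang defines them as well for compatibility, so it is excluded here.

```c
/* Pack a major.minor pair into one integer so versions compare with <. */
#define MYPROJ_VERSION_CODE(major, minor) ((major) * 100 + (minor))

/* Hypothetical minimum: replace with the gcc release you verified. */
#define MYPROJ_MIN_GCC MYPROJ_VERSION_CODE(4, 8)

#if defined(__GNUC__) && !defined(__clang__)
#  if MYPROJ_VERSION_CODE(__GNUC__, __GNUC_MINOR__) < MYPROJ_MIN_GCC
#    error "This code has only been verified with gcc 4.8 or newer"
#  endif
#endif
```

An older gcc then stops with a readable message instead of silently producing code nobody has looked at.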
Once or twice when vectorization really mattered, we've added something to the test suite to call objdump and verify that vector instructions are actually being used. It would be nice to be able to detect 'bad vector code' (as Nils describes) automatically as well, but we've never gotten that far.
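The checking side of such a test could look roughly like the sketch below. It is not the check we actually used; it assumes the disassembly text (for example the output of objdump -d on the object file) has already been read into a string, and the function name and the mnemonic list are illustrative only. The list covers a few packed single-precision SSE instructions and is far from complete.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of a few packed single-precision SSE mnemonics in
   a block of disassembly text. Plain substring matching, so AVX forms
   such as "vaddps" are counted too. */
size_t count_packed_sse(const char *disasm)
{
    static const char *const mnemonics[] = {
        "addps", "subps", "mulps", "divps", "movaps", "movups"
    };
    size_t count = 0;

    for (size_t m = 0; m < sizeof mnemonics / sizeof mnemonics[0]; ++m) {
        size_t len = strlen(mnemonics[m]);
        /* Walk through every match of this mnemonic. */
        for (const char *p = strstr(disasm, mnemonics[m]);
             p != NULL;
             p = strstr(p + len, mnemonics[m]))
            ++count;
    }
    return count;
}
```

A test would fail when the count for the hot function's disassembly is zero. Scalar instructions such as addss do not match, so a loop that quietly fell back to scalar code is caught. Spotting spills or redundant moves would need something smarter than substring matching.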