Best compiler flags for an Objective-C project using the OpenCV framework
I'm compiling an iOS project that uses the OpenCV framework, and I'd like to know which compiler flags are best for it.
The project processes large pixel matrices, so I need the compiler to emit SIMD instructions so that the matrices are processed as efficiently as possible.
I'm currently using these flags: -mfpu=neon, -mfloat-abi=softfp and -O3.
I also found these other flags:
-mno-thumb
-mfpu=maverick
-ftree-vectorize
-DNS_BLOCK_ASSERTIONS=1
I don't really know whether they would save much CPU time. I searched on Google, but I didn't find anything that gives good reasons for choosing one set of flags over another.
Thanks
Comments (2)
I use the same flags that you use for NEON. The optimization level (-O3 or any other) does not optimize NEON intrinsic code; it only optimizes the ordinary ARM code.
As Vasile said, the best performance comes from writing the NEON code in assembly.
The easiest way is to write the routine with NEON intrinsics and compile it with the flags you mentioned. Then take the assembly the compiler generates for that code and optimize it further by hand.
A lot can be gained by parallelizing the work or by making use of NEON's dual-issue capability.
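As a starting point for the intrinsics approach described above, here is a minimal sketch in C. The function name `brighten_u8` and the brightening operation are hypothetical and chosen only for illustration; they do not come from OpenCV. The NEON path is compiled only when the compiler defines `__ARM_NEON` (or the older `__ARM_NEON__`), which should be the case when NEON is enabled, for example with -mfpu=neon on 32-bit ARM; otherwise the scalar loop handles everything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Hypothetical example kernel: add `delta` to every 8-bit pixel,
 * saturating at 255 instead of wrapping around. */
static void brighten_u8(const uint8_t *src, uint8_t *dst, size_t n, uint8_t delta)
{
    assert(src != NULL && dst != NULL);
    size_t i = 0;

#if HAVE_NEON
    /* NEON path: process 16 pixels per iteration. */
    uint8x16_t vdelta = vdupq_n_u8(delta);       /* broadcast delta to 16 lanes */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t v = vld1q_u8(src + i);        /* load 16 pixels */
        vst1q_u8(dst + i, vqaddq_u8(v, vdelta)); /* saturating add, then store */
    }
#endif

    /* Scalar path: handles the tail, or all pixels when NEON is unavailable. */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)src[i] + delta;
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Asking the compiler for assembly output (the -S option in GCC and Clang) shows the instructions generated for this loop, which can then serve as the base for hand tuning.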
The problem is that compilers are not very good at generating vectorized code. So just enabling NEON will not give you much improvement (maybe 10%?).
What you can do is profile your app and rewrite by hand, using NEON, the parts that eat up most of the time. And if you do that, why not contribute them as a patch to the public OpenCV source?
At the moment, OpenCV has little to no code optimized for NEON (it is much better optimized for x86 SSE2).
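To make the "profile first" advice concrete, here is a rough timing sketch in C. Everything in it is an assumption for illustration: `sum_pixels` stands in for whatever routine turns out to be hot, and `time_kernel` is a hypothetical helper, not part of OpenCV or any iOS API. A real profiler (on iOS, the Time Profiler instrument in Xcode) gives a far better picture; a harness like this is only useful for comparing a scalar version against a hand-written NEON version of the same routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Signature shared by the kernels being compared (hypothetical). */
typedef uint64_t (*kernel_fn)(const uint8_t *px, size_t n);

/* Stand-in for a hot routine: sum of all pixel values. */
static uint64_t sum_pixels(const uint8_t *px, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += px[i];
    return total;
}

/* Runs `fn` `reps` times and returns the CPU time used, in seconds.
 * The last result is written to *out so the caller can check that
 * two versions of a kernel agree. Note: an optimizing compiler may
 * still simplify repeated identical calls, so treat the number as
 * a rough guide only. */
static double time_kernel(kernel_fn fn, const uint8_t *px, size_t n,
                          int reps, uint64_t *out)
{
    assert(reps > 0 && out != NULL);
    uint64_t result = 0;
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        result = fn(px, n);
    clock_t end = clock();
    *out = result;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Passing a scalar kernel and a NEON kernel through the same `time_kernel` call, on the device and with the same input, shows whether the hand-written version is actually worth keeping.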