I've compared many NEON-optimized FFT libraries on ARM Cortex-A9, and "libav" is certainly the fastest FFT code, but it:
- is single-threaded,
- only supports 1D FFTs,
- only supports power-of-2 dimensions,
- and has no optimizations for real input/output (it is only a complex-to-complex FFT).
On the other hand, "FFTW" (either the official version or the Vesperix version) is multi-threaded, supports 2D FFTs, supports non-power-of-2 dimensions with very little penalty, and has full optimizations for real input/output instead of just complex input/output.
So depending on your FFT requirements, FFTW might be faster for your project due to the extra features, but if you only need the FFT that libav provides (or you write the extra features yourself using NEON and multi-threading), then libav is actually the fastest 1D Complex-to-Complex FFT code.
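To illustrate what "writing the extra features yourself" can look like, here is a minimal sketch of a 2D FFT built from nothing but a 1D complex FFT, using the standard row-column method. The names `fft1d` and `fft2d` are made up for this example, and `fft1d` is a plain radix-2 routine standing in for a fast library call; it is not libav's API and has no NEON or threading in it.

```python
import cmath

def fft1d(x):
    """Recursive radix-2 complex FFT (hypothetical stand-in for a fast 1D routine).
    Like the 1D FFT discussed above, it only accepts power-of-2 lengths."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2d(rows):
    """2D FFT by the row-column method: 1D FFT of every row, then of every column."""
    stage1 = [fft1d(list(r)) for r in rows]           # transform each row
    cols = [fft1d(list(c)) for c in zip(*stage1)]     # transform each column
    return [list(r) for r in zip(*cols)]              # transpose back to row-major

# A single impulse at (0, 0) transforms to a flat spectrum of ones.
image = [[0j] * 4 for _ in range(4)]
image[0][0] = 1
spectrum = fft2d(image)
```

Every row transform is independent of the others (and likewise every column transform), so this is also the natural place to split the work across threads.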
To give you an indication, it seems that the FFTW NEON optimizations were written by a student of the guy who wrote the libav NEON optimizations. So would you rather have the code from the student or the mentor? ;-)
Another issue is that libav uses an LGPL license, whereas FFTW uses the more restrictive GPL license, unless you are willing to pay a large sum of money to purchase a commercial license for FFTW.
(Personally, I ended up writing my own 2D & real-data features using NEON & multi-threading on top of libav's 1D FFT, but it was a lot of effort since I wasn't an FFT expert!)
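For the real-data part, one common trick (not necessarily the one I used, just the textbook approach) is to pack a real signal of length N into a complex signal of length N/2, run a single half-size complex FFT, and then untangle the result. The sketch below assumes N is a power of 2; `fft1d` and `real_fft` are hypothetical names for this example, and `fft1d` is again a plain radix-2 stand-in rather than any library's API.

```python
import cmath

def fft1d(x):
    """Recursive radix-2 complex FFT (hypothetical stand-in for a fast 1D routine)."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def real_fft(x):
    """FFT of a real sequence of length n using one complex FFT of length n/2.
    Returns the n/2 + 1 non-redundant bins (the rest follow by conjugate symmetry)."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of 2 and at least 2")
    m = n // 2
    # Pack even-indexed samples into the real part, odd-indexed into the imaginary part.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(m)]
    Z = fft1d(z)
    out = []
    for k in range(m + 1):
        zk = Z[k % m]
        zc = Z[(m - k) % m].conjugate()
        even = (zk + zc) / 2      # spectrum of the even-indexed samples
        odd = (zk - zc) / 2j      # spectrum of the odd-indexed samples
        out.append(even + cmath.exp(-2j * cmath.pi * k / n) * odd)
    return out

signal = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 4.0]
half_spectrum = real_fft(signal)
```

The half-size transform does roughly half the work of a full complex FFT on the same data, which is where the real-input savings come from.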
Here is a page benchmarking different FFT implementations on ARM:
http://pmeerw.dyndns.org/blog/programming/neon3.html
According to that page, the fastest FFT implementation is LibAv, which has a NEON-optimized FFT: http://libav.org/
Also try Cricket FFT. It has NEON optimizations too, and a very permissive license (zlib).