CUDA FFT - power of two
I'm looking at the FFT example in the CUDA SDK and I'm wondering: why is CUFFT much faster when half of the padded data size is a power of two? (Half, because in the frequency domain half of the output is redundant.)
What's the point of having a power-of-two size to work on?
Comments (2)
I think this is your answer. It's using different algorithms:
http://forums.nvidia.com/index.php?showtopic=195094
From the manual: http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/CUFFT_Library_3.1.pdf
Just to add a little more background to Ade's answer:
In general, a discrete Fourier transform is a lot of computation. Evaluated directly, a one-dimensional DFT of N points takes on the order of N*N multiplications. FFTs (fast Fourier transforms) are faster because, in the simplest case where N is a power of 2, the equations can be rewritten so that you need only on the order of N * log2 N multiplications.
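To make the N*N versus N * log2 N difference concrete, here is a minimal sketch in plain Python (not CUFFT, and not how CUFFT is implemented internally). The function names `dft_direct`, `fft_radix2` and `multiplication_counts` are made up for this illustration, and the counts are rough orders of magnitude rather than exact operation counts.

```python
import cmath


def dft_direct(x):
    """Direct DFT from the definition: N outputs, each a sum of N terms."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n == 0 or n % 2:
        raise ValueError("length must be a power of two")
    # Split into even- and odd-indexed samples and transform each half.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    half = n // 2
    out = [0j] * n
    for k in range(half):
        # One twiddle-factor multiplication serves two output points.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + half] = even[k] - t
    return out


def multiplication_counts(n):
    """Rough counts for a power-of-two n: (N*N, N*log2(N))."""
    log2n = n.bit_length() - 1
    return n * n, n * log2n
```

Both functions return the same spectrum (up to rounding) for a power-of-two input; only the amount of work differs. For N = 1024, `multiplication_counts(1024)` gives 1,048,576 for the direct form against 10,240 for the radix-2 form.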
In most applications, you don't care about the exact number of samples, so you choose a power of two to get the best performance.
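Choosing a power of two usually means rounding the sample count up and zero-padding. A small sketch of that step, assuming zero-padding is acceptable for your application (it changes the spacing of the frequency bins); `next_power_of_two` and `pad_to_power_of_two` are hypothetical helper names, not part of any CUDA library.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (n must be at least 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1 << (n - 1).bit_length()


def pad_to_power_of_two(samples):
    """Return a copy of samples, zero-padded up to a power-of-two length."""
    target = next_power_of_two(len(samples))
    return list(samples) + [0.0] * (target - len(samples))
```

For example, 1000 samples would be padded to 1024, while 1025 samples would be padded to 2048.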
Powers of three or five would also work, but powers of two are the fastest and give the easiest algorithm to write, so they have become dominant over the years.
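If you want to check whether a given size falls into this "small factors" family, something like the sketch below would do. The helper name `has_only_small_prime_factors` and the default set of primes (2, 3, 5) are assumptions for illustration; check the manual of the FFT library you use for the sizes it actually optimizes.

```python
def has_only_small_prime_factors(n, primes=(2, 3, 5)):
    """True if n is a product of the given primes only (n must be >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for p in primes:
        # Divide out every factor of p; whatever remains is not built from p.
        while n % p == 0:
            n //= p
    return n == 1
```

With the defaults, 1024 (2^10), 243 (3^5) and 1000 (2^3 * 5^3) pass, while 1001 (7 * 11 * 13) does not.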