Speed up sparse FFT computation
I'm hoping someone can review my code below and offer hints on how to speed up the section between tic and toc. The function below attempts to perform an IFFT faster than Matlab's built-in function, since (1) almost all of the FFT coefficient bins are zero (i.e. only 10 to 1000 bins out of 10M to 300M are non-zero), and (2) only the central third of the IFFT result is retained (the first and last thirds are discarded, so there is no need to compute them in the first place).
The input variables are:
fftcoef = complex fft-coef 1D array (10 to 1000 pts long)
bins = index of fft coefficients corresponding to fftcoef (10 to 1000 pts long)
DATAn = # of pts in data before zero padding and fft (in range of 10M to 260M)
FFTn = DATAn + # of pts used to zero pad before taking fft (in range of 16M to 268M) (e.g. FFTn = 2^nextpow2(DATAn))
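For reference, with N = FFTn, D = DATAn and 1-based indices as in MATLAB, ifft computes the first line below. Since only the K listed bins b_1 to b_K (both sidebands, with values X(b_m) taken from the expanded fftcoef) are nonzero, each retained output reduces to a K-term sum over the central third. Note that the loop below leaves out the 1/N factor, so its output is N times what ifft returns.

$$x(n) = \frac{1}{N}\sum_{k=1}^{N} X(k)\,e^{2\pi i (k-1)(n-1)/N}, \qquad n = 1,\dots,N$$

$$x(n) = \frac{1}{N}\sum_{m=1}^{K} X(b_m)\,e^{2\pi i (b_m-1)(n-1)/N}, \qquad \mathrm{round}(D/3) \le n \le \mathrm{round}(2D/3)$$

Evaluated directly, this costs about K·D/3 complex exponentials, compared with O(N log N) work for the full ifft.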
Currently, this code takes a few orders of magnitude longer than Matlab's ifft function approach, which computes the full-length result and then discards two thirds of it. For example, if the input data for fftcoef and bins are 9x1 arrays (i.e. only 9 complex FFT coefficients per sideband; 18 points when considering both sidebands), and DATAn=32781534, FFTn=33554432 (i.e. 2^25), then the ifft approach takes 1.6 seconds whereas the loop below takes over 700 seconds.
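A rough count for this example puts those timings in perspective (treating N log2 N as a nominal FFT operation count, which is only meaningful up to a constant factor):

$$\mathrm{round}\!\left(\tfrac{2\cdot 32781534}{3}\right) - \mathrm{round}\!\left(\tfrac{32781534}{3}\right) + 1 = 21854356 - 10927178 + 1 = 10927179 \ \text{iterations}$$

$$18 \times 10927179 = 196689222 \approx 2.0\times 10^{8} \ \text{complex exponentials}$$

$$N\log_2 N = 2^{25}\times 25 = 838860800 \approx 8.4\times 10^{8}$$

$$\frac{700\ \text{s}}{10927179} \approx 64\ \mu\text{s per iteration}$$

So the loop evaluates fewer exponentials than the nominal FFT count, yet spends roughly 64 µs on each 18-term iteration. This suggests that the per-iteration overhead of the interpreted loop, rather than the arithmetic itself, dominates the runtime.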
I've avoided using a matrix to vectorize the nn loop, since the fftcoef and bins arrays can sometimes be up to 1000 points long, and a 260M x 1K matrix would be too large for memory unless it could be broken up somehow.
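For scale, assuming complex double entries (16 bytes each), here is the full matrix compared with a block of B output samples at a time (B = 10^4 is an arbitrary illustrative choice):

$$260\times10^{6} \times 1000 \times 16\ \text{B} = 4.16\times10^{12}\ \text{B} \approx 4.2\ \text{TB}$$

$$B \times 1000 \times 16\ \text{B} = 10^{4}\times 1000 \times 16\ \text{B} = 1.6\times10^{8}\ \text{B} = 160\ \text{MB}$$

So the matrix never needs to exist all at once: filling ttrend block by block (an outer loop over blocks, a vectorized product inside each block) keeps peak memory bounded by the block size rather than by DATAn.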
Any advice is much appreciated! Thanks in advance.
```matlab
function ttrend = fn_fft_v1p0(fftcoef, bins, DATAn, FFTn)
fftcoef = [fftcoef; (conj(flipud(fftcoef)))]; % fft coefficients
bins = [bins; (FFTn - flipud(bins) +2)]; % corresponding fft indices for fftcoef array
ttrend = zeros( (round(2*DATAn/3) - round(DATAn/3) + 1), 1); % preallocate
start = round(DATAn/3)-1;
tic;
for nn = start+1 : round(2*DATAn/3) % loop over desired time indices
    % sum over all fft indices having non-zero coefficients
    arg = 2*pi*(bins-1)*(nn-1)/FFTn;
    ttrend(nn-start) = sum( fftcoef.*( cos(arg) + 1j*sin(arg) ) );
end
toc;
end
```
You have to keep in mind that Matlab uses a compiled FFT library (http://www.fftw.org/) for its fft functions, which, besides running much faster than a Matlab script, is well optimized for many use cases. So a first step might be writing your code in C/C++ and compiling it as a mex file that you can call from within Matlab. That will surely speed up your code by at least an order of magnitude (probably more).
Besides that, you can make one simple optimization by considering two things:
These two points translate into the following loop:
Note that you have to use this loop before you expand bins and fftcoef, since the symmetry is already taken into account. With the parameters from your question, this loop takes 8.3 seconds to run on my PC, while your code takes 141.3 seconds.
I have posted a question/answer at Accelerating FFTW pruning to avoid massive zero padding, which solves the problem for the C++ case using FFTW. You can use this solution from Matlab via mex-files.