FFT - When to window?
I've seen the various FFT questions on here, but I'm confused about part of the implementation. Instead of performing the FFT in real time, I want to do it offline. Let's say I have the raw data in float[] audio. The sampling rate is 44100 Hz, so audio[0] to audio[44099] will contain 1 second's worth of audio. If my FFT function handles the windowing (e.g. Hanning), do I simply put the entire audio buffer into the function in one go? Or do I have to cut the audio into chunks of 4096 (my window size) and input each chunk into the FFT, which will then apply the windowing function on top?
Answers (3)
You may need to copy your input data to a separate buffer and get it into the correct format, e.g. if your FFT is in-place, or if it requires interleaved complex data (real/imaginary). However, if your FFT routine can take purely real input and is not in-place (i.e. non-destructive), then you may just be able to pass a pointer to the original sample data, along with an appropriate size parameter.
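The same idea sketched in Python with NumPy (an assumption: the question's float[] audio suggests C# or Java, and NumPy is used here only for illustration). Basic slicing of a NumPy array yields a view rather than a copy, and numpy.fft.rfft accepts purely real input and returns a new array, so the original samples are never modified. frame_spectrum is a hypothetical helper, not part of any library.

```python
import numpy as np

def frame_spectrum(audio: np.ndarray, start: int, n: int) -> np.ndarray:
    """Hann-windowed real FFT of audio[start:start + n]; audio itself is left untouched."""
    frame = audio[start:start + n]            # basic slicing returns a view, not a copy
    if len(frame) != n:
        raise ValueError("not enough samples left for a full frame")
    windowed = frame * np.hanning(n)          # the product is a new array; frame is unchanged
    return np.fft.rfft(windowed)              # real-input FFT: n // 2 + 1 complex bins

fs = 44100
audio = np.random.default_rng(0).standard_normal(fs).astype(np.float32)  # 1 s of test noise
before = audio.copy()
spec = frame_spectrum(audio, 0, 1024)
```

After the call, audio still compares equal to before, and spec holds 513 bins.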
Typically for 1 s of audio, e.g. speech or music, you would pick an FFT size that corresponds to a reasonably stationary chunk of audio, e.g. 10 ms or 20 ms. At 44.1 kHz that is 441 or 882 samples, so rounding up to a power of two your FFT size might be, say, 512 or 1024. You would then generate successive spectra by advancing through your buffer and doing a new FFT at each starting point. Note that it's common practice to overlap these successive buffers, typically by 50%. So if N = 1024, your first FFT would be for samples 0..1023, your second for samples 512..1535, then 1024..2047, etc.
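A minimal sketch of that overlapped framing, again assuming Python with NumPy and a Hann window. stft is a hypothetical helper name, and trailing samples that do not fill a whole frame are simply dropped here.

```python
import numpy as np

def stft(audio, n=1024, hop=None):
    """Hann-windowed spectra of successive n-sample frames advanced by hop samples.

    hop defaults to n // 2, i.e. 50 % overlap. Returns (start_indices, spectra), where
    spectra[k] is the rfft of audio[start_indices[k] : start_indices[k] + n].
    """
    hop = n // 2 if hop is None else hop
    starts = list(range(0, len(audio) - n + 1, hop))   # only complete frames
    window = np.hanning(n)
    spectra = np.empty((len(starts), n // 2 + 1), dtype=np.complex128)
    for k, s in enumerate(starts):
        spectra[k] = np.fft.rfft(audio[s:s + n] * window)
    return starts, spectra

fs = 44100
t = np.arange(fs) / fs
audio = np.sin(2 * np.pi * 440.0 * t)      # 1 s test tone standing in for real audio
starts, spectra = stft(audio, n=1024)
print(starts[:3])                          # [0, 512, 1024]
```

For the 44100-sample buffer this gives 85 frames of 513 bins each; the last 68 samples are not covered by a full frame. Zero-padding the end of the buffer is one common way to include them.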
The choice of whether to calculate one FFT over the entire data set (in the OP's case, 44100 samples representing 1 second of data) or to do a series of FFTs over smaller subsets of the full data set depends on the data and on the intended purpose of the FFT.
If the data is relatively static spectrally over the full data set, then one FFT over the entire data set is probably all that's needed.
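For a spectrally static signal, a single Hann-windowed FFT over the whole buffer can be sketched as follows, assuming NumPy and a hypothetical 1 s, 440 Hz test tone in place of real audio. With N = 44100 samples at 44.1 kHz the bin spacing is fs / N = 1 Hz, so the tone falls exactly on a bin. power_spectrum is a hypothetical helper name.

```python
import numpy as np

def power_spectrum(x, fs):
    """Hann-windowed power spectrum of the whole buffer x; returns (freqs_hz, power)."""
    windowed = x * np.hanning(len(x))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)   # bin k sits at k * fs / N Hz
    return freqs, power

fs = 44100
t = np.arange(fs) / fs                      # 1 s, as in the question
tone = np.sin(2 * np.pi * 440.0 * t)        # stationary 440 Hz test tone (A4)
freqs, power = power_spectrum(tone, fs)
print(freqs[np.argmax(power)])              # the strongest bin
```

The strongest bin is the 440 Hz one; because the signal never changes, nothing would be gained by splitting it into frames.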
However, if the data is spectrally dynamic over the data set, then multiple sliding FFTs over small subsets of the data would create a more accurate time-frequency representation of the data.
The plot below shows the power spectrum of an acoustic guitar playing an A4 note. The audio signal was sampled at 44.1 kHz and the data set contains 131072 samples, almost 3 seconds of data. This data set was pre-multiplied with a Hann window function.
The plot below shows the power spectrum of a subset of 16384 samples (0 to 16383) taken from the full data set of the acoustic guitar A4 note. This subset was also pre-multiplied with a Hann window function.
Notice how the spectral energy distribution of the subset is significantly different from the spectral energy distribution of the full data set.
If we were to extract subsets from the full data set using a sliding 16384-sample frame, and calculate the power spectrum of each frame, we would create an accurate time-frequency picture of the full data set.
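A sketch of that sliding-frame analysis, assuming NumPy and a synthetic signal (not the guitar recording) that jumps from 440 Hz to 880 Hz halfway through. A single FFT over the whole buffer would show both peaks with no indication of when each occurs, while the per-frame peak shows the change over time. peak_track is a hypothetical helper; frames advance by half the frame length.

```python
import numpy as np

def peak_track(x, fs, n=16384, hop=None):
    """Return (frame_start_times_s, peak_freqs_hz) from a sliding Hann-windowed FFT."""
    hop = n // 2 if hop is None else hop
    window = np.hanning(n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    times, peaks = [], []
    for start in range(0, len(x) - n + 1, hop):
        power = np.abs(np.fft.rfft(x[start:start + n] * window)) ** 2
        times.append(start / fs)
        peaks.append(freqs[np.argmax(power)])   # strongest bin in this frame
    return np.array(times), np.array(peaks)

fs = 44100
t = np.arange(fs) / fs
# Hypothetical test signal: 440 Hz for the first half second, then 880 Hz
x = np.where(t < 0.5, np.sin(2 * np.pi * 440.0 * t), np.sin(2 * np.pi * 880.0 * t))
times, peaks = peak_track(x, fs)
```

With 1 s of audio and 16384-sample frames at 50% overlap there are only four frames (starting at 0, about 0.19, 0.37 and 0.56 s). The first peaks near 440 Hz and the last near 880 Hz, each within one bin (fs / 16384 ≈ 2.69 Hz). Frames this long cannot say precisely when the switch happened, which is the time/frequency trade-off discussed in the next answer.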
References:
Real audio signal data, Hann window function, plots, FFT, and spectral analysis were done here:
Fast Fourier Transform, spectral analysis, Hann window function, audio data
The chunk size or window length you pick controls the frequency resolution and the time resolution of the FFT result. You have to determine which you want or what trade-off to make.
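To put numbers on the trade-off at 44.1 kHz, the sketch below (NumPy assumed; resolution is a hypothetical helper) prints, for a few FFT lengths, the frame duration N / fs, the bin spacing fs / N, and the equivalent noise bandwidth (ENBW) of a Hann-windowed bin, fs · Σw² / (Σw)², which is one common way to quantify how wide a bin effectively is.

```python
import numpy as np

def resolution(n, fs, window=None):
    """Return (frame_duration_s, bin_spacing_hz, enbw_hz) for an n-point FFT at sample rate fs.

    enbw_hz is the equivalent noise bandwidth of one bin: fs * sum(w**2) / sum(w)**2.
    A rectangular window (window=None) gives exactly fs / n; a Hann window about 1.5 * fs / n.
    """
    w = np.ones(n) if window is None else window
    duration = n / fs
    spacing = fs / n
    enbw = fs * np.sum(w ** 2) / np.sum(w) ** 2
    return duration, spacing, enbw

fs = 44100
for n in (512, 1024, 4096, 16384, 44100):
    dur, spacing, enbw_hann = resolution(n, fs, np.hanning(n))
    print(f"N = {n:5d}: {1000 * dur:7.1f} ms per frame, bin spacing {spacing:6.2f} Hz, "
          f"Hann ENBW {enbw_hann:6.2f} Hz")
```

Doubling N halves the bin spacing but doubles the span of audio each spectrum summarizes.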
Longer windows give you better frequency resolution, but worse time resolution. Shorter windows, vice versa. Each FFT result bin will contain a frequency bandwidth of roughly 1 to 2 times the sample rate divided by the FFT length, depending on the window shape (rectangular, von Hann, etc.), not just one single frequency. If your entire data chunk is stationary (frequency content doesn't change), then you may not need any time resolution, and can go for 1 to 2 Hz frequency "resolution" in your 1 second of data. Averaging multiple short FFT windows might also help reduce the variance of your spectral estimations.
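Averaging the power spectra of overlapping short frames is essentially Welch's method, and SciPy ships a ready-made version as scipy.signal.welch. The sketch below is a hand-rolled NumPy version for illustration only (averaged_power_spectrum and relative_spread are hypothetical names), applied to white noise, whose true spectrum is flat, so any bin-to-bin spread is estimation variance.

```python
import numpy as np

def averaged_power_spectrum(x, fs, n=1024, hop=None):
    """Average the Hann-windowed power spectra of overlapping n-sample frames (Welch-style)."""
    hop = n // 2 if hop is None else hop
    window = np.hanning(n)
    starts = range(0, len(x) - n + 1, hop)
    if len(starts) == 0:
        raise ValueError("signal is shorter than one frame")
    power = np.zeros(n // 2 + 1)
    for s in starts:
        power += np.abs(np.fft.rfft(x[s:s + n] * window)) ** 2
    return np.fft.rfftfreq(n, d=1.0 / fs), power / len(starts)

def relative_spread(p):
    """Standard deviation over mean across bins, ignoring the DC and Nyquist bins."""
    return np.std(p[1:-1]) / np.mean(p[1:-1])

fs = 44100
noise = np.random.default_rng(1).standard_normal(fs)          # 1 s of white noise
_, single = averaged_power_spectrum(noise, fs, n=fs)            # one FFT over the whole buffer
_, averaged = averaged_power_spectrum(noise, fs, n=1024)        # 85 overlapping frames averaged
print(relative_spread(single), relative_spread(averaged))
```

The single 44100-point spectrum scatters on the order of 100% around its mean, while the 85-frame average is far smoother, at the cost of a coarser bin spacing (about 43 Hz instead of 1 Hz).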