Why do I need to apply a window function to the samples when building the power spectrum of an audio signal?
I have found the following guidelines for getting the power spectrum of an audio signal several times:
- collect N samples, where N is a power of 2
- apply a suitable window function to the samples, e.g. Hanning
- pass the windowed samples to an FFT routine - ideally you want a real-to-complex FFT, but if all you have is a complex-to-complex FFT then pass 0 for all the imaginary input parts
- calculate the squared magnitude of your FFT output bins (re * re + im * im)
- (optional) calculate 10 * log10 of each squared-magnitude output bin to get a magnitude value in dB
- Now that you have your power spectrum, you just need to identify the peak(s), which should be pretty straightforward if you have a reasonable S/N ratio. Note that frequency resolution improves with larger N: for example, with a 44.1 kHz sample rate and N = 32768, the frequency resolution of each bin is 44100 / 32768 ≈ 1.35 Hz.
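The recipe above can be sketched in Python with NumPy. This is a minimal illustration, not the quoted author's code: the function name `power_spectrum_db`, the use of `np.hanning` and `np.fft.rfft`, and the synthetic 440 Hz test tone are all assumptions made for the example.

```python
import numpy as np


def power_spectrum_db(samples, fs):
    """Hann window -> real-to-complex FFT -> squared magnitude -> dB.

    Returns (freqs_hz, power_db) for bins 0 .. N/2.
    """
    n = len(samples)
    # The guideline asks for a power of 2; NumPy's FFT itself accepts any N.
    assert n > 0 and (n & (n - 1)) == 0, "N should be a power of 2"
    window = np.hanning(n)                           # symmetric Hann window
    spectrum = np.fft.rfft(samples * window)         # N/2 + 1 complex bins
    power = spectrum.real ** 2 + spectrum.imag ** 2  # re*re + im*im
    power_db = 10 * np.log10(power + 1e-20)          # tiny floor avoids log10(0)
    freqs_hz = np.fft.rfftfreq(n, d=1.0 / fs)        # bin k sits at k * fs / N
    return freqs_hz, power_db


# Usage with a synthetic 440 Hz tone standing in for real audio
fs = 44100
n = 32768
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 440.0 * t)

freqs, p_db = power_spectrum_db(x, fs)
peak_bin = int(np.argmax(p_db))
print(f"resolution: {fs / n:.2f} Hz per bin")  # 1.35 Hz per bin
print(f"peak bin {peak_bin} at {freqs[peak_bin]:.1f} Hz")
```

Since 440 Hz is not an exact multiple of fs / N, the peak lands on the nearest bin (about 440.1 Hz here) rather than exactly on 440 Hz.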
But... why do I need to apply a window function to the samples? What does that really mean?
And what about the power spectrum: is it the power of each frequency in the range of the sample rate? (For example, like the Windows Media Player sound visualizer?)
Most real-world audio signals are non-periodic, meaning that real audio signals do not generally repeat exactly over any given time span.
However, the math of the Fourier transform assumes that the signal being Fourier transformed is periodic over the time span in question.
This mismatch between the Fourier assumption of periodicity and the real-world fact that audio signals are generally non-periodic leads to errors in the transform.
These errors are called "spectral leakage", and generally manifest as a wrongful distribution of energy across the power spectrum of the signal.
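A quick way to see spectral leakage, assuming NumPy and synthetic sine waves in place of real audio: a tone that completes a whole number of cycles in the FFT window keeps essentially all its power in one bin, while a tone with a fractional number of cycles spreads power into many bins, even though no window has been applied. The helper `fraction_outside_peak` is hypothetical, written only for this illustration.

```python
import numpy as np


def fraction_outside_peak(x, halfwidth=1):
    """Fraction of (one-sided) spectral power lying more than `halfwidth`
    bins away from the strongest bin. Leakage shows up as a large value."""
    power = np.abs(np.fft.rfft(x)) ** 2
    k = int(np.argmax(power))
    lo, hi = max(k - halfwidth, 0), k + halfwidth + 1
    return 1.0 - power[lo:hi].sum() / power.sum()


n = 1024
t = np.arange(n)
periodic = np.sin(2 * np.pi * 32.0 * t / n)      # exactly 32 cycles: repeats seamlessly
non_periodic = np.sin(2 * np.pi * 32.5 * t / n)  # 32.5 cycles: does not repeat seamlessly

print(f"integer cycles:     {fraction_outside_peak(periodic):.2e}")
print(f"non-integer cycles: {fraction_outside_peak(non_periodic):.2e}")
```

The first figure is at floating-point noise level, while the second shows that a noticeable share of the power has leaked away from the peak.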
The plot below shows a closeup of the power spectrum of an acoustic guitar playing the A4 note. The spectrum was calculated with the FFT (Fast Fourier Transform), but the signal was not windowed prior to the FFT.
Notice the distribution of energy above the -60 dB line, and the three distinct peaks at roughly 440 Hz, 880 Hz, and 1320 Hz. This particular distribution of energy contains "spectral leakage" errors.
To somewhat mitigate the "spectral leakage" errors, you can pre-multiply the signal by a window function designed specifically for that purpose, such as the Hann window function.
The plot below shows the Hann window function in the time-domain. Notice how the tails of the function go smoothly to zero, while the center portion of the function tends smoothly towards the value 1.
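The shape described above follows from the usual (symmetric) Hann formula, w[k] = 0.5 * (1 - cos(2πk / (N - 1))) for k = 0 to N - 1. A small NumPy sketch, with `hann` as an illustrative helper, checks it against NumPy's built-in `np.hanning`:

```python
import numpy as np


def hann(n):
    """Symmetric Hann window: w[k] = 0.5 * (1 - cos(2*pi*k / (n - 1)))."""
    k = np.arange(n)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * k / (n - 1)))


w = hann(9)
print(np.round(w, 3))                 # tails at 0, centre sample at 1
assert np.allclose(w, np.hanning(9))  # same values as NumPy's built-in
```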
Now let's apply the Hann window to the guitar's audio data, and then FFT the resulting signal.
The plot below shows a closeup of the power spectrum of the same signal (an acoustic guitar playing the A4 note), but this time the signal was pre-multiplied by the Hann window function prior to the FFT.
Notice how the distribution of energy above the -60 dB line has changed significantly, and how the three distinct peaks have changed shape and height. This particular distribution of spectral energy contains fewer "spectral leakage" errors.
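The guitar recording itself is not reproduced here, so the following sketch substitutes a synthetic 440 Hz sine (an assumption) sampled at 44.1 kHz with N = 4096, where 440 Hz falls between bins. It measures the strongest spectral level more than 10 bins away from the peak, with and without a Hann window; the helper `far_leakage_db` and the 10-bin guard are arbitrary choices for illustration.

```python
import numpy as np


def far_leakage_db(x, window, guard_bins=10):
    """Strongest spectral level more than `guard_bins` bins from the peak,
    relative to the peak, in dB (lower means less leakage)."""
    power = np.abs(np.fft.rfft(x * window)) ** 2
    k = int(np.argmax(power))
    far = np.ones(power.size, dtype=bool)
    far[max(k - guard_bins, 0): k + guard_bins + 1] = False
    return 10 * np.log10(power[far].max() / power[k])


fs, n = 44100, 4096
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 440.0 * t)  # 440 / (fs / n) is about 40.87 bins: not bin-centred

rect_db = far_leakage_db(x, np.ones(n))     # "no window" = rectangular window
hann_db = far_leakage_db(x, np.hanning(n))  # Hann window
print(f"rectangular: {rect_db:.1f} dB, Hann: {hann_db:.1f} dB")
```

With these settings the Hann-windowed level should sit tens of dB below the rectangular one; the exact figures depend on N and on where the tone falls between bins.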
The acoustic guitar A4 note used for this analysis was sampled at 44.1 kHz with a high-quality microphone under studio conditions; it contains essentially zero background noise, no other instruments or voices, and no post-processing.
References:
Real audio signal data, Hann window function, plots, FFT, and spectral analysis were done here:
Fast Fourier Transform, spectral analysis, Hann window function, audio data
As @cyco130 says, your samples are already windowed by a rectangular function. Since a Fourier Transform assumes periodicity, any discontinuity between the last sample and the repeated first sample will cause artefacts in the spectrum (e.g. "smearing" of the peaks). This is known as spectral leakage. To reduce the effect of this we apply a tapered window function such as a Hann window which smooths out any such discontinuity and thereby reduces artefacts in the spectrum.
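The wrap-around discontinuity can be measured directly. In this sketch (NumPy, with a synthetic cosine completing 32.5 cycles as an assumed test signal), the step from the last sample back to the first is almost 2 for the raw samples, while the Hann-windowed samples are zero at both ends, so the step vanishes. `wrap_jump` is a hypothetical helper.

```python
import numpy as np

n = 1024
t = np.arange(n)
x = np.cos(2 * np.pi * 32.5 * t / n)  # 32.5 cycles: not periodic in the window
w = np.hanning(n)


def wrap_jump(sig):
    """Size of the step between the last sample and the repeated first sample,
    which is what the FFT's implicit periodic extension sees."""
    return abs(sig[0] - sig[-1])


print(f"raw:      {wrap_jump(x):.3f}")
print(f"windowed: {wrap_jump(x * w):.3f}")
```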
Note that a non-rectangular window has both benefits and costs. The result of a window in the time domain is equivalent to a convolution of the window's transform with the signal's spectrum. A typical window, such as a von Hann window, will reduce the "leakage" from any non-periodic spectral content, which results in a less noisy-looking spectrum; but, in return, the convolution will "blur" any exactly or nearly periodic spectral peaks across a few adjacent bins. For example, all the spectral peaks will become rounder-looking, which may reduce frequency estimation accuracy. If you know a priori that there is no non-periodic content (e.g. data from some rotationally synchronous sampling system), a non-rectangular window could actually make the FFT look worse.
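Two claims in this paragraph can be checked numerically, assuming NumPy: that multiplying by a window in time equals (circularly) convolving the spectra, and that a bin-centred tone which occupies a single bin with no window is smeared over three bins by a Hann window. The sketch uses the periodic Hann variant (denominator N rather than N - 1) because its DFT is nonzero only at bins 0 and ±1, which makes the smearing exact; the test tone is a synthetic assumption.

```python
import numpy as np

n = 64
t = np.arange(n)
x = np.cos(2 * np.pi * 8 * t / n)          # exactly periodic: 8 cycles, bin-centred
w = 0.5 - 0.5 * np.cos(2 * np.pi * t / n)  # periodic Hann (denominator n, not n - 1)

X, W = np.fft.fft(x), np.fft.fft(w)

# Multiplying in time == circular convolution of the spectra, divided by n
conv = np.array([np.sum(X * W[(k - np.arange(n)) % n]) for k in range(n)]) / n
assert np.allclose(np.fft.fft(x * w), conv)

rect_mag = np.abs(X[: n // 2])                    # no window
hann_mag = np.abs(np.fft.fft(x * w)[: n // 2])    # Hann window
print("rectangular, bins 6-10:", np.round(rect_mag[6:11], 2))
print("Hann,        bins 6-10:", np.round(hann_mag[6:11], 2))
```

For this exactly periodic tone the rectangular spectrum has a single nonzero bin (bin 8, magnitude 32), whereas the Hann-windowed one has magnitudes 8, 16, 8 at bins 7, 8, 9: the peak becomes wider and lower.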
A non-rectangular window is also an informationally lossy process. A significant amount of spectral information near the edges of the window will be thrown away, assuming finite precision arithmetic. So non-rectangular windows are best used with overlapping window processing, and/or when one can assume that the spectrum of interest is either stationary across the entire window width, or centered in the window.
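One way overlapping processing recovers what the taper discards: with a periodic Hann window and 50% overlap (hop = N/2), the windows sum to exactly 1 at every sample away from the ends of the signal, so samples attenuated near one frame's edge sit near the centre of the next frame. A minimal NumPy check, with N = 256 and 20 hops chosen arbitrarily:

```python
import numpy as np

n, hop = 256, 128                                     # 50% overlap (hop = n / 2)
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)  # periodic Hann

total_len = hop * 20 + n
coverage = np.zeros(total_len)
for start in range(0, total_len - n + 1, hop):
    coverage[start:start + n] += w  # total window weight each sample receives

# Away from the first and last half-frame, every sample gets a total weight of 1
interior = coverage[hop:total_len - hop]
print(interior.min(), interior.max())
```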
If you're not applying any windowing function, you're actually applying a rectangular windowing function. Different windowing functions have different characteristics; which one to use depends on exactly what you want.
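To compare a few common windows, including the implicit rectangular one, the sketch below estimates each window's highest sidelobe (roughly, how far leakage from a strong tone can reach) using NumPy's built-in `np.hanning`, `np.hamming`, and `np.blackman`. The helper `peak_sidelobe_db`, the zero-padding factor, and the first-local-minimum rule for the main-lobe edge are illustrative assumptions, not a standard API.

```python
import numpy as np


def peak_sidelobe_db(window, pad=64):
    """Highest sidelobe relative to the main-lobe peak, in dB.

    The window is zero-padded so its spectrum is finely sampled; the main lobe
    is taken to end at the first local minimum after DC."""
    mag = np.abs(np.fft.rfft(window, n=len(window) * pad))
    i = 1
    while mag[i + 1] < mag[i]:  # walk down the main lobe to its first null
        i += 1
    return 20 * np.log10(mag[i:].max() / mag[0])


n = 64
windows = {
    "rectangular (no window)": np.ones(n),
    "Hann": np.hanning(n),
    "Hamming": np.hamming(n),
    "Blackman": np.blackman(n),
}
for name, w in windows.items():
    print(f"{name:>24}: {peak_sidelobe_db(w):6.1f} dB")
```

Roughly, expect about -13 dB for the rectangular window, about -31 dB for Hann, about -43 dB for Hamming, and about -58 dB for Blackman. Lower sidelobes come at the cost of a wider main lobe, which is the trade-off between leakage and peak blurring described above.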