FFT with a short-time window: confusion
I've computed the spectrogram of a 44100 Hz sampled audio signal using 0.025 s long Hamming windows with a 32768-point FFT(?), and here is my confusion:
- 44100*0.025 ~= 1103 samples, which is << N=32768,
- yet in my experience this high N parameter significantly improved the resolution of the spectrogram.
So my question is: what's going on?
From this awesome explanation I would conclude that a 32768-point FFT is usually meant for an interval of about 1 s, and indeed Voicebox's rfft function (which I used) mentions that it truncates/pads the input to N. So I assume it padded my short 1103-sample vector with zeros to a 32768-sample vector in order to compute the FFT.
Is this what really happens? Can this improve the resolution even though only roughly the first 1/30 of the padded vector is non-zero? (I think yes, but I want to be sure, as this came up at my thesis defense, and I only got this idea now, while writing this post.)
Thanks for any feedback.
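The zero-padding assumption in the question can be checked numerically. The sketch below uses NumPy's `np.fft.rfft` as a stand-in for Voicebox's rfft (an assumption: the two are different libraries, but `np.fft.rfft` also zero-pads when `n` exceeds the input length). The random test frame is made up for illustration; only `fs`, the window length, and `N` come from the post.

```python
import numpy as np

fs = 44100       # sampling rate in Hz, from the post
win_len = 1103   # about 0.025 s of audio, from the post
n_fft = 32768    # FFT length N, from the post

# Hypothetical test frame: random samples shaped by a Hamming window.
rng = np.random.default_rng(0)
frame = rng.standard_normal(win_len) * np.hamming(win_len)

# Let the FFT routine pad to n_fft on its own.
implicit = np.fft.rfft(frame, n=n_fft)

# Pad with zeros by hand, then transform.
padded = np.zeros(n_fft)
padded[:win_len] = frame
explicit = np.fft.rfft(padded)

same = np.allclose(implicit, explicit)

# Spacing between frequency bins with and without padding.
bin_spacing_padded = fs / n_fft     # about 1.35 Hz
bin_spacing_native = fs / win_len   # about 40 Hz

print(same, bin_spacing_padded, bin_spacing_native)
```

Under these assumptions the two spectra agree, so the long FFT is simply sampling the spectrum of the same 1103-sample frame on a grid that is about 30 times denser.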
Comments (2)
Zero-padding in the time-domain is equivalent to interpolation in the frequency domain (and vice versa). So you've improved the resolution in the sense that this allows you to draw a smoother curve in between the points. But you haven't increased the information content; any processing that you do on the interpolated FFT output will be possible on the non-interpolated FFT output.
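A minimal NumPy sketch of the interpolation claim, assuming a frame length of 1024 and a padding factor of 8 (both chosen for illustration so that the fine grid contains the coarse grid exactly; they are not the sizes from the question). It checks that the padded FFT passes through the unpadded FFT values, and that the padded FFT can be rebuilt from the unpadded FFT alone.

```python
import numpy as np

frame_len = 1024   # illustrative frame length (assumption)
factor = 8         # illustrative padding factor (assumption)

rng = np.random.default_rng(1)
frame = rng.standard_normal(frame_len) * np.hamming(frame_len)

coarse = np.fft.fft(frame)                        # no padding
fine = np.fft.fft(frame, n=factor * frame_len)    # zero-padded

# Every factor-th bin of the padded FFT equals a bin of the unpadded FFT.
coincide = np.allclose(fine[::factor], coarse)

# The padded FFT holds no extra information: rebuild it from `coarse` only.
recovered_frame = np.fft.ifft(coarse)
fine_from_coarse = np.fft.fft(recovered_frame, n=factor * frame_len)
recoverable = np.allclose(fine_from_coarse, fine)

print(coincide, recoverable)
```

The extra bins fall between the original ones, which is what makes a plotted spectrogram look smoother.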
As Oli pointed out, zero-padding an FFT is a method of interpolation. More specifically, the interpolation kernel is the transform of the window you used. So, at some point, your improvement in "resolution" is more related to the shape and width of your chosen window than to the spectral content of your data.
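To illustrate the point about the window, here is a rough sketch using the sizes from the question and two made-up test tones. It counts the spectral peaks above half of the maximum magnitude in a zero-padded FFT. The tone frequencies, the half-maximum threshold, and the helper name `count_peaks` are all assumptions chosen for the example.

```python
import numpy as np

fs = 44100       # sampling rate in Hz, from the question
win_len = 1103   # window length in samples, from the question
n_fft = 32768    # zero-padded FFT length, from the question

def count_peaks(f_a, f_b):
    """Count local maxima above half the maximum in the padded spectrum
    of two equal-amplitude cosines at f_a and f_b (hypothetical helper)."""
    t = np.arange(win_len) / fs
    tone_pair = np.cos(2 * np.pi * f_a * t) + np.cos(2 * np.pi * f_b * t)
    mag = np.abs(np.fft.rfft(tone_pair * np.hamming(win_len), n=n_fft))
    inner = mag[1:-1]
    is_peak = (inner > mag[:-2]) & (inner >= mag[2:]) & (inner > 0.5 * mag.max())
    return int(np.count_nonzero(is_peak))

# Null-to-null main lobe width of a Hamming window: about 4 bins of the
# unpadded transform, i.e. roughly 160 Hz here.
main_lobe_hz = 4 * fs / win_len

close = count_peaks(1000.0, 1020.0)   # 20 Hz apart, well inside one main lobe
far = count_peaks(1000.0, 1400.0)     # 400 Hz apart, wider than the main lobe

print(main_lobe_hz, close, far)
```

In this setup the bin spacing of about 1.35 Hz is much finer than the 20 Hz gap, yet the close pair still merges into a single peak, because the ability to separate components is set by the 1103-sample window rather than by N.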