FFT window size vs. data length
I am trying to do a quick spectral analysis of streaming audio data to capture vowels (something like JLip-sync). I use PyAudio to capture the voice data in small chunks (1024 samples) of short duration (0.0625 s), numpy.fft for the analysis, and a numpy.hanning window to reduce leakage. The sampling rate is 4096*4 = 16384 Hz (not 44100 or 22050, though I am open to discussion; 4096*4 is the nearest to 22050).
Considering the frequencies I am interested in (300 Hz to 3000 Hz), how can the ideal window size be calculated from the data length and the min/max frequencies I am looking for?
Thanks.
Kadir
Comments (2)
@Kadir:
The purpose of windowing your data before processing it with a discrete Fourier transform (DFT or FFT) is to minimize spectral leakage, which happens when you Fourier-transform non-cyclical data.
Windowing works by tapering your data smoothly to zero at exactly the start and end of the sequence, but not before. A window shorter than the sequence destroys information unnecessarily.
So your window length should match the length of your sample sequences. For instance, with 1024 samples, your window length should be 1024.
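As a minimal sketch of matching the window length to the chunk length, the following applies a 1024-point Hann window to one 1024-sample chunk before the FFT. The rate and chunk size come from the question; the helper name `windowed_spectrum` and the synthetic 1024 Hz test tone (standing in for a captured PyAudio chunk) are assumptions made for illustration.

```python
import numpy as np

RATE = 4096 * 4   # sampling rate from the question (16384 Hz)
CHUNK = 1024      # samples per chunk from the question

def windowed_spectrum(chunk, rate=RATE):
    """Return (frequencies in Hz, magnitudes) for one real-valued chunk."""
    chunk = np.asarray(chunk, dtype=float)
    window = np.hanning(len(chunk))            # window length == chunk length
    spectrum = np.fft.rfft(chunk * window)     # real-input FFT
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / rate)
    return freqs, np.abs(spectrum)

# Synthetic test tone (hypothetical stand-in for real audio data).
t = np.arange(CHUNK) / RATE
tone = np.sin(2 * np.pi * 1024.0 * t)

freqs, mags = windowed_spectrum(tone)
peak_hz = freqs[np.argmax(mags)]
print("strongest bin at", peak_hz, "Hz")
```

With real audio, the chunk read from the stream would replace `tone`; the window is rebuilt from `len(chunk)` so that it always spans the whole chunk.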
If the highest frequency you want to resolve is 3 kHz, use 8192 samples or more, such as 16384 or 32768 samples, at various sampling rates.
Also, try a different FFT algorithm, different sample lengths, and different windows, including the Hann (Hanning), but also other windows with better side lobe attenuation, such as the Blackman-Harris series, and the Kaiser-Bessel series, etc.
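One rough way to compare windows is to measure how much of an off-bin tone leaks into distant bins. The sketch below uses only the windows that ship with NumPy. `np.blackman` is the plain Blackman window, not a Blackman-Harris window, and the Kaiser `beta` of 8.6, the 1000 Hz test tone, and the 8-bin guard band are all arbitrary assumptions. The helper `leakage_db` is hypothetical.

```python
import numpy as np

RATE = 4096 * 4   # sampling rate from the question
N = 1024          # chunk length from the question

def leakage_db(window, tone_hz=1000.0, guard_bins=8, rate=RATE):
    """Largest magnitude more than guard_bins away from the peak,
    relative to the peak, in dB (more negative means less leakage)."""
    t = np.arange(len(window)) / rate
    mags = np.abs(np.fft.rfft(np.sin(2 * np.pi * tone_hz * t) * window))
    peak = int(np.argmax(mags))
    far = np.ones(len(mags), dtype=bool)
    far[max(0, peak - guard_bins):peak + guard_bins + 1] = False
    return 20 * np.log10(mags[far].max() / mags[peak])

windows = {
    "rectangular": np.ones(N),       # no windowing, as a baseline
    "hann": np.hanning(N),
    "hamming": np.hamming(N),
    "blackman": np.blackman(N),
    "kaiser(8.6)": np.kaiser(N, 8.6),  # beta chosen arbitrarily
}

leak = {name: leakage_db(w) for name, w in windows.items()}
for name, db in leak.items():
    print(f"{name:12s} {db:8.1f} dB")
```

A 1000 Hz tone falls halfway between two 16 Hz bins at these settings, which is close to a worst case for leakage, so the numbers give a conservative picture of each window.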
If your application is noisy, you may have to choose between the better noise suppression windows, and the higher spectral resolution windows. So it's a good idea to try different windows, so you can find the best one for your application.
Now, write down your results with each setup (i.e. with each window, sample length, sampling rate, etc.), and look for results that agree across multiple setups. You will learn much about your data, and very likely find the answer to your problem.
You can do this with Matlab: http://www.mathworks.com/help/techdoc/ref/fft.html
Or with this online FFT spectrum analyzer: http://www.sooeet.com/math/fft.php
And don't forget to post your results here.
The critical factor is how much resolution you need in the frequency domain to discriminate between different vowels.
Resolution is 1 / T, where T is the duration of your FFT window. If you sample for 62.5 ms, then your maximum resolution is 16 Hz (i.e. each FFT bin is 16 Hz wide) if your FFT is the same size as your sampling interval (1024 samples).
If you go to a smaller FFT, your resolution worsens proportionately: a 512-point FFT, for example, would only have a resolution of 32 Hz.
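The arithmetic above can be checked with a short sketch. It computes the bin width for a given sampling rate and FFT size, and the range of bin indices that lie inside the 300 Hz to 3000 Hz band from the question. The function names are hypothetical.

```python
import math

def bin_width_hz(rate, n_fft):
    """Frequency resolution 1 / T, with T = n_fft / rate."""
    return rate / n_fft

def band_bins(rate, n_fft, f_lo, f_hi):
    """First and last FFT bin indices whose centre lies in [f_lo, f_hi]."""
    width = bin_width_hz(rate, n_fft)
    return math.ceil(f_lo / width), math.floor(f_hi / width)

RATE = 4096 * 4  # 16384 Hz, as in the question

for n_fft in (512, 1024, 2048):
    lo, hi = band_bins(RATE, n_fft, 300.0, 3000.0)
    print(n_fft, "points:", bin_width_hz(RATE, n_fft), "Hz per bin,",
          hi - lo + 1, "bins between 300 and 3000 Hz")
```

Whether 16 Hz per bin is enough depends on how closely spaced the vowel features you want to separate are. Doubling the window duration halves the bin width, at the cost of slower response to changes in the audio.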