Improving the frequency resolution of FFT output by limiting the frequency range?
I am new to FFTs and signal processing, so hopefully this question makes sense and/or isn't stupid.
I would like to perform spectrum analysis on a live audio signal. My goal is to find a good tradeoff between responsiveness and frequency resolution, such that I can take a guess at the pitch of the incoming audio in near-realtime.
From what I've gathered about the math behind the Fourier transform, there is an inherent balance between sample size and frequency resolution. The bigger the sample, the better resolution. Since I am trying to minimize sample size (to attain the near-realtime requirement), this means my resolution suffers (each slot in the output buffer corresponds to a wide frequency range, which is undesirable).
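To put numbers on that tradeoff: for a block of N samples at sample rate fs, adjacent FFT bins are fs/N Hz apart and the block spans N/fs seconds. A quick sketch, assuming a 44100 Hz sample rate (the question does not state one) and hypothetical helper names:

```python
def bin_width_hz(sample_rate_hz, n_samples):
    """Spacing between adjacent FFT bins, in Hz."""
    return sample_rate_hz / n_samples


def block_duration_s(sample_rate_hz, n_samples):
    """Time needed to collect one input block, in seconds."""
    return n_samples / sample_rate_hz


SAMPLE_RATE_HZ = 44100  # assumed; not given in the question

for n in (256, 1024, 4096):
    width = bin_width_hz(SAMPLE_RATE_HZ, n)
    duration_ms = 1000 * block_duration_s(SAMPLE_RATE_HZ, n)
    print(f"N={n}: {width:.2f} Hz per bin, {duration_ms:.1f} ms per block")
```

The bin spacing is the reciprocal of the block duration, so halving the latency doubles the width of each bin regardless of the sample rate.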
However, for my intended application, I don't care about most of the spectrum. I only need spectrum info for a narrow frequency range, say 100 Hz to 1600 Hz. Is there any way to modify an FFT implementation such that I can improve the resolution of the frequency domain output while keeping the input buffer size constant (and small)? In other words, can I trade total output bandwidth for output resolution? If so, how is this done?
Although I have a weak grasp of the math at best, it seems that padding the input buffer with zeros might be interesting, no?
Thanks in advance for any help you can offer.
Comments (6)
You can't get additional information from nowhere, but you can reduce latency by overlapping successive FFTs. For real-time power spectrum estimates it's common to overlap successive input windows by 50%.
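A minimal sketch of the 50% overlap idea in plain Python. The name `overlapping_frames` and the frame length are illustrative only, not from any library:

```python
def overlapping_frames(samples, frame_len, overlap=0.5):
    """Yield successive frames of frame_len samples, each sharing
    `overlap` (a fraction between 0 and 1) of its samples with the
    previous frame."""
    hop = max(1, int(frame_len * (1.0 - overlap)))
    for start in range(0, len(samples) - frame_len + 1, hop):
        yield samples[start:start + frame_len]


# Toy input: ten samples, frames of four, 50% overlap (hop of two samples).
frames = list(overlapping_frames(list(range(10)), frame_len=4))
for frame in frames:
    print(frame)
```

Each frame still spans `frame_len` samples, so the frequency resolution of each FFT is unchanged. What improves is how often a new spectrum becomes available: every `frame_len / 2` samples instead of every `frame_len`.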
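Zero-padding the input block (appending zeros after the samples) is another useful trick - it gives you more apparent resolution in the output bins, but in reality all you are doing is interpolating the spectrum, i.e. there is no additional information gained (of course). Note that this is not the same as inserting zeros between samples, which creates spectral images instead of a smoother spectrum. You might find this technique useful though, in addition to the overlap suggestion above.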
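To illustrate the interpolation point, here is a rough sketch that appends zeros to a short block and compares the peak bin with and without padding. It uses a naive DFT from the standard library so it runs without dependencies; the 8000 Hz sample rate, the 1030 Hz test tone and the name `peak_hz` are assumptions made up for the example:

```python
import cmath
import math


def peak_hz(block, fs, n_fft):
    """Frequency of the largest-magnitude bin of `block` zero-padded to n_fft.

    Naive O(len(block) * n_fft) DFT. The appended zeros contribute nothing
    to the sum, so only the real samples are visited.
    """
    best_k, best_mag = 0, -1.0
    for k in range(n_fft // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * i / n_fft)
                  for i, v in enumerate(block))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * fs / n_fft


fs = 8000  # assumed sample rate
block = [math.sin(2 * math.pi * 1030 * i / fs) for i in range(64)]

coarse = peak_hz(block, fs, 64)   # no padding: bins are 125 Hz apart
fine = peak_hz(block, fs, 512)    # padded to 512: bins are 15.625 Hz apart
print(coarse, fine)
```

The padded result lands closer to the true tone because the same 64-sample spectrum is sampled on a finer grid. Two tones closer together than roughly 125 Hz would still merge into one peak, padded or not.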
As Mark says, adding zeros will introduce harmonics (unwanted frequencies).
Also, when you say "bigger the sample", do you mean more samples, or a higher sample rate? A higher sample rate will result in more samples per unit of time, but it seemed like you meant more samples at a fixed sample rate (i.e. analyzing larger chunks of time).
You mentioned an upper frequency of 1600 Hz, so you will need a sample rate of AT LEAST 3200 Hz, i.e. double the highest frequency of interest (the Nyquist rate).
As for the period of time to process at once: you will need to trade responsiveness (a 10 second buffer will take 10 s + processing time before you get the result) against reducing noise. Smaller buffers are more likely to pick up spurious noise signals.
As an aside, thinking in the frequency domain can be challenging at first. What helped me most was not the various applied maths classes I took at university, but a crystallography class. A crystal diffraction pattern is merely a 2D Fourier transform. Getting a handle on how a diffraction pattern visually relates to the crystal structure proved very useful when I came to work with FFTs of seismic data in my first job.
I don't think there is a 'trick' to outperform the FFT. "Adding zeros" can also mean oversampling the signal. To get rid of the harmonics, the signal would have to be filtered (which will most certainly introduce extra noise). Then you would do a longer FFT, but after that the overall resolution will still be the same.
Also your windowing function will broaden the frequency peaks in your results.
OTOH, if a frequency falls between two FFT bins, it is possible to get a better resolution by looking at the ratio of the neighboring bins:
http://www.tedknowlton.com/resume/FFT_Bin_Interp.html
But this does not work for more complex signals (with many simultaneous frequencies).
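As a rough illustration of bin interpolation, the sketch below fits a parabola through the log-magnitudes of the peak bin and its two neighbours after applying a Hann window. This is one common variant and not necessarily the exact method described at the link above; the function names, the 8000 Hz sample rate and the 1030 Hz test tone are assumptions for the example:

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of bins 0..N/2 of x (naive DFT, fine for short blocks)."""
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(x)))
            for k in range(n // 2 + 1)]


def interpolated_peak_hz(block, fs):
    """Peak frequency refined by a parabola through three log-magnitudes."""
    n = len(block)
    # Hann window: keeps leakage from spoiling the three bins used in the fit.
    windowed = [v * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, v in enumerate(block)]
    mags = dft_magnitudes(windowed)
    # Skip the first and last bin so both neighbours always exist.
    k = max(range(1, n // 2), key=mags.__getitem__)
    a, b, c = (math.log(mags[j]) for j in (k - 1, k, k + 1))
    delta = 0.5 * (a - c) / (a - 2 * b + c)  # vertex offset, in bins
    return (k + delta) * fs / n


fs = 8000  # assumed sample rate
block = [math.sin(2 * math.pi * 1030 * i / fs) for i in range(64)]
estimate = interpolated_peak_hz(block, fs)
print(estimate)
```

With a single clean tone the estimate falls well inside the 125 Hz bin spacing. As the answer says, it degrades once several components share the same few bins.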
If you want to know if certain frequencies are present, I would look into filters and correlation.
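One simple form of the correlation approach is to correlate the block against a sine/cosine pair at each frequency of interest. That is a single DFT bin evaluated at an arbitrary frequency (the Goertzel algorithm computes the same quantity more efficiently). A sketch with a hypothetical `tone_power` helper and an assumed 8000 Hz sample rate:

```python
import math


def tone_power(samples, fs, freq_hz):
    """Normalized power of the correlation between `samples` and a
    cosine/sine pair at freq_hz. A unit-amplitude sinusoid at exactly
    freq_hz gives about 0.25."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / fs)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / fs)
             for i, s in enumerate(samples))
    return (re * re + im * im) / (n * n)


fs = 8000  # assumed sample rate
block = [math.sin(2 * math.pi * 440 * i / fs) for i in range(400)]

p_present = tone_power(block, fs, 440)  # frequency that is in the signal
p_absent = tone_power(block, fs, 880)   # frequency that is not
print(p_present, p_absent)
```

The candidate frequencies do not have to sit on an FFT bin grid, but the ability to separate two nearby frequencies is still limited by the block length.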
If you want to nail down one certain frequency, you can first filter it out and then detect the zero-crossings. There are many parameters when designing a filter, so filter length is only one parameter that leads to a certain filter (step-) response time. You can do this for more than one frequency, one after the other...
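A sketch of the zero-crossing step only, assuming the signal has already been filtered down to a single dominant component. The function name and the test signal are made up for illustration:

```python
import math


def zero_crossing_freq_hz(samples, fs):
    """Estimate frequency from rising zero crossings, using linear
    interpolation between the two samples that straddle each crossing."""
    crossings = []
    for i in range(1, len(samples)):
        a, b = samples[i - 1], samples[i]
        if a < 0.0 <= b:
            # Fractional sample index where the line from a to b hits zero.
            crossings.append(i - 1 + (-a) / (b - a))
    if len(crossings) < 2:
        return None  # fewer than one full period observed
    periods = len(crossings) - 1
    return fs * periods / (crossings[-1] - crossings[0])


fs = 8000  # assumed sample rate
block = [math.sin(2 * math.pi * 440 * i / fs + 0.3) for i in range(400)]
estimate = zero_crossing_freq_hz(block, fs)
print(estimate)
```

On a clean sinusoid this is very accurate for its cost. Noise or leftover harmonics add spurious crossings, which is why the filtering step comes first.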
Addition: Some intuition:
Because the FFT is sufficient for reconstruction, there are in principle infinitely many higher-resolution spectra that lead to the same sample vector, and none is more correct. The bin interpolation essentially calculates another ('better fitting') representation than the evenly spaced bins of the fast Fourier transform.
In the discrete, quantized case, e.g. 8-bit, think about two frequencies that are very close. If the difference is small enough, they would yield the same, say 256, samples. But looking at more samples (maybe 1024) you would notice that the difference becomes big enough to be seen.
PS: The filtering for oversampling can also be done after the FFT by simply ignoring the higher bins.
You could low-pass filter the data at 1600 Hz (or somewhat higher, say 2k), and then resample to a lower samplerate (twice the filter frequency e.g. 4k) to reduce the number of samples. Then use zero-padding to increase the frequency resolution.
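The three steps (low-pass, resample, zero-pad) could be sketched roughly as follows in plain Python. Everything specific here is an assumption for the example: a 32000 Hz input rate, a 63-tap Hamming-windowed sinc filter, decimation by 8 down to 4000 Hz, and the helper names. A real implementation would normally use a DSP library.

```python
import cmath
import math


def lowpass_taps(cutoff_hz, fs, n_taps=63):
    """Hamming-windowed sinc low-pass FIR with unity gain at DC."""
    mid = (n_taps - 1) / 2
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    taps = []
    for i in range(n_taps):
        t = i - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (n_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def filter_and_decimate(x, taps, factor):
    """Filter x and keep every `factor`-th output sample (valid part only).
    The taps are symmetric, so this sliding dot product equals convolution."""
    return [sum(t * x[start + j] for j, t in enumerate(taps))
            for start in range(0, len(x) - len(taps) + 1, factor)]


def peak_hz(block, fs, n_fft):
    """Peak bin of `block` zero-padded to n_fft (naive DFT, zeros skipped)."""
    def mag(k):
        return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n_fft)
                       for i, v in enumerate(block)))
    return max(range(1, n_fft // 2), key=mag) * fs / n_fft


fs_in, factor = 32000, 8  # assumed input rate; output rate is 4000 Hz
# 1030 Hz tone of interest plus a 9000 Hz tone that would alias to 1000 Hz.
x = [math.sin(2 * math.pi * 1030 * i / fs_in) +
     math.sin(2 * math.pi * 9000 * i / fs_in) for i in range(1086)]

decimated = filter_and_decimate(x, lowpass_taps(2000, fs_in), factor)
estimate = peak_hz(decimated, fs_in / factor, n_fft=1024)
print(len(decimated), estimate)
```

The FFT input shrinks from about a thousand samples to 128, but those 128 samples still cover the same stretch of time, so the achievable resolution is unchanged. The zero-padding only samples the spectrum on a finer grid.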
Your stated goal is incompatible with your question. The pitch of audio is not the same as the resolved frequency peak. Please read the vast literature on vocal and musical pitch estimation (which applies to many other types of sounds that have a perceived pitch). Adaptive/incremental/sliding time domain techniques may give you a lower latency than frequency domain block based techniques.
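To make the pitch-versus-peak distinction concrete, here is a sketch of one basic time-domain method: normalized autocorrelation, taking the shortest lag that is a strong local maximum. It is a simple block-based baseline, not one of the adaptive or sliding techniques mentioned above. The function name, the 0.9 threshold, the 8000 Hz sample rate and the test signal are all assumptions for the example.

```python
import math


def autocorr_pitch_hz(samples, fs, fmin=100.0, fmax=1600.0, threshold=0.9):
    """Pitch estimate from the shortest lag whose normalized
    autocorrelation is a local maximum within `threshold` of the best."""
    n = len(samples)
    lag_min = max(1, int(fs / fmax))
    lag_max = min(n - 2, int(fs / fmin))
    scores = {}
    # One extra lag on each side so the local-maximum test has neighbours.
    for lag in range(lag_min - 1, lag_max + 2):
        a, b = samples[:n - lag], samples[lag:]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        scores[lag] = num / den if den > 0 else 0.0
    best = max(scores[lag] for lag in range(lag_min, lag_max + 1))
    for lag in range(lag_min, lag_max + 1):
        s = scores[lag]
        if s >= threshold * best and s >= scores[lag - 1] and s >= scores[lag + 1]:
            return fs / lag
    return None


fs = 8000  # assumed sample rate
# 200 Hz fundamental with a stronger second harmonic at 400 Hz.
block = [0.5 * math.sin(2 * math.pi * 200 * i / fs) +
         math.sin(2 * math.pi * 400 * i / fs) for i in range(400)]
pitch = autocorr_pitch_hz(block, fs)
print(pitch)
```

In this test signal the largest spectral peak is at 400 Hz, yet the waveform repeats every 1/200 s, which is what the autocorrelation picks up. Picking the biggest FFT bin would report the harmonic instead of the pitch.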
Zero padding of the audio sample vector is nearly identical with interpolation of the frequency domain data. If there is little noise or nearby interference, you may find a more accurate (higher "resolution") frequency peak position. But you won't get any better rejection of nearby spectral peaks (separation resolution) or noise.
Windowing the data (von Hann, etc.) before your FFT may help remove some of the noise caused by nearby, but not-bin or 2-bin adjacent, frequencies.
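A small sketch of that effect: it measures how much of a 1030 Hz tone leaks into a distant bin with and without a Hann window. The sample rate, block length, chosen bins and helper names are assumptions for the example:

```python
import cmath
import math


def bin_magnitude(x, k):
    """Magnitude of DFT bin k of x (naive single-bin DFT)."""
    n = len(x)
    return abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                   for i, v in enumerate(x)))


def hann(n):
    """Periodic Hann (von Hann) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


fs, n = 8000, 128  # assumed; bins are 62.5 Hz apart
tone = [math.sin(2 * math.pi * 1030 * i / fs) for i in range(n)]
windowed = [s * w for s, w in zip(tone, hann(n))]

peak_bin, far_bin = 16, 40  # 1000 Hz (nearest to the tone) and 2500 Hz
leak_rect = bin_magnitude(tone, far_bin) / bin_magnitude(tone, peak_bin)
leak_hann = bin_magnitude(windowed, far_bin) / bin_magnitude(windowed, peak_bin)
print(leak_rect, leak_hann)
```

The windowed block leaks far less energy into distant bins. The price is a wider main lobe, so two components in adjacent bins become harder to tell apart.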
Added: unless your after-sampling low-pass filter is nearly perfect and phase linear, you could actually lose frequency resolution near the edges of your desired frequency band. Filtering doesn't add any actual information into the band of interest, so is of no help in increasing "resolution". Windowing is more likely to reduce interference from other frequencies.
You might want to look into Compressed Sensing. You can sample (and store) what is essentially a pre-compressed signal which you can reconstruct later. As long as the signal sparsity is high (which will probably be the case in your situation) the Shannon-Nyquist constraint can be bent somewhat. The downside is that post-processing to recreate the original signal can be computationally time-intensive. Also, you're probably going to have to develop your own device drivers to manage whatever hardware you're using to sample your signal, since the factory drivers probably assume you're interested in adhering to the Shannon-Nyquist constraint. More information can be found here.