Finding relevant peaks in a messy FFT
I have FFT outputs that look like this:
The maximum value is at 523 Hz. However, because the FFT is messy, there are lots of little peaks right next to the large ones. Those little peaks are irrelevant, whereas the peaks shown are not. Are there any algorithms I can use to extract only the maxima of this FFT that matter, i.e., the ones that aren't just random peaks cropping up near "real" peaks? Perhaps there is some sort of filter I can apply to this FFT output?
EDIT: The context of this is that I am trying to take one-hit sound samples (like someone pressing a key on a piano) and extract the loudest partials. In the image below, the peaks above 2000 Hz are important, because they are discrete partials of the given sound (which happens to be a sort of bell). However, the peaks scattered right around 523 Hz seem to be just artifacts, and I want to ignore them.
Comments (4)
If the peak is broad, it could indicate that the peak frequency is modulated (AM, FM, or both), or that it is actually a composite of several spectral peaks, each of which may itself be modulated.
For instance, a piano note may be the result of the hammer hitting up to 3 strings that are all tuned just a tiny fraction differently, and they can all modulate as they exchange energy between strings through the piano frame. Guitar strings can change frequency as the distortion in the pluck shape smooths out and decays. Bells change shape after they are hit, which can modulate their spectrum. Etc.
If the sound itself is "messy", then you need a good definition of what you mean by the "real" peak before applying any sort of smoothing or side-band rejection filter. For example, all that "messiness" may be part of what makes a bell sound like a real bell instead of an electronic sine-wave generator.
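One possible working definition of a "real" peak, sketched below in plain Python, is a local maximum that is both tall enough relative to the global maximum and the tallest peak within some frequency distance of itself. This is only an illustration of that one definition, not something taken from the thread: the function name `pick_peaks`, the parameters `min_rel_height` and `min_sep_hz`, and their default values are all assumptions that would need tuning for a given sound. Note that it deliberately throws away closely spaced partials, which is exactly the information the caution above is about.

```python
def pick_peaks(mags, bin_hz, min_rel_height=0.1, min_sep_hz=50.0):
    """Return (frequency_hz, magnitude) pairs for peaks that dominate their neighborhood.

    mags   : list of FFT magnitudes, one per bin (bin 0 = 0 Hz)
    bin_hz : frequency spacing between bins (sample_rate / fft_size)
    """
    floor = min_rel_height * max(mags)
    # Candidates: local maxima that reach the height floor.
    candidates = [
        i for i in range(1, len(mags) - 1)
        if mags[i] >= floor and mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]
    ]
    # Visit candidates from tallest to shortest and keep one only if no
    # already-kept (taller) peak lies within min_sep_hz of it.
    min_sep_bins = min_sep_hz / bin_hz
    kept = []
    for i in sorted(candidates, key=lambda k: mags[k], reverse=True):
        if all(abs(i - j) >= min_sep_bins for j in kept):
            kept.append(i)
    return [(i * bin_hz, mags[i]) for i in sorted(kept)]


# Synthetic spectrum with 10 Hz bins (made-up numbers, not the poster's data).
mags = [0.0] * 300
mags[52] = 1.0    # main peak at 520 Hz
mags[50] = 0.4    # small neighbor at 500 Hz
mags[54] = 0.3    # small neighbor at 540 Hz
mags[210] = 0.5   # separate partial at 2100 Hz
mags[100] = 0.05  # below the height floor
print(pick_peaks(mags, bin_hz=10.0))  # -> [(520.0, 1.0), (2100.0, 0.5)]
```

With NumPy/SciPy available, `scipy.signal.find_peaks` with its `height`, `distance`, and `prominence` arguments covers similar ground.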
Try convolving your FFT output (treating it as a signal) with a rectangular pulse, for example
pulse = ones(1,20)/20;
This might get rid of some of the small peaks. With a full convolution, your maxima will be shifted about 10 frequency bins to the right, so take that into account. You would basically be taking a moving average of (integrating) your signal. Similar techniques are used in the Pan-Tompkins algorithm for heartbeat identification.
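As a rough sketch of the same moving-average idea, the plain-Python version below centers the window on each bin, so the smoothed peaks stay at their original bin positions and no shift correction is needed. The function name `moving_average` and the `half_width` parameter are assumptions made for this example; a `half_width` of 10 gives a 21-bin window, which is close to, but not identical to, the 20-point pulse above.

```python
def moving_average(values, half_width=10):
    """Centered boxcar smoothing; the output has the same length as the input.

    Each output bin is the mean of the input bins within half_width of it.
    Near the two ends the window is truncated, so fewer bins are averaged.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Triangular bump centered on bin 50 (made-up data).
spectrum = [max(0, 10 - abs(i - 50)) for i in range(101)]
smoothed = moving_average(spectrum, half_width=3)
print(smoothed.index(max(smoothed)))  # -> 50, the peak has not moved
```

The wider the window, the more neighboring partials get merged into one lump, so the width has to stay below the spacing of the partials you want to keep. With NumPy, `numpy.convolve(x, pulse, mode="same")` gives a similarly centered result.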
I worked on a similar problem once and chose to use Savitzky-Golay filters to smooth the spectrum data. I could still get some significant peaks, and it didn't mess too much with the overall spectrum.
But I did run into the problem hotpaw2 is warning you about: I lost important characteristics along with the "messiness", so I really recommend you listen to his advice. Still, if you think that won't be a problem for you, I think Savitzky-Golay can help.
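To give a feel for what a Savitzky-Golay filter does, here is a minimal plain-Python sketch of the smallest common case: a 5-point window with a quadratic fit, whose convolution weights are (-3, 12, 17, 12, -3)/35. The function name `savgol5` and the choice to leave the two bins at each end untouched are assumptions of this example; the answer above does not say which window length or polynomial order was used.

```python
# Convolution weights for a 5-point, quadratic (order 2) Savitzky-Golay smoother.
SG5_QUADRATIC = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol5(values):
    """Smooth with a 5-point quadratic Savitzky-Golay filter.

    Each interior bin is replaced by the value, at that bin, of the parabola
    least-squares fitted to the bin and its two neighbors on each side.
    The first two and last two bins are copied through unchanged.
    """
    out = list(values)
    for i in range(2, len(values) - 2):
        out[i] = sum(c * values[i + k - 2] for k, c in enumerate(SG5_QUADRATIC))
    return out


# A single-bin spike keeps 17/35 of its height; a 5-point boxcar would keep 1/5.
spike = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(savgol5(spike)[3])
```

Because the fit follows local curvature, narrow peaks are flattened less than with a plain moving average of the same width, which is the usual reason for preferring it on spectra. With SciPy available, `scipy.signal.savgol_filter(x, window_length, polyorder)` handles arbitrary window lengths and orders.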
There are non-FFT methods for creating a frequency-domain representation of time-domain data that are better suited to noisy data sets, such as maximum-entropy (max-ent) reconstruction.
For noisy time-series data, a max-ent reconstruction can distinguish true peaks from noise very effectively (without adding any artifacts or other modifications to suppress noise).
Max-ent works by "guessing" a frequency-domain spectrum for the time-domain data, inverse Fourier transforming that guess, and comparing the result with the "actual" time-series data, iteratively. The final output of max-ent is a frequency-domain spectrum (like the one you show above).
I believe there are implementations in Java for 1-D spectra, but I have never used one.