Algorithm to determine the fundamental frequency from potential harmonics
I am attempting to extract a fundamental frequency from a sound source. Maybe someone is singing A2 into the microphone, so I want to be detecting ~110 Hz.
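This assumes scientific pitch notation with A4 = 440 Hz. In that notation A3 is 220 Hz and A2 is 110 Hz. The small helper below makes the octave arithmetic explicit. The name `note_to_hz` and the use of MIDI note numbers are illustrative choices, not something from the question.

```python
def note_to_hz(midi_note: int, a4_hz: float = 440.0) -> float:
    """Equal-temperament frequency of a MIDI note number, with A4 = MIDI 69."""
    # Each semitone is a factor of 2**(1/12); 12 semitones down halves the frequency.
    return a4_hz * 2 ** ((midi_note - 69) / 12)

print(note_to_hz(57))  # A3 -> 220.0
print(note_to_hz(45))  # A2 -> 110.0
```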
My approach is:
- FFT 1024 floats
- use the phase of each bin to accurately determine its precise frequency
- determine peaks (typically 50 or so)
- order them with the loudest first
Peak[0].power=1063.343750, .freq=2032.715088
Peak[1].power=1047.764893, .freq=3070.605225
Peak[2].power=1014.986877, .freq=5925.878418
Peak[3].power=1011.707825, .freq=6963.769043
Peak[4].power=1009.152954, .freq=4022.363037
Peak[5].power=995.199585, .freq=4974.120605
Peak[6].power=987.243713, .freq=8087.792480
Peak[7].power=533.514832, .freq=908.691833
- (MARKER1) start with the loudest peak and pair it with each of the remaining peaks, so if I had N peaks, at this point I will have N-1 peak pairs
- examine each peak pair for harmonicity, i.e. how close its frequency ratio is to some fraction a/b: can we find a/b with b < 20 such that |peakA.freq/peakB.freq - a/b| < 0.01? (This would match harmonics up to the 20th.)
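Step 2 above uses each bin's phase to refine its frequency. The usual way to do this is the phase-vocoder trick. You take two FFTs a hop apart and measure each bin's phase advance. You then subtract the advance expected at the bin centre; the remainder converts to a frequency offset.

The question doesn't say exactly how the phase is used, so treat this as a minimal NumPy sketch under assumptions: a Hann window, a hop of 256 samples, and the hypothetical function name `refined_bin_freqs`.

```python
import numpy as np

def refined_bin_freqs(x, fs, n_fft=1024, hop=256, start=0):
    """Phase-vocoder frequency estimate for every FFT bin (sketch).

    Takes two Hann-windowed frames `hop` samples apart; the phase advance of
    each bin, minus the advance expected at the bin centre, gives its offset.
    """
    w = np.hanning(n_fft)
    X1 = np.fft.rfft(w * x[start:start + n_fft])
    X2 = np.fft.rfft(w * x[start + hop:start + hop + n_fft])
    k = np.arange(len(X1))
    expected = 2 * np.pi * k * hop / n_fft          # phase advance at bin centre
    dphi = np.angle(X2) - np.angle(X1) - expected
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi     # wrap into [-pi, pi)
    offset = dphi * n_fft / (2 * np.pi * hop)       # deviation from bin centre, in bins
    return (k + offset) * fs / n_fft, np.abs(X2)

# Usage: at fs = 44100 and n_fft = 1024, a 1000 Hz tone falls between bins 23 and 24.
fs = 44100
x = np.sin(2 * np.pi * 1000.0 * np.arange(4096) / fs)
freqs, mags = refined_bin_freqs(x, fs)
k = int(np.argmax(mags))
print(k * fs / 1024, freqs[k])  # raw bin centre vs refined estimate
```

The wrap is only unambiguous for bins within n_fft / (2 * hop) = 2 bins of a sinusoid. That covers the main lobe of a Hann window, which is where the peaks are read.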
We now have a refined list of peaks that are considered harmonic with one another:
Harmonic PeakPair: (0,1)=2/3, error:0.00468 => f0 @ 1019.946289
Harmonic PeakPair: (0,2)=1/3, error:0.00969 => f0 @ 2004.003906
Harmonic PeakPair: (0,3)=2/7, error:0.00618 => f0 @ 1005.590820
Harmonic PeakPair: (0,4)=1/2, error:0.00535 => f0 @ 2021.948242
Harmonic PeakPair: (0,5)=2/5, error:0.00866 => f0 @ 1005.590820
Harmonic PeakPair: (0,6)=1/4, error:0.00133 => f0 @ 2027.331543
Harmonic PeakPair: (0,7)=9/4, error:0.01303 => f0 @ 226.515106
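The harmonicity test and the f0 column above can be reproduced with a short search. This is inferred from the numbers and is not the asker's stated code. The search picks the smallest denominator b whose nearest a/b is within tolerance. It then averages the two implied fundamentals, fa/a and fb/b. Together these give the 2/3, 2/7 and 2/5 rows and their f0 values. `harmonic_pair` is a hypothetical name.

```python
def harmonic_pair(fa, fb, max_den=19, tol=0.01):
    """Find the smallest b <= max_den with |fa/fb - a/b| < tol (sketch).

    Returns (a, b, error, f0) or None. Trying small b first means a/b comes
    out in lowest terms; f0 averages the implied fundamentals fa/a and fb/b.
    """
    r = fa / fb
    for b in range(1, max_den + 1):
        a = round(r * b)                 # best numerator for this denominator
        if a >= 1 and abs(r - a / b) < tol:
            f0 = (fa / a + fb / b) / 2   # peak A ~ a*f0, peak B ~ b*f0
            return a, b, abs(r - a / b), f0
    return None

peak0 = 2032.715088
for other in (3070.605225, 6963.769043, 4974.120605):
    print(harmonic_pair(peak0, other))
```

With tol=0.01, the (0,7) pair is not reported as 9/4, because that row's error of 0.01303 is above 0.01. The search instead keeps going and accepts 29/13, a spurious higher-order match of the kind that the b < 20 limit lets through.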
My question is: how can I devise an algorithm that will correctly identify the above fundamental as ~1000Hz?
It is by no means guaranteed that there will be a higher concentration of values at ~1000 than at ~2000 or ~3000, etc. It isn't even guaranteed that there will be any entry at ~1000. We could have one entry at ~5000, three entries at ~4000, two entries at ~3000, and a couple of bogus values floating around, like the 226 in the above list.
I guess I can repeat the procedure, weeding out suggested fundamentals that are not 'harmonic' with the rest of the list. This would at least get rid of the bogus values...
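The weeding step suggested here could look like the sketch below. A candidate survives only if at least one other candidate sits near a small integer multiple or submultiple of it. The limits (multiples up to 4, 3% relative tolerance) and the function names are assumptions chosen for illustration. Capping the multiple matters: with large multiples allowed, almost any pair of numbers is "near" some integer ratio.

```python
def near_multiple(x, y, max_mult=4, rel_tol=0.03):
    """True if the larger value is within rel_tol of n * smaller, for n in 1..max_mult."""
    lo, hi = min(x, y), max(x, y)
    n = round(hi / lo)
    return 1 <= n <= max_mult and abs(hi / lo - n) / n <= rel_tol

def weed_candidates(cands, min_support=1, **kw):
    """Keep candidates that are harmonically related to at least min_support others."""
    kept = []
    for i, c in enumerate(cands):
        support = sum(1 for j, d in enumerate(cands)
                      if j != i and near_multiple(c, d, **kw))
        if support >= min_support:
            kept.append(c)
    return kept

# The f0 values from the harmonic peak-pair list above.
f0s = [1019.946289, 2004.003906, 1005.590820, 2021.948242,
       1005.590820, 2027.331543, 226.515106]
kept = weed_candidates(f0s)
print(kept, min(kept))
```

On these seven values, only 226.515106 is discarded. The smallest survivor, about 1005.6 Hz, is then a plausible fundamental for this particular data. If no entry near f0 itself survives, this heuristic would still return a multiple of it.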
It may be that I'm not even asking the right question. Maybe this whole approach sucks. But I think it makes sense to pick the strongest peak and extract a set of harmonics associated with that peak.
In theory that should generate a load of ratios: say the original strongest peak was the third harmonic, then this set of peaks should contain 3/1, 3/2, 3/3, 3/4, 3/5, 3/6, 3/7, etc., although some may be missing.
Realistically, I have a feeling it's always going to be either the fundamental or the first harmonic that has the greatest strength, but I don't know if I can rely on this...
So many factors; it is making my head swim. I apologise in advance for such a messy question. Hopefully I can tidy it up posthumously.
Comments (3)
Cepstrum (or cepstral analysis) and the Harmonic Product Spectrum are two well-studied algorithms that estimate the exciter frequency from an overtone series.
If the overtones are appropriately spaced, then a cepstrum (the Fourier transform of the log of the FFT magnitude spectrum) may be useful in estimating the period of the frequency spacing, which can then be used to estimate the frequency.
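This is a minimal NumPy sketch of the real cepstrum, not a tuned pitch tracker. Evenly spaced harmonics make the log spectrum periodic, so its inverse FFT peaks at a quefrency equal to the pitch period in samples.

Several choices here are assumptions for illustration:
- the search range of 80 to 1000 Hz;
- the Hann window;
- the relative floor inside the log;
- the test signal;
- the name `cepstrum_f0`.

```python
import numpy as np

def cepstrum_f0(x, fs, fmin=80.0, fmax=1000.0):
    """Estimate f0 from the largest real-cepstrum peak in [fs/fmax, fs/fmin] samples."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    # Floor relative to the peak so near-empty bins don't produce log(0) = -inf.
    log_mag = np.log(mag + 1e-10 * mag.max())
    # Real cepstrum: inverse FFT of the log magnitude; index = quefrency in samples.
    ceps = np.fft.irfft(log_mag, n=len(x))
    q_lo = int(fs / fmax)
    q_hi = min(int(fs / fmin), len(x) // 2)
    q = q_lo + int(np.argmax(ceps[q_lo:q_hi + 1]))
    return fs / q

# Test signal (assumption): 15 equal-amplitude harmonics of 250 Hz, 256 samples at 8 kHz.
fs = 8000
n = np.arange(256)
x = sum(np.cos(2 * np.pi * 250 * h * n / fs) for h in range(1, 16))
print(cepstrum_f0(x, fs))
```

The lower quefrency limit keeps the slowly varying spectral envelope out of the search. Peaks at multiples of the period (here 64 and 96 samples) can still compete on very peaky spectra, which is the cepstrum's typical octave error.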
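The Harmonic Product Spectrum basically compares the spectral peaks with copies of the spectrum decimated (compressed) by small integer factors 2, 3, 4, ... Multiplying the original and decimated spectra together reinforces the frequency at which the harmonics line up, i.e. the fundamental.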
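This is a minimal NumPy sketch of the Harmonic Product Spectrum. Summing log magnitudes is equivalent to multiplying the spectra but avoids underflow.

The following are illustrative assumptions, not a reference implementation:
- the number of decimated copies (4);
- the Hann window;
- the test signal, in which the fundamental is weaker than the 2nd harmonic;
- the name `hps_f0`.

```python
import numpy as np

def hps_f0(x, fs, n_harmonics=4):
    """Harmonic Product Spectrum f0 estimate (sketch): multiply |X| decimated by 1..R."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    n = len(mag) // n_harmonics          # bins where every decimated copy is defined
    log_hps = np.zeros(n)
    for r in range(1, n_harmonics + 1):
        log_hps += np.log(mag[::r][:n] + 1e-12)   # mag[::r][k] == mag[r * k]
    k = 1 + int(np.argmax(log_hps[1:]))           # skip the DC bin
    return k * fs / len(x)

# Test signal (assumption): harmonics of 250 Hz, fundamental weaker than harmonic 2.
fs = 8000
n = np.arange(2048)
amps = [0.3, 1.0, 0.8, 0.6, 0.4]
x = sum(a * np.cos(2 * np.pi * 250 * (i + 1) * n / fs) for i, a in enumerate(amps))
print(hps_f0(x, fs))
```

The frequency resolution is one FFT bin (fs / len(x), about 3.9 Hz here). In practice the peak is often refined afterwards, for example with the phase-based bin frequencies the question already computes.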
You can go through the following link for an article on speech recognition.
Article: Phase Space Point Distribution Parameter for Speech Recognition (subscription required for full text)
I have rephrased the question, and provided an answer here: How to take in a set of numbers like {301,102,99,202,198,103} and throw out ~100?
I had looked at several approaches, and this is considerably more succinct than anything else I've found. I have tested it and it works very well.
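The linked answer isn't reproduced here, so this shows one possible way, not necessarily that one, to pull ~100 out of {301,102,99,202,198,103}. Assign each value an integer harmonic number n_i and fit f0 by least squares. Minimising sum((x_i - n_i*f0)^2) gives f0 = sum(n_i*x_i) / sum(n_i^2).

The sketch seeds the harmonic numbers from the smallest value, so it assumes at least one value lies near the fundamental itself. The name `fit_fundamental` is hypothetical.

```python
def fit_fundamental(values, iterations=5):
    """Least-squares f0 for values assumed to be ~ n_i * f0 with integer n_i (sketch).

    Seeds f0 with the smallest value, then alternates:
    n_i = round(x_i / f0), f0 = sum(n_i * x_i) / sum(n_i ** 2).
    """
    f0 = min(values)
    for _ in range(iterations):
        n = [max(1, round(v / f0)) for v in values]
        f0 = sum(k * v for k, v in zip(n, values)) / sum(k * k for k in n)
    return f0

print(fit_fundamental([301, 102, 99, 202, 198, 103]))  # 100.35
```

Here the harmonic numbers come out as 3, 1, 1, 2, 2, 1, so f0 = 2007 / 20 = 100.35. Bogus values, like the 226 in the question, should be weeded out before fitting, because a single outlier shifts the least-squares estimate.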