Resonance algorithm for detecting pitch
I have been looking at different methods of detecting the pitch of a tone sung into a microphone.
Since I want to find out how closely it resonates with a particular pitch class, I wonder whether I could use some sort of physics-based resonance algorithm.
If you hold down the sustain pedal on a piano and sing a tone into it, a note will resonate sympathetically (provided you are close enough to one of its existing pitches).
I would love to be able to model this behaviour. But how would I go about the task? Can anyone help me move this forward?
Take a look at the autocorrelation function.
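As a rough illustration of that suggestion, here is a minimal Python sketch of an autocorrelation pitch estimate. The function name `autocorrelation_pitch`, the search range (`fmin`, `fmax`) and the test tone are assumptions made for this example; a real detector would need windowing, normalisation and protection against octave errors.

```python
import math

def autocorrelation_pitch(samples, sample_rate, fmin=110.0, fmax=1000.0):
    """Return the frequency whose period (lag) gives the strongest
    autocorrelation, searching only lags between fmax and fmin."""
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        # Average product of the signal with a copy of itself shifted by `lag`.
        score = sum(samples[i] * samples[i + lag] for i in range(n)) / n
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Hypothetical test tone: 200 Hz sine sampled at 8 kHz (period = 40 samples).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * i / sample_rate) for i in range(2048)]
estimate = autocorrelation_pitch(tone, sample_rate)
```

The search range is deliberately limited to less than one octave below the expected pitch, because a periodic signal also correlates strongly with itself at two, three, or more periods.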
One interesting solution I found is simply feeding the microphone input into a Karplus-Strong algorithm.
So Karplus-Strong simulates a plucked string by:
Now if we add the microphone stream into this process, so:
It actually simulates singing into a guitar really accurately. If you get your tone spot on, it truly wails.
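A minimal Python sketch of this idea, assuming the textbook Karplus-Strong loop (a delay line whose output is averaged with the previous sample and fed back) and an external signal simply added into the loop. The function name `karplus_strong`, the `damping` factor and the test tones are illustrative assumptions, not the code originally used.

```python
import math
import random

def karplus_strong(length, num_samples, excitation=None, damping=0.996, seed=0):
    """Run a Karplus-Strong delay line of `length` samples.
    Without `excitation` the line starts full of noise (a pluck);
    with it, the line starts silent and the input is added into the loop."""
    rng = random.Random(seed)
    if excitation is None:
        buffer = [rng.uniform(-1.0, 1.0) for _ in range(length)]
    else:
        buffer = [0.0] * length
    output = []
    prev_read = 0.0
    for n in range(num_samples):
        i = n % length
        read = buffer[i]                             # sample from `length` steps ago
        value = damping * 0.5 * (read + prev_read)   # averaging low-pass in the loop
        if excitation is not None and n < len(excitation):
            value += excitation[n]                   # feed the "microphone" in
        buffer[i] = value
        prev_read = read
        output.append(value)
    return output

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

sample_rate = 44100
length = 100
# The averaging filter adds half a sample of delay to the loop.
resonant = sample_rate / (length + 0.5)

def sine(freq, count):
    return [0.01 * math.sin(2 * math.pi * freq * n / sample_rate) for n in range(count)]

in_tune = rms(karplus_strong(length, 20000, sine(resonant, 20000)))
off_tune = rms(karplus_strong(length, 20000, sine(1.5 * resonant, 20000)))
```

With these settings the in-tune input builds up a much louder output than the off-tune one, which is the sympathetic-resonance behaviour described above.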
But there is a severe problem with this approach: consider the pitch generated by a buffer of 100 elements, and that generated by a buffer of 101 elements. There is no way to generate any pitch in between these two values. We are limited to a discrete working set of pitches. While this is going to be pretty accurate for low notes (A2 would have a buffer length of ~400, assuming a 44.1 kHz sample rate), the error grows the higher we go: A7 would have a buffer length of ~12.5. At that length, adjacent buffer sizes are more than a semitone apart.
I cannot see any way of countering this problem. I think the approach has to be dropped.
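To put numbers on the quantisation problem, here is a short Python sketch that measures the pitch gap between two adjacent buffer lengths in cents (100 cents = 1 semitone). The helper name `step_in_cents` is made up for this example, and the buffer lengths assume a 44.1 kHz sample rate.

```python
import math

def step_in_cents(length):
    """Pitch gap between delay lines of `length` and `length + 1` samples.
    Frequency is inversely proportional to length, so the ratio is (length + 1) / length."""
    return 1200.0 * math.log2((length + 1) / length)

low = step_in_cents(400)   # around A2: a few cents between neighbouring pitches
high = step_in_cents(12)   # around A7: more than a semitone between neighbours
```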
An algorithm based entirely on a discrete Fourier transform (DFT) has a number of drawbacks.
One problem is the temporal resolution: since the DFT works on samples within a window, you cannot determine pitch changes within that window.
Another problem is the frequency resolution of the DFT, which might not be good enough for a pitch detector: its bins are evenly spaced in frequency, whereas musical pitch is logarithmic. After all, a DFT only finds waves that fit a whole number of cycles into the window.
A slightly more advanced algorithm could do something like this:
By counting the number of samples between peaks, you get a pitch resolution matching the sample frequency.
If you want even higher resolution than the sample frequency allows, you could fit a function, such as a polynomial, to the samples around the peak point. Since you have suppressed other frequencies, you should be able to do that.
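A minimal Python sketch of the peak-fitting idea, assuming the signal has already been filtered down to something close to a single sine. It fits a parabola through the three samples around each peak to get a fractional peak position, then averages the distance between peaks. The names `refine_peak` and `peak_positions` and the 437 Hz test tone are assumptions for illustration only.

```python
import math

def refine_peak(samples, i):
    """Fit a parabola through samples[i-1], samples[i], samples[i+1]
    and return the fractional index of its vertex."""
    a, b, c = samples[i - 1], samples[i], samples[i + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:
        return float(i)
    return i + 0.5 * (a - c) / denom

def peak_positions(samples):
    """Fractional positions of all positive local maxima."""
    peaks = []
    for i in range(1, len(samples) - 1):
        if samples[i] > 0 and samples[i - 1] < samples[i] >= samples[i + 1]:
            peaks.append(refine_peak(samples, i))
    return peaks

# Hypothetical test tone whose period (about 18.3 samples) is not a whole number.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 437.0 * n / sample_rate) for n in range(400)]
peaks = peak_positions(tone)
period = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
estimate = sample_rate / period
```

Counting whole samples alone could only report 8000/18 or 8000/19 Hz here; the fractional peak positions let the estimate fall between those values.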
As another answer suggests, you can also use autocorrelation to find the maximum signal repetition within a signal. However, I should say that it is not trivial to implement a good autocorrelation pitch detector. Without knowing for certain, I would assume that guitar tuners and similar cheap electronics base their algorithm on a band-pass filter combined with counting the sample distance between peaks.
You can use a damped harmonic oscillator with the input as the driving force. Choose the parameters of the oscillator so that its resonance frequency matches the frequency you want.
You'll find an analysis of the damped harmonic oscillator in most theoretical physics books on mechanics.
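A minimal Python sketch of such an oscillator, assuming the usual equation x'' + 2·zeta·w0·x' + w0²·x = input, integrated with a simple semi-implicit Euler step. The function name `oscillator_response`, the damping ratio `zeta = 0.005` and the test frequencies are assumptions chosen for illustration.

```python
import math

def oscillator_response(samples, sample_rate, f0, zeta=0.005):
    """Drive a damped harmonic oscillator tuned to f0 with `samples`
    and return the mean squared displacement (a measure of resonance)."""
    w0 = 2.0 * math.pi * f0
    dt = 1.0 / sample_rate
    x, v = 0.0, 0.0
    total = 0.0
    for force in samples:
        # Acceleration from the driving force, damping and restoring force.
        a = force - 2.0 * zeta * w0 * v - w0 * w0 * x
        v += a * dt          # update velocity first (semi-implicit Euler)
        x += v * dt          # then position, using the new velocity
        total += x * x
    return total / len(samples)

sample_rate = 44100

def sine(freq, count):
    return [math.sin(2 * math.pi * freq * n / sample_rate) for n in range(count)]

# Oscillator tuned to A4 (440 Hz), driven in tune and one semitone sharp.
in_tune = oscillator_response(sine(440.0, 22050), sample_rate, 440.0)
off_tune = oscillator_response(sine(466.16, 22050), sample_rate, 440.0)
```

A smaller `zeta` makes the oscillator more selective but slower to respond; one oscillator per pitch of interest would mimic the strings of the piano in the question.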
One approach I've found to be helpful is to generate two reference waves 90 degrees apart (I call them "sine" and "cosine") and take the dot product of the input waveform with those reference waves over some fairly short (say 1/60 second) stretches of the input. That will give you a somewhat noisy indicator of how much of the input at the reference frequency is in phase or out of phase with your reference waves (the square root of the sum of the squares of the two values will be the amplitude). With a small window size, you'll notice that the output is rather noisy, but if you filter it with something like a simple FIR or IIR filter you should get something pretty reasonable.
One nice trick is to generate two amplitude numbers: for the first one, run the sine and cosine amplitudes through two rounds of filtering, then compute the sum of the squares. For the second, run the amplitudes through one round of filtering, then compute the sum of the squares, and then run that through another round of filtering.
Both amplitude measurements will experience the same delay, but the first one will be much more selective than the second; you can thus tell very clearly whether a frequency is 'right on' or is a bit off. Using this approach, it's possible to detect DTMF tones quickly while rejecting tones that are even a few Hz off (off-pitch tones will pick up much more strongly on the 'loose' detector than the tight one).
Sample code:
The code here uses 'double' for all math other than array subscripting. In practice, it would almost certainly be faster to replace some of the math with integer maths. On machines with floating point, I would expect the best approach would be to keep the phase as a 32-bit integer and use a table of ~4096 'single' sine values (the smaller the table size in RAM, the better the cache coherency performance). I used code very much like the above on a fixed-point (integer) DSP with great success; the sine and cosine computations in process_some_waves were done in separate "loops", with each "loop" being realized as a single instruction with a "repeat" prefix.
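As a rough Python illustration of the approach described in this answer (not the author's original code), the sketch below computes sine and cosine dot products per block and then builds the "tight" and "loose" amplitude measures. The names `block_components`, `smooth` and `tight_and_loose`, the one-pole filter and its `alpha`, and the 8 kHz test signals are all assumptions.

```python
import math

def block_components(samples, sample_rate, freq, block):
    """Dot products of each block with running sine and cosine references."""
    step = 2.0 * math.pi * freq / sample_rate
    out = []
    for start in range(0, len(samples) - block + 1, block):
        s = c = 0.0
        for n in range(start, start + block):
            s += samples[n] * math.sin(step * n)   # phase keeps running across blocks
            c += samples[n] * math.cos(step * n)
        out.append((2.0 * s / block, 2.0 * c / block))
    return out

def smooth(values, alpha=0.3):
    """One round of filtering: a one-pole IIR low-pass."""
    y, out = 0.0, []
    for v in values:
        y += alpha * (v - y)
        out.append(y)
    return out

def tight_and_loose(samples, sample_rate, freq, block):
    comps = block_components(samples, sample_rate, freq, block)
    s1 = smooth([s for s, _ in comps])
    c1 = smooth([c for _, c in comps])
    # Tight: filter the components twice, then take the sum of squares.
    s2, c2 = smooth(s1), smooth(c1)
    tight = [a * a + b * b for a, b in zip(s2, c2)]
    # Loose: filter once, take the sum of squares, then filter that.
    loose = smooth([a * a + b * b for a, b in zip(s1, c1)])
    return tight[-1], loose[-1]

sample_rate = 8000
block = sample_rate // 60          # roughly 1/60 second per block

def sine(freq):
    return [math.sin(2 * math.pi * freq * n / sample_rate) for n in range(sample_rate)]

on_tight, on_loose = tight_and_loose(sine(440.0), sample_rate, 440.0, block)
off_tight, off_loose = tight_and_loose(sine(445.0), sample_rate, 440.0, block)
```

For the in-tune input the two measures come out nearly equal; for the input that is 5 Hz off, the tight measure drops well below the loose one, so their ratio indicates how far off the tone is.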
I have been reading up on Fourier analysis.
Basically, if you wish to extract a frequency f out of a signal, you just take a sine wave of frequency f, multiply it with the original signal, and integrate.
If the original signal didn't contain anything of frequency f, you should get pretty much zero. If it DOES, then you will get out a measure of how much energy in the signal is at that frequency.
Although there is some fairly tricky math behind it, it makes sense intuitively: just looking at it, everything in the signal that is at frequency f will interfere constructively with the sine wave, leaving a residue; everything not at frequency f can essentially be considered random noise (i.e. there is pretty much the same amount of stuff above zero as below), having no net effect when multiplied with our sine wave. Everything cancels.
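A minimal Python sketch of the multiply-and-integrate step, using a plain sum over one second of samples as the integral. The function name `sine_correlation` and the 440 Hz and 550 Hz test frequencies are assumptions for illustration.

```python
import math

def sine_correlation(samples, sample_rate, freq):
    """Multiply the signal by a sine at `freq` and integrate (here: average)."""
    total = 0.0
    for n, x in enumerate(samples):
        total += x * math.sin(2.0 * math.pi * freq * n / sample_rate)
    return total / len(samples)

sample_rate = 8000
# One second of a 440 Hz sine as the "original signal".
signal = [math.sin(2 * math.pi * 440.0 * n / sample_rate) for n in range(sample_rate)]

match = sine_correlation(signal, sample_rate, 440.0)  # close to 0.5
miss = sine_correlation(signal, sample_rate, 550.0)   # close to 0
```

Note that this simple version depends on phase: a 440 Hz cosine would also give roughly zero against a sine reference, which is why a full Fourier analysis uses both a sine and a cosine at each frequency.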
This correlates with what I was fishing for. To complete my analogy above: To check what notes a piano contains, you just hold the pedal down and sing a rising tone into it, and whenever a sympathetic resonance occurs you can jot down that the piano has a note at that frequency.
Of course this is not without its failings: if you hold down C1 (no pedal this time) and sing/play C2, C1 will resonate at twice its fundamental frequency, producing a C2 sound.
Similarly, playing G2 will make it resonate at three times its fundamental frequency, etc.