Cleaning up noisy Cepstrum results
I've been working on a simple frequency detection setup on the iPhone. Analyzing in the frequency domain using FFT results has been somewhat unreliable in the presence of harmonics. I was hoping to use Cepstrum results to help decide what fundamental frequency is playing.
I am working with AudioQueues in the AudioToolbox framework, and do the Fourier transforms using the Accelerate framework.
My process has been exactly what is listed on Wikipedia's Cepstrum article for the Real Power Cepstrum, specifically: signal → FT → abs() → square → log → FT → abs() → square → power cepstrum.
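A minimal pure-Python sketch of that pipeline, useful for reproducing the behaviour off-device. It is not the poster's vDSP code: the recursive `fft` helper and the small `eps` added before the log are assumptions of this sketch (the `eps` keeps `log()` finite when a bin has exactly zero power). The output is indexed by lag in samples, so a peak at lag q corresponds to roughly fs / q Hz.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_cepstrum(frame, eps=1e-12):
    """signal -> FT -> abs() -> square -> log -> FT -> abs() -> square."""
    spectrum = fft(frame)
    # eps is an assumption, not part of the textbook definition:
    # it keeps log() finite for bins whose power is exactly zero.
    log_power = [math.log(abs(c) ** 2 + eps) for c in spectrum]
    return [abs(c) ** 2 for c in fft(log_power)]
```

For example, a 1024-sample frame of a 250 Hz tone with 8 harmonics, at an assumed 8 kHz sample rate, produces its non-DC peak at lag 32 (8000 / 250).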
The problem I have is that the Cepstrum results are extremely noisy. I have to drop the first and last 20 values as they are astronomical compared to the other values. Even after "cleaning" the data, there is still a huge amount of variation - far more than I would expect given the first graph. See the pictures below for the visualizations of the frequency domain and the quefrency domain.
[Figures: FFT (frequency domain) plot; Cepstrum (quefrency domain) plot]
When I see such a clear winner in the frequency domain as on that graph, I expect to see a similarly clear result in the quefrency domain. I played A440 and would expect bin 82 or so to have the highest magnitude. The third peak on the graph represents bin 79, which is close enough. As I said, the first 20 or so bins are so astronomical in magnitude as to be unusable, and I had to delete them from the data set in order to see anything. Another odd quality of the cepstrum data is that the even bins seem to be much higher than the odd bins. Here are the values in bins 77-86:
77: 151150.0313
78: 22385.92773
79: 298753.1875
80: 56532.72656
81: 114177.4766
82: 31222.88281
83: 4620.785156
84: 13382.5332
85: 83.668259
86: 1205.023193
My question is how to clean up the frequency domain so that my Cepstrum domain results are not so wild. Alternatively, help me better understand how to interpret these results if they are what one would expect from a Cepstrum analysis. I can post examples of the code I'm using, but it mostly uses vDSP calls and I don't know how helpful that would be.
A cepstrum, or cepstral analysis, is a technique used to try to separate a signal with high overtone content into two portions. The portion near DC represents the spectral envelope of all the overtones, or the speech formants, which might be useful for speaker or instrument recognition. Later peaks in the cepstrum result represent the exciter frequency or frequencies, if that frequency generates enough harmonic overtone content.
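Following that split, one option (instead of deleting a fixed 20 bins) is to search only the lags that correspond to the pitches you expect. The sketch below assumes a cepstrum indexed by lag in samples, and the `f_min`/`f_max` limits are hypothetical defaults. It skips the low-quefrency envelope region near DC and ignores the upper half, which for a real input frame mirrors the lower half (so the "last 20 values" are the mirror image of the first 20). Parabolic interpolation around the peak gives a fractional lag.

```python
import math


def cepstral_pitch(cepstrum, fs, f_min=50.0, f_max=1000.0):
    """Estimate f0 in Hz from a cepstrum whose index is the lag in samples.

    Only lags between fs / f_max and fs / f_min are searched. This skips the
    low-quefrency spectral-envelope region near DC and the upper half, which
    mirrors the lower half when the input frame is real.
    """
    n = len(cepstrum)
    lo = max(1, math.ceil(fs / f_max))
    hi = min(n // 2 - 1, math.floor(fs / f_min))
    if lo >= hi:
        raise ValueError("pitch range does not fit in this cepstrum length")
    q = max(range(lo, hi + 1), key=lambda i: cepstrum[i])
    # Fit a parabola through the peak and its neighbours to get a fractional lag.
    a, b, c = cepstrum[q - 1], cepstrum[q], cepstrum[q + 1]
    denom = a - 2.0 * b + c
    offset = 0.5 * (a - c) / denom if denom != 0 else 0.0
    return fs / (q + offset)
```

With fs = 44100 Hz, a peak at lag 100 would give 441 Hz. The 50 to 1000 Hz default range is an arbitrary choice and should be set to the instrument's actual range.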
Since a cepstrum is usually computed without any (non-rectangular) window, it can produce a Sinc response even to a clean overtone sequence, with the width of the response roughly inversely proportional to the length of the overtone sequence, or the number of overtones. And of course, any slightly inharmonic overtones (as found in actual musical instruments) will make the cepstrum results even messier. So a cepstrum peak may only give the approximate location of the fundamental frequency, which could still be useful for rejecting other frequency candidates when doing frequency estimation.
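For anyone who wants to try a non-rectangular window, the usual step is to multiply each frame by a taper such as a Hann window before the first FFT, which lowers spectral leakage in the log spectrum. This sketch covers only that step; whether it actually sharpens the cepstral peak for a given instrument is an assumption to test, not something the answer claims.

```python
import math


def hann_window(n):
    """Symmetric Hann window of length n (n >= 2)."""
    if n < 2:
        raise ValueError("window length must be at least 2")
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(frame):
    """Multiply a frame sample-by-sample by a Hann window before the first FFT."""
    w = hann_window(len(frame))
    return [s * wi for s, wi in zip(frame, w)]
```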
A "clean looking" cepstrum might be the result of a very long sequence of exactly harmonic overtones with a nearly flat frequency response, which is perhaps not what is found in real life signals.
The following analysis illustrates Cepstrum's performance on synthetic and real-world signals.
First we examine a synthetic signal.
The plot below shows a synthetic steady-state E2 note, synthesized using a typical near-DC component, a fundamental at 82.4 Hz, and a total of 8 harmonics at integer multiples of 82.4 Hz. The synthetic sinusoid was programmed to generate 4096 samples.
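A sketch of a comparable synthetic test tone. The answer does not state the sample rate, the harmonic amplitudes, or the exact form of the near-DC component, so the 44.1 kHz rate, the 1/h amplitude roll-off, reading "8 harmonics" as harmonics 1 to 8, and modelling the near-DC component as a constant `dc_offset` are all assumptions here.

```python
import math


def synth_e2(n_samples=4096, fs=44100.0, f0=82.4, n_harmonics=8, dc_offset=0.1):
    """Steady-state harmonic tone: a constant offset plus sines at h * f0.

    fs, the 1/h amplitudes, and dc_offset are illustrative assumptions.
    """
    return [
        dc_offset
        + sum(math.sin(2 * math.pi * h * f0 * t / fs) / h for h in range(1, n_harmonics + 1))
        for t in range(n_samples)
    ]
```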
The plot below shows a closeup of the input that was used for the Cepstrum calculation of the synthetic E2 note. It is the log(|FFT|^2) output from the synthetic E2 note.
The plot below shows the Cepstrum of the synthetic E2 note. Observe the prominent non-DC peak at 12.36. The Cepstrum width is 1024 (the output of the second FFT), therefore the peak corresponds to 1024/12.36 = 82.8 Hz which is very close to the actual 82.4 Hz of the fundamental.
Now we examine a real-world signal.
The plot below shows the spectrum of the E2 note from a real acoustic guitar.
The plot below shows a closeup of the input that was used for the Cepstrum calculation of the acoustic guitar's E2 note. It is the log(|FFT|^2) output from the acoustic guitar's E2 note.
The plot below shows the Cepstrum of the acoustic guitar's E2 note. Observe the prominent non-DC peak at 542.8. The Cepstrum width is 32768 (the output of the second FFT), therefore the peak corresponds to 32768/542.8 = 60.4 Hz which is fairly far from the actual 82.4 Hz of the fundamental.
The recording of the E2 guitar note used for this analysis was sampled at 44.1 kHz with a high-quality microphone under studio conditions; it contains essentially zero background noise and no other instruments or voices.
This illustrates the significant challenge of using Cepstral analysis for pitch determination in real-world audio signals.
References:
Real audio signal data, synthetic signal generation, plots, FFT, and Cepstral analysis were done here: Musical instrument cepstrum
If I understand correctly, the primary problem is detecting a frequency in an audio signal.
Presumably you mean the strongest frequency in the spectrum, so I suggest using
this excellent library http://www.schmittmachine.com/dywapitchtrack.html
"The heart of the algorithm is a very powerful wavelet algorithm, described in a paper by Eric Larson and Ross Maddox: "Real-Time Time-Domain Pitch Tracking Using Wavelets" of UIUC Physics."
Hope this helps.