Find an audio sample in an audio file (spectrogram already exists)
I am trying to achieve the following:
- Using Skype, call my mailbox (works)
- Enter password and tell the mailbox that I want to record a new welcome message (works)
- Now, my mailbox tells me to record the new welcome message after the beep
- I want to wait for the beep and then play the new message (doesn't work)
How I tried to achieve the last point:
- Create a spectrogram using FFT and sliding windows (works)
- Create a "fingerprint" for the beep
- Search for that fingerprint in the audio that comes from Skype
The problem I am facing is the following:
The FFT results for the audio from Skype and for the reference beep are not identical in a bit-for-bit sense: they are similar, but not the same, even though the beep was extracted from an audio file containing a recording of the Skype audio. The following picture shows the spectrogram of the beep from the Skype audio on the left and the spectrogram of the reference beep on the right. As you can see, they are very similar, but not the same...
Spectrogram picture: http://img27.imageshack.us/img27/6717/spectrogram.png
I don't know how to continue from here. Should I average it, i.e. divide it into columns and rows and compare the averages of those cells, as described here? I am not sure this is the best way, because the author already states that it doesn't work very well with short audio samples, and the beep is less than a second long...
Any hints on how to proceed?
Comments (2)
You should determine the peak frequency and duration of the beep (possibly also a minimum power at that frequency over that duration, RMS being the simplest measure).
This should be easy enough to measure. To make things even more clever (but probably completely unnecessary for this simple matching task), you could assert the non-existence of other peaks during the window of the beep.
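A minimal sketch of this idea, assuming the beep is a single tone whose frequency is already known. The function names, the frame length and the thresholds are illustrative assumptions, not values from the question. It uses the Goertzel algorithm to measure the power at the target frequency in each frame, compares that with the frame's total energy (so other strong peaks disqualify the frame), and reports a beep once enough consecutive frames qualify.

```python
import math

def goertzel_power(frame, sample_rate, freq):
    """Return |X[k]|^2 of `frame` at the DFT bin nearest to `freq`."""
    n = len(frame)
    k = round(n * freq / sample_rate)            # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def find_beep(samples, sample_rate, freq, frame_len=200,
              min_frames=5, min_rms=0.05, min_purity=0.8):
    """Return the sample index where a tone at `freq` starts, or None.

    A frame counts as "beep" when its RMS is at least `min_rms` and at least
    `min_purity` of its energy sits at `freq` (no other strong peaks).
    All thresholds are assumed values and need tuning on real recordings.
    """
    run = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame)
        rms = math.sqrt(energy / frame_len)
        # For a pure sine centred on bin k, 2 * |X[k]|^2 / N equals the energy.
        tone_energy = 2.0 * goertzel_power(frame, sample_rate, freq) / frame_len
        is_beep = rms >= min_rms and tone_energy >= min_purity * energy
        run = run + 1 if is_beep else 0
        if run == min_frames:
            # Report the start of the first frame in the qualifying run.
            return start - (min_frames - 1) * frame_len
    return None
```

The minimum duration is expressed as `min_frames * frame_len` samples. Choosing `frame_len` so that the beep frequency falls on an exact DFT bin keeps the purity measure close to 1 for a clean tone.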
Update
To compare a complete audio fragment, you'll want to use a convolution algorithm. I suggest using a ready-made library implementation instead of rolling your own.
Wikipedia lists http://freeverb3.sourceforge.net as an open source candidate.
Edit: Added a link to the API tutorial page: http://freeverb3.sourceforge.net/tutorial_lib.shtml
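To illustrate what such a comparison computes: matching a reference clip against a longer recording is usually done with cross-correlation, which is convolution with the time-reversed reference. The sketch below is a direct O(N·M) pure-Python version with hypothetical function names. It is only meant to show the principle and is not how freeverb3 or any other library implements it; a library routine would normally use FFT-based convolution instead.

```python
import math

def normalized_xcorr(signal, template):
    """Return the normalized cross-correlation score for every offset.

    Scores lie in [-1, 1]; 1 means the window is a positively scaled copy
    of the template. Assumes len(signal) >= len(template).
    """
    m = len(template)
    t_norm = math.sqrt(sum(t * t for t in template))
    scores = []
    for offset in range(len(signal) - m + 1):
        window = signal[offset:offset + m]
        w_norm = math.sqrt(sum(w * w for w in window))
        if w_norm == 0.0 or t_norm == 0.0:
            scores.append(0.0)       # silent window: no meaningful match
            continue
        dot = sum(w * t for w, t in zip(window, template))
        scores.append(dot / (w_norm * t_norm))
    return scores

def best_match(signal, template):
    """Return (offset, score) of the best-matching position."""
    scores = normalized_xcorr(signal, template)
    offset = max(range(len(scores)), key=scores.__getitem__)
    return offset, scores[offset]
```

Because each window is normalized, the score does not depend on how loud the recording is; a beep would be accepted when the best score exceeds a threshold chosen from test recordings.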
Additional resources:
http://en.wikipedia.org/wiki/Finite_impulse_response
http://dspguru.com/dsp/faqs/fir
Existing packages with relevant tools on debian:
First I'd smooth it a bit in the frequency direction so that small variations in frequency become less relevant. Then simply take each frequency and subtract the two amplitudes. Square the differences and add them up. Perhaps normalize the signals first so that differences in total amplitude don't matter. Then compare the summed difference to a threshold.
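The steps above could be sketched as follows for a single spectrogram column (one list of magnitudes per time slice). The function names, the moving-average smoothing, the unit-length normalization and the threshold of 0.1 are assumptions made for illustration; the answer does not prescribe any of them.

```python
import math

def smooth(spectrum, radius=1):
    """Moving average along the frequency axis (up to 2*radius + 1 bins)."""
    n = len(spectrum)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(spectrum[lo:hi]) / (hi - lo))
    return out

def normalize(spectrum):
    """Scale to unit Euclidean length so overall loudness drops out."""
    norm = math.sqrt(sum(a * a for a in spectrum))
    return [a / norm for a in spectrum] if norm > 0 else list(spectrum)

def spectral_distance(spec_a, spec_b, radius=1):
    """Sum of squared differences between two smoothed, normalized spectra."""
    a = normalize(smooth(spec_a, radius))
    b = normalize(smooth(spec_b, radius))
    return sum((x - y) ** 2 for x, y in zip(a, b))

def is_match(spec_a, spec_b, threshold=0.1, radius=1):
    """True when the spectra differ by less than `threshold` (assumed value)."""
    return spectral_distance(spec_a, spec_b, radius) < threshold
```

For a whole spectrogram, the same distance can be summed over all time columns of the reference and of each candidate window in the Skype audio. The threshold would have to be picked from a few real recordings.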