Finding an audio sample in an audio file (spectrogram already exists)

Posted on 2024-11-03 23:39:40

I am trying to achieve the following:

  • Using Skype, call my mailbox (works)
  • Enter password and tell the mailbox that I want to record a new welcome message (works)
  • Now, my mailbox tells me to record the new welcome message after the beep
  • I want to wait for the beep and then play the new message (doesn't work)

How I tried to achieve the last point:

  • Create a spectrogram using FFT and sliding windows (works)
  • Create a "fingerprint" for the beep
  • Search for that fingerprint in the audio that comes from Skype

The problem I am facing is the following:
The FFT results for the audio from Skype and for the reference beep are not numerically identical: they are similar but not the same, even though the beep was extracted from an audio file containing a recording of the Skype audio. The following picture shows the spectrogram of the beep from the Skype audio on the left and the spectrogram of the reference beep on the right. As you can see, they are very similar, but not the same...
Spectrogram picture: http://img27.imageshack.us/img27/6717/spectrogram.png

I don't know how to continue from here. Should I average it, i.e. divide it into columns and rows and compare the averages of those cells as described here? I am not sure this is the best way, because the author already states that it doesn't work very well with short audio samples, and the beep is less than a second long...

Any hints on how to proceed?

Comments (2)

吹梦到西洲 2024-11-10 23:39:40

You should determine the peak frequency and its duration (and possibly a minimum power at that frequency over that duration, RMS being the simplest measure).

This should be easy enough to measure. To make things even more clever (but probably completely unnecessary for this simple matching task), you could assert the non-existence of other peaks during the window of the beep.
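A minimal sketch of that idea in Python with NumPy, assuming the audio is already available as a float array and that you know the beep's approximate frequency. The function name `find_beep` and every threshold (`tol_hz`, `min_rms`, `dominance`, `min_duration_s`) are illustrative assumptions that would need tuning against real Skype recordings:

```python
import numpy as np

def find_beep(signal, sample_rate, beep_hz, min_duration_s=0.3,
              frame_len=1024, hop=512, tol_hz=30.0, min_rms=0.01,
              dominance=4.0):
    """Return the start time (s) of the first run of frames whose spectral
    peak stays near beep_hz for at least min_duration_s, or None."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    # Number of consecutive overlapping frames needed to span the duration.
    needed = int(np.ceil((min_duration_s * sample_rate - frame_len) / hop)) + 1
    needed = max(needed, 1)
    run_start, run_len = None, 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))          # simple power measure
        mag = np.abs(np.fft.rfft(frame * window))
        peak = int(np.argmax(mag))
        # "No other peaks": the peak must dominate everything outside
        # its immediate neighbourhood (hypothetical +/-2 bin guard band).
        others = mag.copy()
        others[max(peak - 2, 0):peak + 3] = 0.0
        is_beep = (rms >= min_rms
                   and abs(freqs[peak] - beep_hz) <= tol_hz
                   and mag[peak] >= dominance * others.max())
        if is_beep:
            if run_len == 0:
                run_start = start
            run_len += 1
            if run_len >= needed:
                return run_start / sample_rate
        else:
            run_len = 0
    return None
```

Calling `find_beep(samples, 8000, beep_hz=1000.0)` would return the approximate time at which a sustained 1 kHz tone starts, to within roughly one frame.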

Update

To compare a complete audio fragment, you'll want to use a convolution algorithm. I suggest using a ready-made library implementation instead of rolling your own.

The most common fast convolution algorithms use fast Fourier transform (FFT) algorithms via the circular convolution theorem. Specifically, the circular convolution of two finite-length sequences is found by taking an FFT of each sequence, multiplying pointwise, and then performing an inverse FFT. Convolutions of the type defined above are then efficiently implemented using that technique in conjunction with zero-extension and/or discarding portions of the output. Other fast convolution algorithms, such as the Schönhage–Strassen algorithm, use fast Fourier transforms in other rings.
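To illustrate the quoted description, here is a small sketch using NumPy's FFT routines: both sequences are zero-extended, transformed, multiplied pointwise, and transformed back. Searching for a template is a cross-correlation, which is the same as convolving with the time-reversed template. The names `fft_convolve` and `locate_template` are made up for this example and are not part of any library mentioned here:

```python
import numpy as np

def fft_convolve(a, b):
    """Linear convolution of two 1-D real sequences via the circular
    convolution theorem, zero-extending both to avoid wrap-around."""
    n = len(a) + len(b) - 1
    nfft = 1 << (n - 1).bit_length()      # next power of two >= n
    spectrum = np.fft.rfft(a, nfft) * np.fft.rfft(b, nfft)
    return np.fft.irfft(spectrum, nfft)[:n]

def locate_template(signal, template):
    """Return the sample offset where template best aligns with signal."""
    # Cross-correlation = convolution with the time-reversed template.
    corr = fft_convolve(signal, template[::-1])
    # Keep only the lags where the template fully overlaps the signal.
    valid = corr[len(template) - 1:len(signal)]
    return int(np.argmax(valid))
```

Note that this raw correlation is not normalized, so loud passages can outscore a quiet true match; dividing by the local signal energy is one possible refinement.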

Wikipedia lists http://freeverb3.sourceforge.net as an open source candidate

Edit: added link to API tutorial page: http://freeverb3.sourceforge.net/tutorial_lib.shtml

Additional resources:

http://en.wikipedia.org/wiki/Finite_impulse_response

http://dspguru.com/dsp/faqs/fir

Existing packages with relevant tools on Debian:

  • brutefir - a software convolution engine
  • jconvolver - Convolution reverb Engine for JACK
  • libzita-convolver2 - C++ library implementing a real-time convolution matrix
  • teem-apps - Tools to process and visualize scientific data and images - command line tools
  • teem-doc - Tools to process and visualize scientific data and images - documentation
  • libteem1 - Tools to process and visualize scientific data and images - runtime
  • yorick-yeti - utility plugin for the Yorick language

爱殇璃 2024-11-10 23:39:40

First, I'd smooth it a bit in the frequency direction so that small variations in frequency become less relevant. Then, for each frequency, subtract the two amplitudes, square the differences, and add them up. Perhaps normalize the signals first so that differences in total amplitude don't matter. Finally, compare the summed difference to a threshold.
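Roughly, in Python with NumPy, assuming both spectrograms are magnitude arrays of the same shape `(n_freq_bins, n_frames)`. The function names, the smoothing width, and the threshold are placeholder assumptions to be tuned on real data:

```python
import numpy as np

def smooth_frequency(spec, width=5):
    """Moving average along the frequency axis (axis 0)."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, spec)

def spectrogram_distance(a, b, width=5):
    """Sum of squared differences after smoothing and normalizing."""
    a = smooth_frequency(np.asarray(a, dtype=float), width)
    b = smooth_frequency(np.asarray(b, dtype=float), width)
    # Normalize so that overall loudness differences do not matter.
    a = a / (np.linalg.norm(a) + 1e-12)
    b = b / (np.linalg.norm(b) + 1e-12)
    return float(np.sum((a - b) ** 2))

def matches(a, b, threshold=0.1, width=5):
    """True if the two spectrograms are closer than the threshold."""
    return spectrogram_distance(a, b, width) < threshold
```

With this normalization the distance lies between 0 (identical shape) and 2 (no overlap at all) for non-negative magnitudes, which makes picking a threshold a little easier.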
