Finding an audio sample in an audio file (spectrogram already exists)

Posted on 2024-11-03 23:39:40

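I am trying to achieve the following:

Using Skype, call my mailbox (works)
Enter the password and tell the mailbox that I want to record a new welcome message (works)
Now my mailbox tells me to record the new welcome message after the beep
I want to wait for the beep and then play the new message (does not work)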

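How I am trying to achieve the last point:

Create a spectrogram using an FFT and a sliding window (works)
Create a "fingerprint" for the beep
Search for that fingerprint in the audio coming from Skype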
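
For readers following along, the first step above can be sketched roughly as follows. This is a minimal sketch assuming NumPy; it is not the asker's actual code, and the function name `spectrogram`, the parameters `window_size` and `hop`, and the 8 kHz test tone are all hypothetical choices made for illustration.

```python
import numpy as np

def spectrogram(samples, window_size=1024, hop=256):
    """Magnitude spectrogram: one row per window position, one column per frequency bin."""
    samples = np.asarray(samples, dtype=float)
    window = np.hanning(window_size)  # taper each frame to reduce spectral leakage
    n_frames = 1 + (len(samples) - window_size) // hop
    frames = np.stack([samples[i * hop : i * hop + window_size] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical test signal: 0.5 s of silence, then a 0.5 s tone at 1 kHz, sampled at 8 kHz.
rate = 8000
t = np.arange(rate // 2) / rate
signal = np.concatenate([np.zeros(rate // 2), np.sin(2 * np.pi * 1000 * t)])

spec = spectrogram(signal)
freqs = np.fft.rfftfreq(1024, d=1 / rate)   # centre frequency of each column, in Hz
peak_bin = int(spec[-1].argmax())           # strongest bin in the last frame (inside the tone)
```

Each row of `spec` is one position of the sliding window, so a beep shows up as a run of rows whose strongest column stays at the same frequency.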

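The problem I am facing is the following:
The FFT results of the audio from Skype and of the reference beep are not identical in the numerical sense, i.e. they are similar but not the same, even though the beep was extracted from an audio file containing a recording of the Skype audio. The image below shows the spectrogram of the beep from the Skype audio on the left and the spectrogram of the reference beep on the right. As you can see, they are very similar, but not identical...
Uploaded image: http://img27.imageshack.us/img27/6717/spectrogram.png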

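I am at a loss as to how to continue from here. Should I average it, i.e. divide it into columns and rows and compare the means of those cells, as described here? I am not sure this is the best approach, since the author there already notes that it does not work very well with short audio samples, and the beep is less than a second long...

Any hints on how to proceed?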

吹梦到西洲 2024-11-10 23:39:40

You should determine the peak frequency and the duration of the beep (and possibly a minimum power at that frequency over that duration, RMS being the simplest measure).

This should be easy enough to measure. To make things even more clever (but probably completely unnecessary for this simple matching task), you could assert the non-existence of other peaks during the window of the beep.
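
As an illustration of the measurement described above, here is a minimal sketch assuming NumPy, samples normalized to the range [-1, 1], and a beep frequency already read off the spectrogram. The function name `find_beep` and every threshold (`min_rms`, `min_duration`, `tolerance_hz`, `window_size`) are hypothetical values that would need tuning on real Skype audio. The optional check for other peaks is left out.

```python
import numpy as np

def find_beep(samples, rate, beep_hz, min_rms=0.1, min_duration=0.2,
              window_size=512, tolerance_hz=30.0):
    """Return the start time in seconds of the first beep-like run of frames, or None."""
    samples = np.asarray(samples, dtype=float)
    freqs = np.fft.rfftfreq(window_size, d=1 / rate)
    window = np.hanning(window_size)
    # Number of consecutive non-overlapping frames needed to cover min_duration.
    needed = int(np.ceil(min_duration * rate / window_size))
    run = 0
    for start in range(0, len(samples) - window_size + 1, window_size):
        frame = samples[start:start + window_size]
        rms = np.sqrt(np.mean(frame ** 2))                       # power measure
        peak_hz = freqs[np.abs(np.fft.rfft(frame * window)).argmax()]
        if rms >= min_rms and abs(peak_hz - beep_hz) <= tolerance_hz:
            run += 1
            if run >= needed:
                # Report the start of the first frame in the current run.
                return (start - (run - 1) * window_size) / rate
        else:
            run = 0
    return None

# Hypothetical test audio: 1 s of silence, a 0.5 s tone at 1 kHz, then 0.25 s of silence.
rate = 8000
tone = np.sin(2 * np.pi * 1000 * np.arange(rate // 2) / rate)
audio = np.concatenate([np.zeros(rate), tone, np.zeros(rate // 4)])
start = find_beep(audio, rate, beep_hz=1000)
wrong_pitch = find_beep(audio, rate, beep_hz=2000)
```

Because the frames do not overlap, the reported start time is only accurate to about one frame (64 ms with these values); a smaller window or overlapping frames would tighten that at the cost of frequency resolution.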

Update

To compare a complete audio fragment, you'll want to use a convolution algorithm (strictly speaking, cross-correlation, which is convolution with the time-reversed reference). I suggest using a ready-made library implementation instead of rolling your own.
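
A minimal sketch of that comparison, assuming NumPy as the ready-made library (it is not one of the packages named in this answer) and using normalized cross-correlation so that the score does not depend on volume. The function name `best_match` and the synthetic test data are hypothetical.

```python
import numpy as np

def best_match(signal, template):
    """Return (offset, score): best alignment of template in signal, score in [-1, 1]."""
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)
    m = len(template)
    # Cross-correlation, i.e. convolution with the time-reversed template.
    raw = np.correlate(signal, template, mode="valid")
    # Energy of every length-m window of the signal, from a cumulative sum.
    csum = np.concatenate([[0.0], np.cumsum(signal ** 2)])
    window_energy = np.maximum(csum[m:] - csum[:-m], 0.0)
    norm = np.sqrt(window_energy * np.sum(template ** 2))
    # Windows with (almost) no energy get a score of 0 instead of dividing by zero.
    scores = np.where(norm > 1e-12, raw / np.maximum(norm, 1e-12), 0.0)
    offset = int(scores.argmax())
    return offset, float(scores[offset])

# Hypothetical data: a 50 ms tone burst hidden in low-level noise at sample 3000.
rng = np.random.default_rng(0)
rate = 8000
n = 400
template = np.hanning(n) * np.sin(2 * np.pi * 1000 * np.arange(n) / rate)
recording = 0.01 * rng.standard_normal(rate)
recording[3000:3000 + n] += 0.8 * template
offset, score = best_match(recording, template)
```

A score close to 1 means the window has the same shape as the reference regardless of amplitude, which is the property the exact numerical comparison in the question lacks. `np.correlate` computes the sum directly; for long recordings an FFT-based routine is the usual choice.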

The most common fast convolution algorithms use fast Fourier transform (FFT) algorithms via the circular convolution theorem. Specifically, the circular convolution of two finite-length sequences is found by taking an FFT of each sequence, multiplying pointwise, and then performing an inverse FFT. Convolutions of the type defined above are then efficiently implemented using that technique in conjunction with zero-extension and/or discarding portions of the output. Other fast convolution algorithms, such as the Schönhage–Strassen algorithm, use fast Fourier transforms in other rings.
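
The procedure in the quoted paragraph (FFT of each sequence, pointwise multiplication, inverse FFT, with zero-extension so the circular result equals the linear one) can be sketched as follows. This assumes NumPy; the function name `fft_convolve` and the sample inputs are made up for illustration.

```python
import numpy as np

def fft_convolve(a, b):
    """Linear convolution of two real 1-D sequences using FFTs and zero-padding."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a) + len(b) - 1   # length of the full linear convolution
    # rfft(x, n) zero-extends x to length n, so the circular convolution
    # of the padded sequences cannot wrap around.
    spectrum = np.fft.rfft(a, n) * np.fft.rfft(b, n)   # pointwise multiplication
    return np.fft.irfft(spectrum, n)                   # back to the time domain

x = [1.0, 2.0, 3.0]
h = [0.0, 1.0, 0.5]
result = fft_convolve(x, h)
direct = np.convolve(x, h)   # direct O(N*M) computation for comparison
```

Up to floating-point rounding, `result` and `direct` agree; the FFT route pays off once both sequences are long.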

Wikipedia lists http://freeverb3.sourceforge.net as an open source candidate.

Edit: Added link to API tutorial page: http://freeverb3.sourceforge.net/tutorial_lib.shtml

Additional resources:

http://en.wikipedia.org/wiki/Finite_impulse_response

http://dspguru.com/dsp/faqs/fir

Existing packages with relevant tools on Debian:

brutefir - a software convolution engine
jconvolver - Convolution reverb Engine for JACK
libzita-convolver2 - C++ library implementing a real-time convolution matrix
teem-apps - Tools to process and visualize scientific data and images - command line tools
teem-doc - Tools to process and visualize scientific data and images - documentation
libteem1 - Tools to process and visualize scientific data and images - runtime
yorick-yeti - utility plugin for the Yorick language
