Recognizing a specific sound on iOS

Posted 2024-11-11 16:44:32

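I'd like to be able to recognize a specific sound in an iOS application. I imagine it would work much like speech recognition, in that the matching is fairly fuzzy, but it only needs to handle one specific sound.

I've done some work with FFTs to pick out specific frequencies when they exceed a certain threshold, and only when they stand alone (i.e. they are not surrounded by other frequencies), so I can identify a single tone quite easily. I think this is just an extension of that: comparing against an FFT data set taken from a recording of the sound, in 0.1-second chunks over the length of the audio. I would also have to allow for variation in amplitude, pitch, and timing.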
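
For reference, the single-tone check described above might look roughly like the sketch below. It is written in Python rather than Swift purely for brevity, uses a naive DFT in place of a real FFT, and the names `power_spectrum`, `isolated_tone_bin`, `guard`, and `ratio` are illustrative assumptions, not anything from the original code.

```python
import cmath
import math

def power_spectrum(samples):
    """Naive O(N^2) DFT power spectrum for bins 0..N/2 (a stand-in for an FFT)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

def isolated_tone_bin(spectrum, threshold, guard=2, ratio=10.0):
    """Return the strongest bin if it exceeds `threshold` and every bin more than
    `guard` bins away is at least `ratio` times weaker; otherwise return None."""
    peak = max(range(len(spectrum)), key=spectrum.__getitem__)
    if spectrum[peak] < threshold:
        return None
    for k, power in enumerate(spectrum):
        if abs(k - peak) > guard and power * ratio > spectrum[peak]:
            return None  # another strong frequency is present, so not a lone tone
    return peak

SAMPLE_RATE = 8000  # Hz, assumed for the example
N = 256             # samples per analysis chunk, assumed for the example
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]
bin_index = isolated_tone_bin(power_spectrum(tone), threshold=1.0)
print(bin_index * SAMPLE_RATE / N)  # 1000.0
```

On a device the spectrum would come from an FFT routine instead, and the threshold and guard width would need tuning against real recordings.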

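Can anyone point me to existing resources I could use to speed this up? I can't seem to find anything usable. Failing that, any ideas on how to get started with something like this?

Many thanks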


余厌 2024-11-18 16:44:32

From your description it is not entirely clear what you want to do.
What is the "specific" sound like? Does it have high background noise?
What is the specific recognizable feature (e.g. pitch, inharmonicity, timbre ...)?
Against which other "sounds" do you want to compare it?
Do you simply want to match an arbitrary sound spectrum against a "template sound"?
Is your sound percussive, melodic, speech, ...? Is it long, short ...?
In what frequency range do you expect the best discriminability? Are the features invariant with time?

There is no "general" solution that works for everything. Speech recognition in itself is fairly complex and won't work well for abstract sounds whose discriminable frequencies do not fall in, e.g., the mel bands.

So in conclusion, you are leaving too many open questions to get a useful answer.
The only suggestion I can make based on the little information given is the following:

For the template sound:
1) Extract spectral peak positions from the power spectrum
2) Measure the standard deviation around the peaks and construct a Gaussian from it
3) Save the Gaussians for later classification
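
A minimal sketch of the template steps, under one possible reading: "standard deviation around the peaks" is taken here to mean the power-weighted spread of the spectrum in a small window around each peak, measured in bins. The helper names, the window `half_width`, and the `min_sigma` floor are assumptions made for the example.

```python
import math

def find_peaks(spectrum, count):
    """Indices of the `count` strongest local maxima, sorted by bin index."""
    maxima = [k for k in range(1, len(spectrum) - 1)
              if spectrum[k] > spectrum[k - 1] and spectrum[k] >= spectrum[k + 1]]
    strongest = sorted(maxima, key=lambda k: spectrum[k], reverse=True)[:count]
    return sorted(strongest)

def gaussian_around(spectrum, peak, half_width=3, min_sigma=0.5):
    """Power-weighted mean and standard deviation (in bins) around one peak."""
    lo = max(0, peak - half_width)
    hi = min(len(spectrum), peak + half_width + 1)
    total = sum(spectrum[lo:hi])
    mean = sum(k * spectrum[k] for k in range(lo, hi)) / total
    var = sum(spectrum[k] * (k - mean) ** 2 for k in range(lo, hi)) / total
    # Floor sigma so a razor-thin peak does not yield an unusably narrow Gaussian.
    return mean, max(math.sqrt(var), min_sigma)

def build_template(spectrum, count=2):
    """One (mean, sigma) pair per strong peak; this is what gets saved."""
    return [gaussian_around(spectrum, p) for p in find_peaks(spectrum, count)]

# Toy power spectrum with two peaks, at bins 3 and 11.
toy = [0, 0, 1, 4, 1, 0, 0, 0, 0, 0, 2, 8, 2, 0, 0, 0]
template = build_template(toy)
print(template)
```

Another reasonable reading is to estimate each peak's spread across several recordings of the same sound, which would also capture how much the sound itself varies between takes.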

For unknown sounds:
1) Extract spectral peak positions
2) Project those points onto the saved Gaussians, which gives you z-scores for the peak positions
3) With the computed z-scores you should be able to classify your template sound
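
The classification side could then be sketched as follows. The template values, the nearest-peak pairing, and the `max_z` cutoff of 2.0 are all assumptions chosen for illustration; the right cutoff depends on the sound and the noise level.

```python
def z_scores(template, peaks):
    """For each saved (mean, sigma), the z-score of the nearest observed peak."""
    return [min(abs(p - mean) / sigma for p in peaks) for mean, sigma in template]

def matches_template(template, peaks, max_z=2.0):
    """True if every template Gaussian has an observed peak within `max_z` sigmas."""
    if not peaks:
        return False
    return all(z <= max_z for z in z_scores(template, peaks))

# Hypothetical saved template: (mean bin, sigma in bins) for two peaks.
saved = [(3.0, 0.6), (11.0, 0.6)]

print(matches_template(saved, [3.4, 10.5]))  # peaks close to the template
print(matches_template(saved, [6.0, 14.0]))  # peaks far from the template
```

Requiring every Gaussian to be matched is the strictest choice; counting how many are matched, or summing the z-scores, would give a softer score.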

Note: This is a very crude method that discriminates sounds by their most powerful frequencies. Using Gaussians leaves room for slight shifts in those frequencies.
