Recognising a specific sound on iOS

Posted 2024-11-11 16:44:32

I'd like to be able to recognise a specific sound in an iOS application. I guess it would basically work like speech recognition in that it's fairly fuzzy, but it would only have to work for one specific sound.

I've done some quick FFT stuff to identify specific frequencies over a certain threshold, and only when they're solo (i.e. not surrounded by other frequencies), so I can identify individual tones pretty easily. I'm thinking this is just an extension of that: compare against an FFT data set from a recording of the sound, in chunks of, say, 0.1 seconds over the length of the audio. I would also have to account for variation in amplitude, and a little in pitch and in time.

Can anyone point me to any pre-existing source that I could use to speed this process along? I can't seem to find anything usable. Or failing that, any ideas on how to get started on something like this?

Thanks very much

Comments (1)

余厌 2024-11-18 16:44:32

From your description it is not entirely clear what you want to do.
What is the "specific" sound like? Does it have high background noise?
What's the specific recognizable feature (e.g. pitch, inharmonicity, timbre ...)?
Against which other "sounds" do you want to compare it?
Do you simply want to match an arbitrary sound spectrum against a "template sound"?
Is your sound percussive, melodic, speech, ...? Is it long, short ...?
In what frequency range do you expect the best discriminability? Are the features invariant over time?

There is no "general" solution that works for everything. Speech recognition in itself is fairly complex and won't work well for abstract sounds whose discriminable frequencies are not, e.g., in the mel bands.

So in conclusion, you are leaving too many open questions to get a useful answer.
The only suggestion I can make based on the little information given is the following:

For the template sound:
1) Extract spectral peak positions from the power spectrum
2) Measure the standard deviation around the peaks and construct a Gaussian from it
3) Save the Gaussians for later classification

For unknown sounds:
1) Extract spectral peak positions
2) Project those points onto the saved Gaussians, which leaves you with z-scores of the peak positions
3) With the computed z-scores you should be able to classify your template sound

Note: This is a very crude method which discriminates sounds according to their most powerful frequencies. Using Gaussians leaves room for slight shifts in those frequencies.
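
The two step lists above can be sketched roughly as follows. This is only an illustration in plain Python, not iOS code, and it rests on a few assumptions of mine: "standard deviation around the peaks" is taken to mean the power-weighted spread of the spectrum in a few bins around each peak, each saved Gaussian is compared against the nearest peak of the unknown sound, and a sound is accepted when every z-score is at most 3. All function names and the parameters `half_width`, `min_sigma` and `max_z` are made up for the example. Frequencies are in DFT bins, and the naive DFT should be replaced by a real FFT (e.g. from Accelerate on iOS).

```python
import cmath
import math

def power_spectrum(samples):
    """Hann-windowed power spectrum for bins 0..n/2 (naive O(n^2) DFT, demo only)."""
    n = len(samples)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(samples)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2
            for k in range(n // 2 + 1)]

def peak_positions(power, count, half_width=3):
    """(centroid, spread) in bins for the `count` strongest local maxima."""
    maxima = [k for k in range(1, len(power) - 1)
              if power[k] > power[k - 1] and power[k] >= power[k + 1]]
    strongest = sorted(maxima, key=lambda k: power[k], reverse=True)[:count]
    peaks = []
    for k in sorted(strongest):
        bins = range(max(0, k - half_width), min(len(power), k + half_width + 1))
        total = sum(power[b] for b in bins)
        mean = sum(b * power[b] for b in bins) / total
        var = sum(power[b] * (b - mean) ** 2 for b in bins) / total
        peaks.append((mean, math.sqrt(var)))
    return peaks

def build_template(samples, count, min_sigma=0.25):
    """Template steps 1-3: one Gaussian (mean, sigma) per spectral peak.
    min_sigma is an assumed floor so a very narrow peak cannot give huge z-scores."""
    peaks = peak_positions(power_spectrum(samples), count)
    return [(mean, max(sigma, min_sigma)) for mean, sigma in peaks]

def z_scores(template, samples):
    """Unknown steps 1-2: |z| of the nearest unknown peak for each saved Gaussian."""
    peaks = peak_positions(power_spectrum(samples), len(template))
    positions = [mean for mean, _ in peaks]
    if not positions:  # e.g. silence: nothing to compare
        return [math.inf] * len(template)
    return [min(abs(p - mu) / sigma for p in positions) for mu, sigma in template]

def matches(template, samples, max_z=3.0):
    """Unknown step 3: accept only if every template peak has a close counterpart."""
    return max(z_scores(template, samples), default=math.inf) <= max_z

def tones(bins_and_amps, n=256):
    """Synthetic signal for the demo: sum of sines, frequencies given in DFT bins."""
    return [sum(a * math.sin(2 * math.pi * f * i / n) for f, a in bins_and_amps)
            for i in range(n)]

template = build_template(tones([(20, 1.0), (45, 0.5)]), count=2)
# Quieter, slightly detuned copy of the template sound
print(matches(template, tones([(20.3, 0.3), (45.3, 0.2)])))
# Same kind of sound but with clearly different dominant frequencies
print(matches(template, tones([(30, 1.0), (60, 0.5)])))
```

Because only peak positions are compared, the check ignores overall amplitude, which covers the volume variation mentioned in the question. It does nothing about changes over time, so for a sound that evolves you would presumably have to build one template per short chunk.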
