Real-time recognition of non-speech, non-music sounds from a continuous microphone stream
I'm looking to log events corresponding to a specific sound, such as a car door slamming, or perhaps a toaster ejecting toast.
The system needs to be more sophisticated than a "loud noise detector"; it needs to be able to distinguish that specific sound from other loud noises.
The identification need not be zero-latency, but the processor needs to keep up with a continuous stream of incoming data from a microphone that is always on.
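To make the "keep up with a continuous stream" requirement concrete, here is a minimal sketch of how incoming microphone chunks could be cut into overlapping analysis frames. The function name `frames_from_stream` and the `frame_len` and `hop` parameters are hypothetical, and the chunks are assumed to arrive as plain lists of samples from whatever audio API is in use.

```python
from typing import Iterable, Iterator, List


def frames_from_stream(
    chunks: Iterable[List[float]], frame_len: int, hop: int
) -> Iterator[List[float]]:
    """Yield overlapping frames of frame_len samples, advancing by hop samples.

    chunks may be any iterable of sample lists of arbitrary size,
    for example the buffers delivered by an audio callback.
    """
    if frame_len <= 0 or not 0 < hop <= frame_len:
        raise ValueError("need frame_len > 0 and 0 < hop <= frame_len")
    buffer: List[float] = []
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit every complete frame currently available in the buffer.
        while len(buffer) >= frame_len:
            yield buffer[:frame_len]
            # Drop only the samples that no later frame will need.
            del buffer[:hop]
```

Each frame can then be handed to a feature extractor or detector. As long as the per-frame work finishes in less time than `hop` samples take to arrive, the buffer stays bounded and the detector keeps pace with the microphone.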
- Is this task significantly different from speech recognition, or could I make use of speech recognition libraries/toolkits to identify these non-speech sounds?
- Given the requirement that I only need to match one sound (as opposed to matching among a library of sounds), are there any special optimizations I can do?
This answer indicates that a matched filter would be appropriate, but I am hazy on the details. I don't believe a simple cross-correlation on the audio waveform data between a sample of the target sound and the microphone stream would be effective, due to variations in the target sound.
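For reference, the waveform-level baseline mentioned above can be sketched as a normalized cross-correlation between a recorded template and the signal. This is only an illustration of the idea, assuming NumPy 1.20 or later (for `sliding_window_view`); the function name `normalized_xcorr` is made up for this example.

```python
import numpy as np


def normalized_xcorr(signal, template):
    """Return the normalized correlation of template with every window of signal.

    Scores lie in [-1, 1]; a score of 1 means the window matches the
    template exactly up to gain and DC offset.
    """
    signal = np.asarray(signal, dtype=float)
    template = np.asarray(template, dtype=float)
    n = len(template)
    if n == 0 or len(signal) < n:
        raise ValueError("template must be non-empty and no longer than signal")
    t = template - template.mean()
    # One row per candidate alignment of the template within the signal.
    windows = np.lib.stride_tricks.sliding_window_view(signal, n)
    w = windows - windows.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(w, axis=1) * np.linalg.norm(t)
    scores = np.zeros(len(windows))
    # Silent (all-constant) windows get a score of 0 instead of dividing by 0.
    np.divide(w @ t, denom, out=scores, where=denom > 0)
    return scores
```

A peak near 1 marks an occurrence that lines up with the template sample for sample. That is exactly why this approach tends to be brittle for real-world sounds: two slams of the same car door rarely share a waveform, even when their spectra look alike.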
My question is also similar to this, which didn't get much attention.
Comments (2)
I found an interesting paper on the subject
Frequency Vector Principal Component Analysis by Huadong Wu, Mel Siegel, and Pradeep Khosla (IEEE Transactions on Instrumentation and Measurement, Vol. 48, No. 5, October 1999)
It should work for your application as well, perhaps even better than it does for vehicle sounds.
When analyzing the training data, it...
Does a Principal Component Analysis on the frequency vectors
Then to classify a sound, it...
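The individual steps are not spelled out here, so the following is not the paper's procedure. It is only a generic sketch of what "PCA on frequency vectors" could look like, assuming NumPy, Hann-windowed magnitude spectra as the frequency vectors, and distance from the learned subspace as the score. All function names are hypothetical.

```python
import numpy as np


def frequency_vector(frame):
    """Magnitude spectrum of one Hann-windowed frame, scaled to sum to 1."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    total = spectrum.sum()
    return spectrum / total if total > 0 else spectrum


def fit_pca(vectors, n_components):
    """Return the mean vector and the top principal directions (as rows)."""
    x = np.asarray(vectors, dtype=float)
    mean = x.mean(axis=0)
    # Rows of vt are orthonormal directions sorted by explained variance.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:n_components]


def subspace_residual(vector, mean, components):
    """Distance between a frequency vector and the learned PCA subspace."""
    centered = np.asarray(vector, dtype=float) - mean
    projected = components.T @ (components @ centered)
    return float(np.linalg.norm(centered - projected))
```

Under these assumptions, you would fit the model on frames of the target sound only, then flag an incoming frame as a match when its residual falls below a threshold tuned on held-out recordings. A small residual means the frame's spectrum is well explained by the variations seen in training.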
This doctoral thesis, Non-Speech Environmental Sound Classification System for Autonomous Surveillance, by Cowling (2004), has experimental results on different techniques for audio feature extraction, as well as classification. He uses environmental sounds such as jangling keys and footsteps, and was able to achieve an accuracy of 70%:
If you limit yourself to one sound, perhaps you might be able to achieve a higher recognition rate?
The author also mentions that techniques that work fairly well with speech recognition (learning vector quantization and neural networks) don't work so well with environmental sounds.
I have also found a more recent article here: Detecting Audio Events for Semantic Video Search, by Bugalho et al. (2009), where they detect sound events in movies (such as gunshots, explosions, etc.).
I have no experience in this area. I have merely stumbled upon this material as a result of your question piquing my interest. I'm posting my finds here in the hope that it helps with your research.