Very low accuracy when using OpenEars for speech recognition
I'm using OpenEars for speech recognition in my app. My main concern is accuracy. In a quiet environment accuracy is about 50%, and in a noisy environment it gets worse: almost nothing is recognized correctly. I'm currently using a dictionary file of about 300 words. Which areas should I look at to improve accuracy? So far I haven't done any tuning.
Comments (1)
Designing a speech recognition application requires understanding some basic concepts behind speech recognition, such as the acoustic model, the grammar, and the phonetic dictionary. You can learn more from the CMUSphinx tutorial: http://cmusphinx.sourceforge.net/wiki/tutorial
Poor accuracy is the normal starting point in speech application development. There is a process you can use to improve it and make the application useful:
1. Collect samples of the speech you are trying to recognize and build a speech database from them, so that you can measure the current accuracy and understand the issues behind it.
2. Experiment with the vocabulary size to improve the separation between different voice prompts. For example, a vocabulary of 10 commands is far easier to recognize than a vocabulary of 300 commands.
3. Design your application so that the number of variants to recognize is small and people's answers are straightforward. This activity is called VUI (voice user interface) design, and it is quite a big area with many brilliant books and blog articles. You can find some details here: http://www.amazon.com/Voice-Interface-Design-Michael-Cohen/dp/0321185765
4. Try to improve the acoustic part of your application. Modify the dictionary to match your speech. Adapt the acoustic model to match the acoustic properties of your recordings. See http://cmusphinx.sourceforge.net/wiki/tutorialadapt for a description of the acoustic model adaptation process.
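For the first step, measuring the current accuracy, one common metric is word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The following is only a minimal sketch in Python, not part of OpenEars or CMUSphinx. The names `word_errors`, `word_error_rate`, and `SAMPLES` are made up for illustration, and the sample pairs are invented; in practice each pair would hold the transcript of a recording from your speech database and what the recognizer returned for it.

```python
from typing import List, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words, keeping one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1], len(ref)


def word_error_rate(pairs: List[Tuple[str, str]]) -> float:
    """WER over a list of (reference, hypothesis) pairs."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return errors / words if words else 0.0


# Hypothetical test set: (what was said, what the recognizer returned).
SAMPLES = [
    ("open the door", "open the door"),
    ("turn on the light", "turn on light"),
    ("stop", "top"),
]

if __name__ == "__main__":
    print(f"WER: {word_error_rate(SAMPLES):.2%}")
```

Running the same fixed test set after each change (a smaller vocabulary, a modified dictionary, an adapted acoustic model) shows whether the change actually helped, rather than relying on impressions from a few live trials.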