Very low accuracy when using OpenEars for speech recognition

Posted on 2024-12-05 06:27:37

I'm using OpenEars for speech recognition in my app. My main concern is accuracy. In a quiet environment accuracy is about 50%, and in a noisy environment it gets much worse: almost nothing is recognized correctly. I'm currently using a dictionary file of about 300 words. Which areas should I look at to improve accuracy? So far I haven't done any tuning.

Comments (1)

深府石板幽径 2024-12-12 06:27:37

Designing a speech recognition application requires you to understand some basic concepts behind speech recognition, such as the acoustic model, the grammar, and the phonetic dictionary. You can learn more from the CMUSphinx tutorial: http://cmusphinx.sourceforge.net/wiki/tutorial

Poor accuracy is the normal starting point in speech application development. There is a process you can follow to improve it and make the application useful:

  1. Collect samples of the speech you are trying to recognize and build a speech database, so that you can measure the current accuracy and understand the issues behind it.

  2. Experiment with the vocabulary size to improve the separation between different voice prompts. For example, a vocabulary of 10 commands is far easier to recognize than a vocabulary of 300 commands.

  3. Design your application so that there are fewer variants to recognize and people's answers are straightforward. This activity is called VUI (voice user interface) design, and it is quite a big area with many brilliant books and blog articles. You can find some details here: http://www.amazon.com/Voice-Interface-Design-Michael-Cohen/dp/0321185765

  4. Try to improve the acoustic part of your application. Modify the dictionary to match your speech. Adapt the acoustic model to match the acoustic properties. See http://cmusphinx.sourceforge.net/wiki/tutorialadapt for the description of the acoustic model adaptation process.
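
For step 1, once you have reference transcripts for your recordings, accuracy is usually reported as word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of reference words. Below is a minimal Python sketch of that measurement. It is not part of OpenEars or CMUSphinx; the function names and the sample transcripts are made up for illustration, and it assumes you have already saved the recognizer output as plain text.

```python
def word_errors(reference, hypothesis):
    """Return (edit_distance, reference_length) counted in words.

    The edit distance is the minimum number of substitutions,
    insertions and deletions needed to turn the hypothesis into
    the reference (standard Levenshtein distance over words).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1], len(ref)


def word_error_rate(pairs):
    """Compute WER over an iterable of (reference, hypothesis) pairs."""
    total_errors = 0
    total_words = 0
    for reference, hypothesis in pairs:
        errors, length = word_errors(reference, hypothesis)
        total_errors += errors
        total_words += length
    if total_words == 0:
        raise ValueError("no reference words to score against")
    return total_errors / total_words


if __name__ == "__main__":
    # Hypothetical test set: (what was said, what was recognized).
    samples = [
        ("turn on the light", "turn on the light"),
        ("call home now", "call phone"),
    ]
    print(f"WER: {word_error_rate(samples):.2%}")
```

Score the same recorded test set after every change (smaller vocabulary, edited dictionary, adapted acoustic model), so you can tell whether a change actually helped instead of judging by ear.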
