Writing a speech recognition engine

Posted on 2024-12-17 11:08:20

So, like many others, I decided to create my own speech recognition engine. As it turned out, this is not easy at all. It is especially difficult for English because, I'd say, there is a dramatic difference between the way a word is written and the way it is pronounced. Being from Georgia, I decided to write speech recognition for the Georgian language. In Georgian, you pronounce words EXACTLY the way you write them; the spelling is just like a transcription. Will this fact significantly ease my task? Or are there even more difficult... difficulties :D ?

Comments (2)

心在旅行 2024-12-24 11:08:20

Speech recognition is a complex domain with many specific algorithms, tools, and methods. To create your own engine, you could start with the CMUSphinx open source speech recognition toolkit, which will allow you to:

  • Collect and process the data required to support the Georgian language
  • Create the models for Georgian
  • Implement a speech recognition engine for Georgian
  • Use the engine to create a speech recognition application running on a desktop, on a server, or on an iPhone (through OpenEars)

CMUSphinx already supports English, German, Spanish, French, Dutch, Russian, Mandarin, Icelandic, Italian, and many other languages. Adding a new one is straightforward, although for newcomers it usually takes a month or two of concentrated work to implement the required process.

To get started, visit the homepage:

http://cmusphinx.sourceforge.net

and read the tutorial

http://cmusphinx.sourceforge.net/wiki/tutorial

If you have any questions, please ask them on the forums or here!

Also, it's a very common misconception that when you speak Georgian you simply pronounce the letters as they are spelled. That is not true for most of the languages in the world. To test the hypothesis, record some audio in an audio editor and check which sounds are actually pronounced. You'll be surprised. The tutorial above covers this question in detail.
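To make the "spelling equals pronunciation" hypothesis concrete, here is a minimal Python sketch that turns a word list into a pronunciation dictionary in the plain `word PHONE PHONE ...` layout that CMUSphinx dictionaries use. The one-letter-to-one-phone table and the phone labels (`KQ`, `TSQ`, and so on) are assumptions made up for this illustration, not an official phone set, and the function names are hypothetical.

```python
# Hypothetical one-letter-to-one-phone table for Georgian (Mkhedruli script).
# The phone labels are illustrative names invented for this sketch; a trailing
# "Q" marks the ejective consonants. This is NOT an official CMUSphinx phone set.
LETTER_TO_PHONE = {
    "ა": "A", "ბ": "B", "გ": "G", "დ": "D", "ე": "E", "ვ": "V",
    "ზ": "Z", "თ": "T", "ი": "I", "კ": "KQ", "ლ": "L", "მ": "M",
    "ნ": "N", "ო": "O", "პ": "PQ", "ჟ": "ZH", "რ": "R", "ს": "S",
    "ტ": "TQ", "უ": "U", "ფ": "P", "ქ": "K", "ღ": "GH", "ყ": "QQ",
    "შ": "SH", "ჩ": "CH", "ც": "TS", "ძ": "DZ", "წ": "TSQ", "ჭ": "CHQ",
    "ხ": "KH", "ჯ": "J", "ჰ": "H",
}


def naive_pronunciation(word):
    """Return one phone per letter, assuming spelling equals pronunciation."""
    phones = []
    for ch in word:
        if ch not in LETTER_TO_PHONE:
            raise ValueError(f"no phone for character {ch!r} in {word!r}")
        phones.append(LETTER_TO_PHONE[ch])
    return phones


def dictionary_lines(words):
    """Format the sorted, de-duplicated words as 'word PHONE PHONE ...' lines."""
    return [f"{w} {' '.join(naive_pronunciation(w))}" for w in sorted(set(words))]


if __name__ == "__main__":
    for line in dictionary_lines(["გამარჯობა", "მადლობა"]):
        print(line)
```

Comparing such letter-by-letter entries with what is actually audible in recordings is one way to run the test suggested above; wherever speakers deviate from the spelling, the entry would need to be corrected by hand.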

梦太阳 2024-12-24 11:08:20

Do all people from Georgia sound exactly the same? I think not... Many of the major problems in speech recognition are not directly related to the language itself:

  • different people (women, men, children, elders etc.) have different voices
  • sometimes the same person sounds different, for example when the person has a cold
  • different background noises
  • everyday speech sometimes contains words from other languages (like the German word Kindergarten in US English)
  • some people who learned the language are not from the country itself (they usually sound different)
  • some people speak faster, others speak slower
  • quality of the microphone
    etc.
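Some of the factors above, background noise in particular, are commonly simulated when preparing audio for training or testing a recognizer. The following is a minimal sketch of that idea, not something taken from this answer: it mixes a noise signal into a speech signal at a chosen signal-to-noise ratio. It assumes both signals are plain lists of float samples at the same sample rate, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB.

    Assumption of this sketch: both inputs are lists of floats at the same
    sample rate. The noise is looped if it is shorter than the speech.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Repeat (or truncate) the noise so it covers the whole speech signal.
    tiled = [noise[i % len(noise)] for i in range(len(speech))]
    noise_rms = rms(tiled)
    if noise_rms == 0.0:
        return list(speech)  # silent noise: nothing to add
    # SNR_dB = 20 * log10(speech_rms / noise_rms), solved for the noise level.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, tiled)]
```

Running the same recordings through such a mixer at several SNR values is a simple way to see how quickly recognition accuracy degrades as conditions get noisier.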

Solving these things is always pretty hard... On top of that, you have the language and its pronunciation to take care of... I don't know Georgian, and what you describe might make the task a bit easier, but it will still be a hard task.

EDIT - as per comments:

Using good libraries might shorten the time frame and even improve quality... but not every library is good for speech recognition, even if it is brilliant at other audio-related tasks...

For reference, see the Wikipedia article http://en.wikipedia.org/wiki/Speech_recognition - it has a good overview, including some links and book references that make a good starting point...

As for how to design such an API, see for example http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/Recognition.html
