Writing a speech recognition engine
So, like many others, I decided to create my own speech-recognition engine. As it turned out, it's not easy at all. It's rather difficult for English in particular, because there is, I'd say, a dramatic difference between the way a word is written and the way it's pronounced. Being from Georgia, I decided to write speech recognition for the Georgian language. In Georgian, you pronounce words EXACTLY the way you write them. It's just like a transcription. Will this fact significantly ease my task? Or are there even more difficult... difficulties :D ?
Comments (2)
Speech recognition is a complex domain with many specific algorithms, tools and methods. To create your own engine you could start with the CMUSphinx open source speech recognition toolkit.
CMUSphinx already supports English, German, Spanish, French, Dutch, Russian, Mandarin, Icelandic, Italian and many other languages. It's very simple to add a new one. For newcomers it usually takes a month or two of concentrated work to implement the required process.
To get started visit the homepage:
http://cmusphinx.sourceforge.net
and read the tutorial
http://cmusphinx.sourceforge.net/wiki/tutorial
If you have any questions, please ask them on the forums or here!
And it's a very common misconception that you just spell out the sounds when you speak Georgian. That's not true for most of the languages in the world. To test the hypothesis, try recording some audio in an audio editor and check which sounds are actually pronounced. You'll be surprised. The tutorial above covers this question in detail.
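If you'd rather script the first pass than click around in an audio editor, here is a minimal sketch, assuming the recording is a 16-bit mono PCM WAV file. It only computes per-frame RMS energy, which is a crude way to see where sounds start and stop before you listen to each stretch; it does not identify the sounds themselves. The function names and the file name in the usage note are made up for illustration.

```python
import math
import struct
import wave


def frame_rms(samples, frame_len):
    """Return the RMS energy of each consecutive, non-overlapping frame.

    A trailing partial frame (shorter than frame_len) is ignored.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return energies


def read_mono_16bit(path):
    """Read a 16-bit mono PCM WAV file; return (list of samples, sample rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV PCM data is little-endian signed 16-bit.
    samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return samples, rate
```

For example, `samples, rate = read_mono_16bit("recording.wav")` followed by `frame_rms(samples, rate // 100)` would give one energy value per 10 ms frame (the file name is hypothetical). Low-energy runs are likely pauses or closures; the stretches in between are what you then compare against the spelling.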
Do all people from Georgia sound exactly the same? I think not... Lots of the major problems in speech recognition are not directly related to the language itself.
Solving these things is always pretty hard... On top of that you have the language/pronunciation to take care of... I don't know Georgian, but what you describe might make the task a bit easier; it will still be a hard task, though.
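To make the "bit easier" part concrete: if spelling really did match pronunciation one-to-one, the pronunciation dictionary could be generated from the spelling instead of written by hand. Below is a minimal sketch under that assumption. The letter table covers only four Georgian letters, the phone labels are illustrative rather than an established phone set, and the `word phone phone ...` line layout is modelled on the plain-text dictionaries that toolkits such as CMUSphinx use.

```python
# Hypothetical letter-to-phone table covering only a few Georgian letters.
# The phone labels are illustrative, not an established phone set.
LETTER_TO_PHONE = {
    "ა": "a",
    "ე": "e",
    "დ": "d",
    "მ": "m",
}


def naive_pronunciation(word):
    """Map each letter to one phone, assuming a strictly phonemic spelling.

    Raises KeyError for letters missing from the table.
    """
    return [LETTER_TO_PHONE[letter] for letter in word]


def dictionary_line(word):
    """Format one dictionary entry as 'word phone phone ...'."""
    return word + " " + " ".join(naive_pronunciation(word))
```

Calling `dictionary_line("დედა")` returns `"დედა d e d a"`. This only takes care of the lexicon, and only as far as the one-letter-one-sound assumption holds in real recordings; the speaker-related problems are untouched by it.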
EDIT - as per comments:
Using good libraries might shorten the time-frame and even help with quality... but not every library is good for speech recognition, despite perhaps being brilliant at other audio-related matters...
For reference, see the Wikipedia article http://en.wikipedia.org/wiki/Speech_recognition - it has a good overview, including some links and book references which are a good starting point...
As for how to design such an API, see for example http://java.sun.com/products/java-media/speech/forDevelopers/jsapi-guide/Recognition.html