How do I implement speech recognition and text-to-speech in C++?
I want to know about the various techniques for speech recognition and text-to-speech conversion.
Please also point me to any resources on the topic, such as links, tutorials, or ebooks.
Which technique is the most efficient way to achieve this?
Comments (6)
I'm going to answer the part about speech recognition (since I don't know much about text-to-speech):
http://ecx.images-amazon.com/images/I/4190SZC61CL._BO2,204,203,200_PIsitb-sticker-arrow-click,TopRight,35,-76_AA240_SH20_OU01_.jpg
This book, "Statistical Methods for Speech Recognition", is a classic that explains the mathematical foundations of statistical speech recognition. It was written by Frederick Jelinek, one of the founders of the field.
The most important concept to understand is the Hidden Markov Model (HMM). People have been using HMMs in speech recognition for decades. A more recent approach uses Conditional Random Fields; see the paper (PDF) and the associated software toolkit, SCARF.
It is fairly hard to write your own speech recognizer. It's an active research area with several scientific conferences, e.g. ASRU, Interspeech, ICASSP.
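To make the Hidden Markov Model idea a bit more concrete, here is a minimal C++ sketch of Viterbi decoding, the standard way to find the most likely hidden-state sequence for a series of observations. It is only a toy under stated assumptions: the `Hmm` struct, the `viterbi` function, and the discrete observation symbols are made up for illustration, whereas a real recognizer works on continuous acoustic features with far larger models.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Toy discrete HMM (hypothetical type); probabilities are plain, not log, values.
struct Hmm {
    std::vector<double> initial;                  // initial[s]       = P(first state is s)
    std::vector<std::vector<double>> transition;  // transition[a][b] = P(next is b | current is a)
    std::vector<std::vector<double>> emission;    // emission[s][o]   = P(observing o | state s)
};

// Returns the most likely hidden-state sequence for the given observation indices.
std::vector<int> viterbi(const Hmm& hmm, const std::vector<int>& observations) {
    const std::size_t numStates = hmm.initial.size();
    const std::size_t numSteps = observations.size();
    assert(numStates > 0);
    assert(hmm.transition.size() == numStates && hmm.emission.size() == numStates);
    if (numSteps == 0) return {};

    const double negInf = -std::numeric_limits<double>::infinity();
    // score[t][s]: best log-probability of any path that ends in state s at time t.
    std::vector<std::vector<double>> score(numSteps, std::vector<double>(numStates, negInf));
    // backPointer[t][s]: the state at time t-1 on that best path.
    std::vector<std::vector<int>> backPointer(numSteps, std::vector<int>(numStates, 0));

    for (std::size_t s = 0; s < numStates; ++s)
        score[0][s] = std::log(hmm.initial[s]) + std::log(hmm.emission[s][observations[0]]);

    for (std::size_t t = 1; t < numSteps; ++t) {
        for (std::size_t cur = 0; cur < numStates; ++cur) {
            for (std::size_t prev = 0; prev < numStates; ++prev) {
                const double candidate =
                    score[t - 1][prev] + std::log(hmm.transition[prev][cur]);
                if (candidate > score[t][cur]) {
                    score[t][cur] = candidate;
                    backPointer[t][cur] = static_cast<int>(prev);
                }
            }
            score[t][cur] += std::log(hmm.emission[cur][observations[t]]);
        }
    }

    // Pick the best final state, then follow the back-pointers to time 0.
    std::size_t best = 0;
    for (std::size_t s = 1; s < numStates; ++s)
        if (score[numSteps - 1][s] > score[numSteps - 1][best]) best = s;

    std::vector<int> path(numSteps);
    path[numSteps - 1] = static_cast<int>(best);
    for (std::size_t t = numSteps - 1; t > 0; --t)
        path[t - 1] = backPointer[t][path[t]];
    return path;
}
```

Working in log-probabilities avoids numeric underflow on long observation sequences. In a speech recognizer the states would roughly correspond to parts of phonemes and the observations to acoustic feature vectors, but the decoding idea is the same.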
Both are very wide areas.
About recognition: In this schema you will find how to build a basic automatic speech recognition system. It isn't by any means close to the state of the art, but it is achievable and it works. If you want to do something more advanced, read about cepstral coefficients and Hidden Markov Models. Have a look at HTK, a widely used toolkit for Hidden Markov Models.
About text to speech: I'd have a look at Festival.
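As a rough illustration of the cepstral coefficients mentioned above, the following C++ sketch computes the plain real cepstrum of a single audio frame: window the frame, take the log magnitude spectrum, then apply an inverse DFT. The function name `realCepstrum`, the Hamming window, and the naive O(n^2) DFT are my own choices for brevity. This is not what HTK does internally; production front ends typically compute mel-frequency cepstral coefficients (MFCCs), which add a mel filterbank and use an FFT.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Real cepstrum of one audio frame: inverse DFT of the log magnitude spectrum.
// Returns the first numCoeffs coefficients (numCoeffs must not exceed frame.size()).
std::vector<double> realCepstrum(const std::vector<double>& frame, std::size_t numCoeffs) {
    const std::size_t n = frame.size();
    assert(n >= 2 && numCoeffs <= n);
    const double pi = std::acos(-1.0);

    // 1. Hamming window to reduce spectral leakage at the frame edges.
    std::vector<double> windowed(n);
    for (std::size_t i = 0; i < n; ++i)
        windowed[i] = frame[i] * (0.54 - 0.46 * std::cos(2.0 * pi * i / (n - 1)));

    // 2. Log magnitude spectrum via a naive DFT.
    const double epsilon = 1e-12;  // keeps log() finite if a bin is exactly zero
    std::vector<double> logMagnitude(n);
    for (std::size_t k = 0; k < n; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double angle = 2.0 * pi * k * i / n;
            re += windowed[i] * std::cos(angle);
            im -= windowed[i] * std::sin(angle);
        }
        logMagnitude[k] = std::log(std::hypot(re, im) + epsilon);
    }

    // 3. Inverse DFT. The log spectrum of a real signal is real and even,
    //    so only the cosine terms contribute.
    std::vector<double> cepstrum(numCoeffs);
    for (std::size_t q = 0; q < numCoeffs; ++q) {
        double sum = 0.0;
        for (std::size_t k = 0; k < n; ++k)
            sum += logMagnitude[k] * std::cos(2.0 * pi * k * q / n);
        cepstrum[q] = sum / n;
    }
    return cepstrum;
}
```

One useful property to notice: scaling the input volume only shifts coefficient 0, while the remaining coefficients, which describe the spectral shape, stay the same. That is part of why cepstral features suit recognition.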
There are several Sphinx variants. The main active ones are pocketsphinx and sphinx4.
Sphinx4 is written in Java. It is better suited to desktop and web applications.
Pocketsphinx is written in C. It is better suited to embedded devices. There are iPhone and Android apps that use it.
Sounds like you want pocketsphinx. Try out this tutorial:
http://www.speech.cs.cmu.edu/sphinx/tutorial.html
A better place to ask pocketsphinx/sphinx4 questions is CMU's SourceForge forum.
You should also provide more information, such as what you intend to build.
As for books, the bible of speech recognition is "Spoken Language Processing".
Since you mentioned Microsoft, you should just look at the Microsoft Speech site. It contains many resources for working with speech, including TTS and speech recognition.
If you're looking for some actual code, check out Sphinx, an open source speech recognition project from CMU. It's not written in C++, but if you're interested in the algorithms, it implements a lot that you can learn from. (I'd like to echo @dehmann's point, too: read up on hidden Markov models.)
If you are curious about what to do with your fancy speech recognizer, you should read:
Voice Interaction Design by Randy Allen Harris
It provides some great advice about when to use voice and how to use it in an application.