Getting the amplitude (or RMS voltage) of an audio signal captured in C++ with the waveIn library?
I am working on a very basic robotics project and would like to add voice recognition to it.
I know this is a complex problem, but I only need to recognize 3 or 4 commands (or words).
I know I can record audio with waveIn, but I would like to do real-time amplitude analysis on the audio signal. How can that be done? The wave will be captured as 8-bit mono.
My idea is to divide the signal into segments of a fixed duration, divide each segment further into smaller subsets, compute the average RMS value over each subset, sum those values, and then see how much they differ from a stored reference signal. If the error is below an accepted threshold for all (or most) of the segments, print the word.
How can this be implemented?
Any other suggestions would also be welcome.
Thanks in advance.
Comments (2)
There is no simple way to recognize words, because they are basically sequences of phonemes that can vary in time and frequency.
Classical isolated-word recognition systems use the MFCCs (mel-frequency cepstral coefficients) of the signal as input data and try to recognize patterns using HMMs (hidden Markov models) or DTW (dynamic time warping).
You will also need a silence detection module if you don't want a record button.
For instance, the Edinburgh University toolkit provides some of these tools (with good documentation).
If you don't want to build it from scratch, or you want a source of inspiration, here is an (old but free) implementation of such a system (which uses its own toolkit), with a full explanation and practical examples of how it works.
That system is an LVCSR (large-vocabulary continuous speech recognition) system, and you only need a subset of it. If anyone knows of an open-source reduced-vocabulary system (like a simple IVR), a pointer would be welcome.
If you want to build a basic system on your own, I recommend using MFCC and DTW.
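As a rough illustration of silence detection on the 8-bit mono input described in the question, one common starting point is to compute the RMS of short frames and compare it against a threshold. The sketch below is not tied to any particular capture API: `frameRms`, `rmsEnvelope`, `isSilent`, and the default threshold of 0.02 are all hypothetical names and values chosen for the example. It assumes the samples are unsigned 8-bit PCM, as in WAVE files, where 128 is the zero level.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// RMS of one frame of unsigned 8-bit PCM, scaled to roughly [0, 1].
// Assumption: 128 is the zero level, as in 8-bit WAVE PCM.
double frameRms(const std::uint8_t* samples, std::size_t count) {
    assert(count > 0);
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        // Remove the DC offset and scale before squaring.
        const double centred = (static_cast<int>(samples[i]) - 128) / 128.0;
        sumSquares += centred * centred;
    }
    return std::sqrt(sumSquares / static_cast<double>(count));
}

// Splits a buffer into consecutive non-overlapping frames and returns the
// RMS of each full frame; a trailing partial frame is ignored.
std::vector<double> rmsEnvelope(const std::vector<std::uint8_t>& pcm,
                                std::size_t frameSize) {
    assert(frameSize > 0);
    std::vector<double> envelope;
    for (std::size_t start = 0; start + frameSize <= pcm.size();
         start += frameSize) {
        envelope.push_back(frameRms(pcm.data() + start, frameSize));
    }
    return envelope;
}

// Treats a frame as silence when its RMS is below the threshold.
// The default value is an arbitrary placeholder and needs tuning.
bool isSilent(double rms, double threshold = 0.02) {
    return rms < threshold;
}
```

In a real-time setup you would call `frameRms` on each buffer as the capture API hands it back, and start collecting an utterance once several consecutive frames are no longer silent. The threshold depends heavily on the microphone and the background noise, so it is better measured than guessed.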
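To give an idea of the DTW part, here is a minimal sketch of the textbook dynamic-programming formulation, comparing an utterance against stored templates. It assumes each frame has already been turned into a feature vector (MFCCs, or even a single RMS value per frame as a crude first experiment); feature extraction is not shown. The names `Frame`, `frameDistance`, `dtwDistance`, and `recognise`, and the `maxDistance` rejection threshold, are hypothetical and chosen for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Frame = std::vector<double>;  // one feature vector per frame

// Euclidean distance between two feature vectors of equal length.
double frameDistance(const Frame& a, const Frame& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Accumulated cost of the best alignment between two frame sequences,
// allowing either sequence to be stretched locally in time.
double dtwDistance(const std::vector<Frame>& x, const std::vector<Frame>& y) {
    assert(!x.empty() && !y.empty());
    const double inf = std::numeric_limits<double>::infinity();
    const std::size_t n = x.size();
    const std::size_t m = y.size();
    std::vector<std::vector<double>> cost(n + 1,
                                          std::vector<double>(m + 1, inf));
    cost[0][0] = 0.0;
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            // Cheapest way to reach (i, j): from above, from the left,
            // or diagonally.
            const double best = std::min(
                {cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]});
            cost[i][j] = frameDistance(x[i - 1], y[j - 1]) + best;
        }
    }
    return cost[n][m];
}

// Index of the closest template, or -1 when no template is within
// maxDistance. The utterance must be non-empty; empty templates are skipped.
int recognise(const std::vector<Frame>& utterance,
              const std::vector<std::vector<Frame>>& templates,
              double maxDistance) {
    int bestIndex = -1;
    double bestDistance = maxDistance;
    for (std::size_t k = 0; k < templates.size(); ++k) {
        if (templates[k].empty()) continue;
        const double d = dtwDistance(utterance, templates[k]);
        if (d <= bestDistance) {
            bestDistance = d;
            bestIndex = static_cast<int>(k);
        }
    }
    return bestIndex;
}
```

With one or more recorded templates per command, `recognise` returns the index of the best match. Because the accumulated cost grows with sequence length, it is often divided by the combined length of the two sequences before being compared with a threshold; that normalisation is left out here to keep the sketch short.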
If you just want to recognize a few commands, there are many commercial and free products you can use. See "Need text to speech and speech recognition tools for Linux", "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?", or "Speech Recognition on iPhone". The answers to these questions link to many available products and tools. Recognizing speech and understanding a list of commands is a very common problem that has been solved commercially. Many of the voice-automated phone systems you call use this type of technology, and the same technology is available to developers.
From watching these questions for a few months, I've seen most developer choices break down like this:
Windows folks - use the System.Speech features of .NET, or Microsoft.Speech, and install the free recognizers Microsoft provides. Windows 7 includes a full speech engine; others can be downloaded for free. There is also a C++ API to the same engines, known as SAPI. See http://msdn.microsoft.com/en-us/magazine/cc163663.aspx or http://msdn.microsoft.com/en-us/library/ms723627(v=vs.85).aspx
Linux folks - Sphinx seems to have a good following. See http://cmusphinx.sourceforge.net/ and http://cmusphinx.sourceforge.net/wiki/
Commercial products - Nuance, Loquendo, AT&T, others
Online services - Nuance, Yapme, others
Of course, this may also be helpful: http://en.wikipedia.org/wiki/List_of_speech_recognition_software