Getting the amplitude (or RMS voltage) of an audio signal captured in C++ with the wavin lib?

Posted 2024-10-29 00:57:24

I am working on a very basic robotics project and want to add voice recognition to it. I know this is a complex problem, but I only need it to handle 3 or 4 commands (or words).

I know that I can record audio using wavin, but I want to do real-time amplitude analysis on the audio signal. How can that be done? The wave will be input as 8-bit mono.

My idea is to divide the signal into windows of a fixed duration, split each window into smaller subsets, compute the average RMS value over each subset, sum them up, and then see how much they differ from the stored reference signal. If the error is below an accepted value for all (or most) of the windows, print the word.

How can this be implemented? Any other suggestions would also be great.

Thanks in advance.

Comments (2)

一花一树开 2024-11-05 00:57:25

There is no simple way to recognize words, because they are basically a sequence of phonemes which can vary in time and frequency.

Classical isolated-word recognition systems use the signal's MFCCs (mel-frequency cepstral coefficients) as input data, and try to recognize patterns using HMM (hidden Markov model) or DTW (dynamic time warping) algorithms.

You will also need a silence detection module if you don't want a record button.
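
As a minimal sketch of that silence detection step, assuming the capture buffers hold 8-bit unsigned PCM (where 128 is the zero level, as in WAV files), the RMS of each short frame can be compared to a threshold. The names `frameRms` and `detectSpeech`, the frame size, and the threshold are illustrative assumptions, not part of any library:

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// RMS amplitude of one frame of 8-bit unsigned PCM, normalised to roughly [0, 1].
// Assumption: samples are unsigned bytes centred on 128.
double frameRms(const std::uint8_t* samples, std::size_t count) {
    if (count == 0) return 0.0;
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < count; ++i) {
        // Remove the 128 offset and scale to about -1.0 .. +1.0.
        const double centred = (static_cast<int>(samples[i]) - 128) / 128.0;
        sumSquares += centred * centred;
    }
    return std::sqrt(sumSquares / static_cast<double>(count));
}

// Splits the buffer into consecutive frames of frameSize samples and flags
// each frame as speech (true) or silence (false). A trailing partial frame
// is ignored. The threshold is a hypothetical tuning parameter.
std::vector<bool> detectSpeech(const std::vector<std::uint8_t>& pcm,
                               std::size_t frameSize,
                               double threshold) {
    std::vector<bool> flags;
    if (frameSize == 0) return flags;
    for (std::size_t start = 0; start + frameSize <= pcm.size(); start += frameSize) {
        flags.push_back(frameRms(pcm.data() + start, frameSize) > threshold);
    }
    return flags;
}
```

At an 8 kHz sample rate a frame of 80 samples covers 10 ms. The threshold would need tuning against the background noise of the actual microphone.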

For instance, the Edinburgh University toolkit provides some of these tools (with good documentation).

If you don't want to build it "from scratch", or want a source of inspiration, here is an (old but free) implementation of such a system (which uses its own toolkit) with a full explanation and practical examples of how it works.

This system is an LVCSR (Large-Vocabulary Continuous Speech Recognition) system and you only need a subset of it. If anyone knows of an open-source reduced-vocabulary system (like a simple IVR), it would be welcome.

If you want to build a basic system on your own, I recommend using MFCC and DTW:

  • For each target word to model:
    • record some instances of the word
    • compute delta-MFCCs at regular intervals (e.g. every 10 ms) across the word to build a model
  • When you want to recognize a signal:
    • compute the delta-MFCCs of this signal
    • use DTW to compare these delta-MFCCs to each modeled word's delta-MFCCs
    • output the word that fits the best (use a threshold to drop garbage)
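
The comparison step above might look roughly like the following. This is only a sketch: it assumes the per-frame feature vectors (delta-MFCC or anything else) have already been computed, uses plain Euclidean distance between frames, and the names `dtwDistance`, `WordTemplate`, and `recognise` are made up for illustration:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string>
#include <utility>
#include <vector>

using Frame = std::vector<double>;   // one feature vector (e.g. per 10 ms)
using Sequence = std::vector<Frame>; // features for a whole utterance

// Euclidean distance between two feature vectors.
double frameDistance(const Frame& a, const Frame& b) {
    double sum = 0.0;
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Classic DTW accumulated cost, keeping only two rows of the cost matrix.
double dtwDistance(const Sequence& x, const Sequence& y) {
    const double inf = std::numeric_limits<double>::infinity();
    if (x.empty() || y.empty()) return inf;
    std::vector<double> prev(y.size() + 1, inf), curr(y.size() + 1, inf);
    prev[0] = 0.0;
    for (std::size_t i = 1; i <= x.size(); ++i) {
        curr[0] = inf;
        for (std::size_t j = 1; j <= y.size(); ++j) {
            // Best predecessor: insertion, deletion, or match.
            const double best = std::min({prev[j], curr[j - 1], prev[j - 1]});
            curr[j] = frameDistance(x[i - 1], y[j - 1]) + best;
        }
        std::swap(prev, curr);
    }
    return prev[y.size()];
}

struct WordTemplate {
    std::string word;
    Sequence features;
};

// Returns the best-matching word, or an empty string when even the best
// length-normalised cost exceeds rejectThreshold (the "garbage" case).
std::string recognise(const Sequence& input,
                      const std::vector<WordTemplate>& templates,
                      double rejectThreshold) {
    std::string bestWord;
    double bestCost = std::numeric_limits<double>::infinity();
    for (const auto& t : templates) {
        const double cost = dtwDistance(input, t.features) /
                            static_cast<double>(input.size() + t.features.size());
        if (cost < bestCost) {
            bestCost = cost;
            bestWord = t.word;
        }
    }
    return bestCost <= rejectThreshold ? bestWord : std::string();
}
```

Dividing by the combined length is one simple way to make costs comparable between short and long words. With several recorded instances per word, each instance can be stored as its own template.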

世态炎凉 2024-11-05 00:57:25

If you just want to recognize a few commands, there are many commercial and free products you can use. See "Need text to speech and speech recognition tools for Linux", "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?", or "Speech Recognition on iPhone". The answers to these questions link to many available products and tools. Speech recognition and understanding of a list of commands is a very common problem that has been solved commercially. Many of the voice-automated phone systems you call use this type of technology. The same technology is available to developers.

From watching these questions for a few months, I've seen most developer choices break down like this:

Of course this may also be helpful - http://en.wikipedia.org/wiki/List_of_speech_recognition_software
