Java speech recognition with Sphinx 4

Posted on 2024-08-18 11:48:53

I want to use either Sphinx4 or the HTK toolkit to build a speech recognition application that estimates a person's age from their voice. I understand, for the most part, the statistical models involved in speech recognition.
I am interested in mel-frequency cepstral coefficients and Gaussian mixture models because these two seem better suited to my problem domain. Do I have to use neural networks and feed in training data from the vectors derived from the Sphinx classifiers? I am not quite sure where to start with Sphinx or the HTK toolkit.
I am new to Sphinx and speech recognition, and my application is only a prototype.

Can anyone please offer some guidance in this regard?
Kind regards.

Comments (2)

葬花如无物 2024-08-25 11:48:53

Usually, the first place to start for something like this is to look for prior related work from the academic community. Minematsu et al. (2002) used Gaussian mixture models (GMMs) over mel-frequency cepstral coefficients to distinguish between old and young speakers.
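
For readers new to the "mel" part of mel-frequency cepstral coefficients, here is a minimal sketch of the mel scale that the filterbank is built on. It assumes the commonly used 2595 * log10(1 + f/700) form of the scale; the function names and the filter count are hypothetical choices for illustration, not anything taken from Sphinx, HTK, or the cited paper.

```python
import math

def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (assumed 2595*log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_centre_frequencies(n_filters, low_hz, high_hz):
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (n_filters + 1)
    return [mel_to_hz(low + step * i) for i in range(1, n_filters + 1)]

# Hypothetical setup: 10 filters between 0 Hz and 8 kHz.
centres = mel_centre_frequencies(n_filters=10, low_hz=0.0, high_hz=8000.0)
print([round(c) for c in centres])
```

The centres are packed densely at low frequencies and spread out at high ones, which is the property that makes the scale useful for speech features. A full MFCC front end would add framing, a power spectrum, triangular filters at these centres, a log, and a DCT.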

Presumably, if you have access to training data with both old and young speakers, you should be able to do the same. Even if you'd like to try another classifier back-end such as neural networks, it would probably be good to start with GMMs, since you know that they should work for your task and they'll give you a baseline to compare against any other classifiers you try.
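
As a rough illustration of the GMM baseline described above, the sketch below trains one diagonal-covariance GMM per age group with EM and labels a test utterance by whichever model gives the higher average log-likelihood. Everything here is an assumption for illustration: the `DiagGMM` class, the two-dimensional synthetic "frames" standing in for real MFCC vectors, and the component count are hypothetical, and this is not the setup from the cited paper nor the API of Sphinx or HTK.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

class DiagGMM:
    """Hypothetical minimal GMM with diagonal covariances, trained by EM."""

    def __init__(self, n_components, n_iter=30, var_floor=1e-3, seed=0):
        self.k, self.n_iter, self.var_floor = n_components, n_iter, var_floor
        self.rng = random.Random(seed)

    def _frame_logs(self, f):
        # Per-component log(weight) + log density for one frame.
        return [math.log(w) + log_gauss_diag(f, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, frames):
        n, dim = len(frames), len(frames[0])
        # Initialise means from random frames, variances from the global variance.
        self.means = [list(f) for f in self.rng.sample(frames, self.k)]
        gmean = [sum(f[d] for f in frames) / n for d in range(dim)]
        gvar = [max(sum((f[d] - gmean[d]) ** 2 for f in frames) / n, self.var_floor)
                for d in range(dim)]
        self.vars = [list(gvar) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: responsibility of each component for each frame.
            resp = []
            for f in frames:
                logs = self._frame_logs(f)
                norm = logsumexp(logs)
                resp.append([math.exp(l - norm) for l in logs])
            # M-step: re-estimate weights, means and floored variances.
            for j in range(self.k):
                nj = sum(r[j] for r in resp) + 1e-10
                self.weights[j] = nj / n
                self.means[j] = [sum(r[j] * f[d] for r, f in zip(resp, frames)) / nj
                                 for d in range(dim)]
                self.vars[j] = [max(sum(r[j] * (f[d] - self.means[j][d]) ** 2
                                        for r, f in zip(resp, frames)) / nj,
                                    self.var_floor)
                                for d in range(dim)]
        return self

    def avg_log_likelihood(self, frames):
        return sum(logsumexp(self._frame_logs(f)) for f in frames) / len(frames)

def synthetic_frames(rng, centres, n, spread=1.0):
    """Stand-in for MFCC frames: points scattered around a few cluster centres."""
    return [[rng.gauss(c, spread) for c in rng.choice(centres)] for _ in range(n)]

def classify(frames, models):
    """Pick the label whose GMM scores the frames highest on average."""
    return max(models, key=lambda label: models[label].avg_log_likelihood(frames))

rng = random.Random(42)
young_centres = [[0.0, 0.0], [2.0, 2.0]]   # made-up cluster centres
old_centres = [[6.0, 6.0], [8.0, 4.0]]     # made-up cluster centres
models = {
    "young": DiagGMM(2, seed=1).fit(synthetic_frames(rng, young_centres, 200)),
    "old": DiagGMM(2, seed=2).fit(synthetic_frames(rng, old_centres, 200)),
}
test_utterance = synthetic_frames(rng, old_centres, 50)
print(classify(test_utterance, models))
```

With real data, each training set would be the pooled MFCC frames of all speakers in that age group, and the same `classify` call would serve as the baseline score for comparing other back-ends.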

If you're just doing this for fun or as a research project, I would recommend using HTK, since I like how modular it is. However, if this is being done for something commercial, you should probably go with Sphinx, since it can be redistributed under a BSD-like license.

梦里兽 2024-08-25 11:48:53

I decided not to go with Sphinx 4 because it is based on hidden Markov models, which are primarily used for sequential analysis such as speech recognition, and even for multimodal input to an interface based on the input sequence. Instead I went with a program called Praat, which is for speech processing and synthesis. There is also a "plugin", if you like, called "Akustyk", which is used to analyse vowels and so on. Maybe that direction will be of value to you; I'm not sure.

You can then use MATLAB with the pattern recognition toolbox to implement your neural networks, GMMs, or whatever approach you wish to pursue.

Hope it was helpful.
