Transcribing non-digits data using CMU's sphinx4
I have recently been working on using CMU's sphinx4 for transcription and, eventually, forced alignment, i.e. aligning audio with its transcript.
I found a project called AutoCap that basically does what I want to develop. So I installed it, but it did not work. I tried tweaking it, but all I got was incorrect timestamps.
So I thought of using sphinx4 directly and giving it a go myself. I successfully transcribed a wav file using Sphinx's Transcriber.jar file.
But I could not get it working for audio with non-digits data. The readme page states:
'people who want to transcribe non-digits data should modify the config.xml file to use the correct grammar, language model, and linguist to do so'.
So, can anyone give me some help on any of these:
- AutoCap
- Using Sphinx4 to transcribe non-digits data
- Forced Alignment
Thanks.
Comments (2)
There is a specific project dedicated to speech-to-text alignment. This is not a trivial task. Development takes place in a separate sphinx4 branch. You can find some details here:
http://cmusphinx.sourceforge.net/?s=long+audio+alignment
If you have any questions about this project, you are welcome to ask on the sphinx4 forum:
http://sourceforge.net/projects/cmusphinx/forums/forum/382337
I am currently working on the same issue, i.e. transcribing non-digits data. I looked briefly through the sphinx4 programmer's guide and used the language models, acoustic models, and JSGF grammar as suggested. However, the recognition results were not up to the mark. I believe that merely tweaking parameters or changing config.xml alone will not suffice; I think we would need a home-grown algorithm alongside sphinx4 to get better speech recognition. On my side, I have used the LexTreeLinguist, JSGFGrammar, and the trigram language model, but the results were not great, perhaps because the audio input was not exactly American English. I will work on it a bit more and let you know my results.
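For the forced-alignment case from the question, where the transcript is already known, one possible way to use a JSGF grammar is to restrict the recognizer to exactly the transcript's word sequence. Below is a minimal sketch in Python that only generates such a grammar as text. It is not part of sphinx4: the function name `transcript_to_jsgf`, the grammar name `alignment`, and the rule name `utterance` are all made up for illustration, and it assumes the pronunciation dictionary uses lowercase entries. The resulting file would still have to be referenced from config.xml, which is not shown here.

```python
import re


def transcript_to_jsgf(transcript, grammar_name="alignment", rule_name="utterance"):
    """Return JSGF text whose single public rule is the transcript's word sequence.

    Hypothetical helper for illustration only; not a sphinx4 API.
    """
    # Keep alphabetic words (with inner apostrophes, e.g. "it's") and lowercase them.
    # Assumption: the dictionary is lowercase. Punctuation and numerals are dropped,
    # so numbers would need to be spelled out in the transcript beforehand.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", transcript.lower())
    if not words:
        raise ValueError("transcript contains no usable words")

    # A JSGF file starts with a self-identifying header, then a grammar
    # declaration, then one or more rule definitions ending in semicolons.
    return (
        "#JSGF V1.0;\n"
        f"grammar {grammar_name};\n"
        f"public <{rule_name}> = {' '.join(words)};\n"
    )
```

Calling `transcript_to_jsgf("Hello, world!")` gives a three-line grammar whose public rule is `hello world`. A grammar this strict only accepts the exact transcript, so any word that was misread or skipped in the audio would likely make recognition fail rather than degrade gracefully.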