I would like to synchronize a spoken recording against a known text. Is there a speech-to-text / natural language processing library that would facilitate this? I imagine I'd want to detect word boundaries and compute candidate matches from a dictionary. Most of the questions I've found on SO concern written language.
Desired, but not required:
- Open Source
- Compatible with American English out-of-the-box
- Cross-platform
- Thoroughly documented
Edit: I realize this is a very broad, even naive, question, so thanks in advance for your guidance.
What I've found so far:
Forced Alignment
It sounds like you want to do forced alignment between your audio and the known text.
Pretty much all research- and industry-grade speech recognition systems can do this, since forced alignment is an important part of training a recognition system on data that lacks phone-level alignments between the audio and the transcript.
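To make the idea concrete, here is a minimal sketch of the dynamic programming step behind forced alignment, not the implementation of any particular toolkit. It assumes you already have a per-frame score table `frame_scores[t][u]` (the log-likelihood that audio frame `t` belongs to transcript unit `u`, in transcript order), which in a real system would come from an acoustic model. The function names `forced_align` and `boundaries` are made up for this illustration.

```python
import math


def forced_align(frame_scores):
    """Viterbi search for the best monotonic frame-to-unit assignment.

    frame_scores[t][u] is an assumed log-likelihood that frame t belongs
    to transcript unit u. Returns one unit index per frame; the path starts
    at unit 0, ends at the last unit, and never skips or goes backwards.
    """
    n_frames = len(frame_scores)
    n_units = len(frame_scores[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per transcript unit")

    neg_inf = -math.inf
    best = [[neg_inf] * n_units for _ in range(n_frames)]
    back = [[0] * n_units for _ in range(n_frames)]
    best[0][0] = frame_scores[0][0]  # the first frame must belong to unit 0

    for t in range(1, n_frames):
        for u in range(n_units):
            stay = best[t - 1][u]
            advance = best[t - 1][u - 1] if u > 0 else neg_inf
            if advance > stay:
                best[t][u] = advance + frame_scores[t][u]
                back[t][u] = u - 1
            else:
                best[t][u] = stay + frame_scores[t][u]
                back[t][u] = u

    # Trace back from the last unit at the last frame.
    path = [n_units - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def boundaries(path):
    """Collapse a per-frame path into (unit, first_frame, last_frame) tuples."""
    segments = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((path[start], start, t - 1))
            start = t
    return segments


# Toy scores: 6 frames, 3 transcript units (hypothetical numbers).
scores = [
    [0, -5, -5],
    [0, -5, -5],
    [-5, 0, -5],
    [-5, 0, -5],
    [-5, 0, -5],
    [-5, -5, 0],
]
path = forced_align(scores)
segments = boundaries(path)
print(segments)
```

Multiplying the frame indices by the frame duration (often around 10 ms, depending on the front end) would turn the segments into timestamps for each word or phone of the known text.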
Alignment with CMUSphinx
The Sphinx4-1.0 beta 5 release of CMU's open source speech recognition system now includes a demo on how to do alignment between a transcript and long speech recordings.