Automatically transcribing WMA/MP3 audio?
I’ve got a lot of speech audio in WMA format and I’d like to machine-transcribe it. Even if the transcription is not 100% accurate, I think it could help quite a bit as an “index” to some of the audio. I’m willing to write some code to make this happen, but can Microsoft’s Speech APIs help me here? Is there already an app that can do this for me?
Comments (2)
SAPI can certainly do what you want. Start with an in-proc recognizer, connect up your audio as a file stream (you'll probably need to transcode your WMA files to a WAV stream, as SAPI only takes WAV input, but you can do the transcoding on the fly), set dictation mode, and off you go.
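The transcoding step mentioned above could be sketched roughly like this in Python. This is only an illustration under assumptions that are not part of the answer: it assumes the ffmpeg command-line tool is installed and on the PATH, it picks 16 kHz mono 16-bit PCM as a plausible speech format, and the helper names build_ffmpeg_command and transcode_folder are made up for the example. It writes WAV files to disk rather than transcoding on the fly.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path, sample_rate: int = 16000) -> list[str]:
    """Return an ffmpeg argument list that converts src to mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", str(src),            # input file (WMA, MP3, ...)
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample; 16 kHz is an assumed speech rate
        "-c:a", "pcm_s16le",       # uncompressed 16-bit little-endian PCM
        str(dst),
    ]


def transcode_folder(folder: Path) -> list[Path]:
    """Convert every .wma file in folder to a .wav file next to it."""
    outputs = []
    for src in sorted(folder.glob("*.wma")):
        dst = src.with_suffix(".wav")
        # check=True raises CalledProcessError if ffmpeg reports a failure
        subprocess.run(build_ffmpeg_command(src, dst), check=True)
        outputs.append(dst)
    return outputs


if __name__ == "__main__":
    # Hypothetical usage: convert the WMA files in the current directory.
    for wav in transcode_folder(Path(".")):
        print("wrote", wav)
```

The resulting WAV files could then be opened as the file stream for the recognizer. Whether this particular sample rate suits the installed recognition engine is something to check against its documentation.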
Now the disappointing bit. You probably won't get terribly good results; in fact, I suspect that unless you're very lucky, you'll probably get total garbage.
There are several problems:
I would actually suggest using Dragon NaturallySpeaking Professional; they've spent the time and money to make transcription work. I haven't used it myself, so I don't know how well it would work in your situation.
You would need a dedicated program for this, such as dictation software. The Speech API works the other way around. I don't believe there is anything open source for this either, as this is a very, very complicated piece of software.