Transcribing WMA/MP3 audio automatically?

Posted 2024-08-07 03:56:14 · 137 words · 3 views · 0 comments

I’ve got a lot of speech audio in WMA format and I’d like to machine transcribe it – even if the transcription is not 100% accurate, I think it could help quite a bit as an “index” to some of the audio. I’m willing to write some code to make this happen, but can Microsoft’s Speech APIs help me here? Is there already an app that can do this for me?
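
To make the "index" idea concrete, here is a minimal sketch of a word-to-timestamp index built from recognizer output. It assumes the transcription step (whatever engine is used) yields segments as `(audio_file, start_seconds, text)` tuples; that tuple shape and the names `build_index` and `lookup` are hypothetical, not part of any speech API.

```python
import re
from collections import defaultdict


def build_index(segments):
    """Map each lowercased word to the places where it was recognized.

    `segments` is an iterable of (audio_file, start_seconds, text) tuples.
    This shape is an assumption; adapt it to whatever the recognizer returns.
    """
    index = defaultdict(list)
    for audio_file, start_seconds, text in segments:
        # set() so a word repeated inside one segment is listed only once
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            index[word].append((audio_file, start_seconds))
    for hits in index.values():
        hits.sort()  # order hits by file name, then by time
    return dict(index)


def lookup(index, word):
    """Return the (audio_file, start_seconds) hits for a word, or an empty list."""
    return index.get(word.lower(), [])
```

Because lookup is per word, a misrecognized word only costs the hits for that word, which is why an imperfect transcript can still be useful for jumping to roughly the right place in a recording.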

Comments (2)

生来就爱笑 2024-08-14 03:56:14

SAPI can certainly do what you want. Start with an in-proc recognizer, connect up your audio as a file stream (you'll probably need to transcode your WMA files to a WAV stream, as SAPI only takes WAV input, but you can do the transcoding on the fly), set dictation mode, and off you go.
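
The transcoding step could be sketched roughly as follows. This is only a sketch under assumptions: it uses the external `ffmpeg` command-line tool (not named in the answer) and assumes it is installed and on the PATH, it writes a WAV file to disk rather than transcoding on the fly, and it picks 16 kHz mono 16-bit PCM as a target format, which may or may not be what your recognizer prefers. The helper names are hypothetical.

```python
import subprocess
import wave
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Return an ffmpeg argument list that decodes `src` to mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                      # overwrite dst if it already exists
        "-i", str(src),            # input file (WMA, MP3, ...)
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample to the requested rate
        "-acodec", "pcm_s16le",    # uncompressed 16-bit little-endian PCM
        str(dst),
    ]


def transcode_to_wav(src, dst=None, sample_rate=16000):
    """Run ffmpeg (assumed to be installed) and return the path of the WAV file."""
    src = Path(src)
    dst = Path(dst) if dst is not None else src.with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return dst


def describe_wav(path):
    """Read the WAV header with the standard-library wave module."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }
```

Calling `describe_wav(transcode_to_wav("talk.wma"))` (with a hypothetical `talk.wma`) is a quick sanity check that the output really is PCM WAV in the expected format before handing it to the recognizer as a file stream.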

Now the disappointing bit. You probably won't get terribly good results; in fact, I suspect that unless you're very lucky, you'll probably get total garbage.

There are several problems:

  1. Dictation really only works well once the SR engine has been trained. If you're lucky (like me), you can get OK results, but if the speaker has an accent, training is a must.
  2. Training only works well for a single voice. If you've got multiple speakers in a single audio file, it's not going to work well.
  3. The audio model for dictation (and Speech Recognition in general) assumes that you're using a close-talk microphone (i.e., a microphone right next to your face, to minimize noise pickup). If your WMA files have extra noise, accuracy will go down dramatically.

I would actually suggest using Dragon NaturallySpeaking Professional; they've spent the time and money to make transcription work. I haven't used it myself, so I don't know how well it would work in your situation.

习ぎ惯性依靠 2024-08-14 03:56:14

You would need a suitable program for this, such as dictation software. The Speech API goes the other way around. I don't believe there is anything open source for this either, as this is a very, very complicated piece of software.
