Speech recognition API
I need to automatically transcribe some short MP3s as part of a proof of concept I am working on. I am currently looking into cloud solutions or web API services to send the MP3 as a simple HTTP request and receive a transcription back.
The only free/open source solution I have found is here, but the demos don't seem to work (at least not on the files I need to transcribe). I have found some enterprise solutions for call centers, but so far nothing I can simply integrate into a project.
Are there any web based speech recognition services available? One that is able to filter out small noise would be a plus.
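The "send the MP3 as a simple HTTP request" workflow can be sketched with the Python standard library alone. This is a minimal sketch under loud assumptions: the endpoint URL is hypothetical, and it assumes the service accepts the raw MP3 bytes as the POST body with an `audio/mpeg` content type. Real services differ in authentication, upload format, and response shape.

```python
import urllib.request


def build_transcription_request(audio: bytes, url: str,
                                content_type: str = "audio/mpeg") -> urllib.request.Request:
    """Build a POST request carrying raw audio bytes.

    ASSUMPTION: the (hypothetical) service takes the audio file itself as the
    request body; many real APIs want multipart uploads or JSON instead.
    """
    return urllib.request.Request(
        url,
        data=audio,                              # raw MP3 bytes as the body
        headers={"Content-Type": content_type},  # tell the server the format
        method="POST",
    )


def transcribe(path: str, url: str, timeout: float = 30.0) -> str:
    """Send one audio file and return the response body decoded as UTF-8 text."""
    with open(path, "rb") as f:
        request = build_transcription_request(f.read(), url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Usage would be something like `transcribe("clip.mp3", "https://asr.example.com/transcribe")`, where the URL is a placeholder for whichever service you end up choosing. Keeping request construction separate from sending makes it easy to inspect or adapt the request before any network call.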
Here is an unofficial method to access Google's ASR capability. I tested it just yesterday and it still works: you can get JSON-style ASR output, with words and their associated confidence scores, from FLAC audio sampled at 16 kHz.
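Since that method expects 16 kHz FLAC rather than MP3, the files would need converting first, and the JSON reply then needs parsing. The Python sketch below does not reproduce the unofficial endpoint itself. It assumes `ffmpeg` is installed and on `PATH`, and the JSON field names (`hypotheses`, `utterance`, `confidence`) are an assumption about the response shape rather than a documented format; adapt them to what the service actually returns.

```python
import json
import subprocess


def flac_16k_command(src: str, dst: str) -> list[str]:
    """ffmpeg arguments that convert src to 16 kHz mono FLAC at dst."""
    return ["ffmpeg", "-y",          # overwrite dst if it already exists
            "-i", src,
            "-ar", "16000",          # resample to 16 kHz
            "-ac", "1",              # downmix to mono
            "-c:a", "flac", dst]


def convert_to_flac_16k(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(flac_16k_command(src, dst), check=True, capture_output=True)


def best_hypotheses(response_text: str) -> list[tuple[str, float]]:
    """Return (text, confidence) pairs sorted by confidence, highest first.

    ASSUMED response shape (hypothetical, adjust to the real service):
        {"hypotheses": [{"utterance": str, "confidence": float}, ...]}
    Missing confidences default to 0.0.
    """
    data = json.loads(response_text)
    pairs = [(h["utterance"], float(h.get("confidence", 0.0)))
             for h in data.get("hypotheses", [])]
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```

Sorting by confidence lets a proof of concept simply take the first entry, or flag clips whose best score falls below some threshold for manual review.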
You can also try the Windows 7 speech recognition engine to produce subtitles. Here is the tool for that.
This may be a good match. Also, their TechCrunch profile (see this) lists their competitors as: SimulScribe, SpinVox, Vlingo, Nuance, Microsoft, Google.
Some of these links may be helpful.
Vlingo, Bing and Google have recognizers in the cloud, but I don't think they make them publicly programmable. I believe they are accessible only from their authorized clients.
For a proof of concept (and low volume), have you considered just using the desktop speech engines that come with Windows 7? The question "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?" may be helpful. The MS desktop recognizers ship with a dictation grammar, and it sounds like that is what you will need.