Voice recognition on Android with recorded sound clips?
I've used the voice recognition feature on Android and I love it. It's one of my customers' most praised features. However, the format is somewhat restrictive: you have to call the recognizer intent, have it send the recording to Google for transcription, and wait for the text to come back.
Some of my ideas would require recording the audio within my app and then sending the clip to Google for transcription.
Is there any way I can send an audio clip to be processed with speech to text?
I have a solution that works well for combining speech recognition with audio recording. Here is the link to a simple Android project I created to show how the solution works. I also put some screenshots inside the project to illustrate the app.
I'll briefly explain the approach I used. I combined two features in that project: the Google Speech API and FLAC recording.
The Google Speech API is called over HTTP. Mike Pultz gives more details about the API:
"(...) the new [Google] API is a full-duplex streaming API. What this means, is that it actually uses two HTTP connections- one POST request to upload the content as a “live” chunked stream, and a second GET request to access the results, which makes much more sense for longer audio samples, or for streaming audio."
However, this API needs to receive a FLAC sound file to work properly. That brings us to the second part: FLAC recording.
I implemented FLAC recording in that project by extracting and adapting some code and libraries from an open-source app called AudioBoo. AudioBoo uses native code to record and play the FLAC format.
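Because the API expects FLAC, a cheap sanity check before uploading a recording is to look for the 4-byte "fLaC" marker that begins every native FLAC stream. A minimal sketch in plain Java; the names FlacCheck, hasFlacMarker and isFlacFile are hypothetical and not part of AudioBoo:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FlacCheck {
    // Native FLAC streams begin with the 4-byte ASCII marker "fLaC".
    private static final byte[] MAGIC = "fLaC".getBytes(StandardCharsets.US_ASCII);

    /** True if the buffer starts with the FLAC stream marker. */
    public static boolean hasFlacMarker(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads the first bytes of a file and checks them for the marker. */
    public static boolean isFlacFile(Path path) throws IOException {
        byte[] header = new byte[MAGIC.length];
        try (InputStream in = Files.newInputStream(path)) {
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    return false; // file is shorter than the marker
                }
                read += n;
            }
        }
        return hasFlacMarker(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java FlacCheck <file>");
            return;
        }
        Path path = Paths.get(args[0]);
        System.out.println(path + (isFlacFile(path) ? " looks like FLAC" : " is not native FLAC"));
    }
}
```

This only recognizes native FLAC; FLAC wrapped in an Ogg container starts with "OggS" instead and would be rejected by this check.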
Thus, it's possible to record a FLAC sound, send it to the Google Speech API, get the text back, and play the sound that was just recorded.
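The "send" step is the chunked POST from the quote: the audio goes up as an HTTP/1.1 request body using chunked transfer coding, in which each piece is preceded by its length in hexadecimal and a CRLF, followed by another CRLF, and a zero-length chunk marks the end. The sketch below only illustrates that framing on a byte array; the class name ChunkedFramer and the placeholder bytes are hypothetical. In practice java.net.HttpURLConnection.setChunkedStreamingMode(int) produces this framing for you as you write to the connection's output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ChunkedFramer {
    private static final byte[] CRLF = "\r\n".getBytes(StandardCharsets.US_ASCII);

    /** Writes one chunk: size in hex, CRLF, payload, CRLF. */
    public static void writeChunk(OutputStream out, byte[] data, int off, int len) throws IOException {
        if (len == 0) {
            return; // a zero-length chunk would end the body early
        }
        out.write(Integer.toHexString(len).getBytes(StandardCharsets.US_ASCII));
        out.write(CRLF);
        out.write(data, off, len);
        out.write(CRLF);
    }

    /** Writes the terminating zero-length chunk with no trailer fields. */
    public static void finish(OutputStream out) throws IOException {
        out.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
    }

    /** Frames a whole buffer as chunks of at most chunkSize bytes. */
    public static byte[] encode(byte[] audio, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (int off = 0; off < audio.length; off += chunkSize) {
                writeChunk(out, audio, off, Math.min(chunkSize, audio.length - off));
            }
            finish(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with ByteArrayOutputStream
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a FLAC recording (20 bytes).
        byte[] fake = "fLaCabcdefghijklmnop".getBytes(StandardCharsets.US_ASCII);
        String framed = new String(encode(fake, 16), StandardCharsets.US_ASCII);
        // Show CR/LF as escapes; prints 10\r\nfLaCabcdefghijkl\r\n4\r\nmnop\r\n0\r\n\r\n
        System.out.println(framed.replace("\r\n", "\\r\\n"));
    }
}
```

Chunked coding is what allows the upload to begin before the recording ends, because the total Content-Length does not need to be known up front.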
The project I created has the basic pieces needed to make this work, and it can be improved for specific situations. To make it work in a different scenario, you need a Google Speech API key, which is obtained by joining the Google Chromium-dev group. I left one key in that project just to show that it works, but I'll remove it eventually. If someone needs more information about it, let me know, because I'm not able to put more than 2 links in this post.
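Rather than committing the key to the project, it can be supplied at run time. The two HTTP connections described in the quote above also need some shared identifier so the server can match the GET that reads results to the POST that uploads audio. This is a sketch under loose assumptions: the base URL, the up/down paths, the key, pair and lang parameter names, and the SPEECH_API_KEY environment variable are all hypothetical placeholders, not confirmed details of Google's service. On Android you would more likely inject the key at build time (for example through a Gradle buildConfigField); the environment variable just keeps the sketch runnable on a desktop JVM.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.UUID;

public class DuplexUrls {
    /** Upstream (chunked POST) and downstream (GET) URLs that share one pair id. */
    public static final class Pair {
        public final String up;
        public final String down;

        Pair(String up, String down) {
            this.up = up;
            this.down = down;
        }
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    /** Builds both URLs; the "up"/"down" paths and parameter names are hypothetical. */
    public static Pair build(String base, String key, String pairId, String lang) {
        String up = base + "/up?key=" + enc(key) + "&pair=" + enc(pairId) + "&lang=" + enc(lang);
        String down = base + "/down?pair=" + enc(pairId);
        return new Pair(up, down);
    }

    public static void main(String[] args) {
        // Read the key from the environment instead of committing it to the project.
        String key = System.getenv("SPEECH_API_KEY"); // hypothetical variable name
        if (key == null) {
            System.err.println("Set SPEECH_API_KEY first");
            return;
        }
        // One random id ties the two connections together.
        Pair p = build("https://example.invalid/speech", key, UUID.randomUUID().toString(), "en-US");
        System.out.println("POST " + p.up);
        System.out.println("GET  " + p.down);
    }
}
```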
Unfortunately not at this time. The only interface currently supported by Android's voice recognition service is the RecognizerIntent, which doesn't allow you to provide your own sound data. If this is something you'd like to see, file a feature request at http://b.android.com. This is also tangentially related to existing issue 4541 and issue 36915103.
As far as I know, there is still no way to send an audio clip directly to Google for transcription. However, Froyo (API level 8) introduced the SpeechRecognizer class, which provides direct access to the speech recognition service. So, for example, you could start playback of an audio clip and have your Activity start a speech recognizer listening in the background; when it finishes, it returns its results to a user-defined listener callback.
Code that uses SpeechRecognizer should live in an Activity, since SpeechRecognizer's methods must be called from the main application thread. You will also need to add the RECORD_AUDIO permission to your AndroidManifest.xml.
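When the recognizer finishes, RecognitionListener.onResults(Bundle) receives the candidate transcriptions. Deciding whether to accept the top one is plain Java, so it can be sketched and tested off-device. The helper below is a hypothetical example (the class name, method name and the 0.5 threshold in main are my own choices). As I understand the Android documentation, RESULTS_RECOGNITION lists candidates most likely first, and CONFIDENCE_SCORES holds values from 0.0 to 1.0, with -1 meaning a score is unavailable.

```java
import java.util.Arrays;
import java.util.List;

public class RecognitionResults {
    /**
     * Returns the top recognition candidate if it is confident enough, else null.
     * candidates: strings from RESULTS_RECOGNITION, most likely first.
     * scores: values from CONFIDENCE_SCORES (0.0 to 1.0, or -1 if unavailable); may be null.
     */
    public static String acceptedResult(List<String> candidates, float[] scores, float minConfidence) {
        if (candidates == null || candidates.isEmpty()) {
            return null; // nothing was recognized
        }
        String top = candidates.get(0);
        if (scores == null || scores.length == 0 || scores[0] < 0f) {
            return top; // no usable score: trust the recognizer's ordering
        }
        return scores[0] >= minConfidence ? top : null;
    }

    public static void main(String[] args) {
        List<String> heard = Arrays.asList("call mom", "call Tom");
        System.out.println(acceptedResult(heard, new float[] {0.9f, 0.4f}, 0.5f)); // call mom
        System.out.println(acceptedResult(heard, new float[] {0.3f, 0.2f}, 0.5f)); // null
        System.out.println(acceptedResult(heard, null, 0.5f));                     // call mom
    }
}
```

Inside the Activity's listener, onResults would pass results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION) and results.getFloatArray(SpeechRecognizer.CONFIDENCE_SCORES) to this helper. The confidence key was added in a later API level than SpeechRecognizer itself, and the array may be null, which the helper treats as "no score".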
You can also define your own speech recognition service by extending RecognitionService, but that is beyond the scope of this answer :)