Azure Speech to Text REST API V3 binary data

Published 2025-01-27 11:37:34 · 1,537 characters · 5 views · 0 comments


I'm trying to use the Azure Speech to Text service. In the documentation I'm confronted with examples that use the V1 API version:
https://$region.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1

And basically every link to proper documentation is for the V3 API.

https://{endpoint}/speechtotext/v3.0

In this V1 example you can easily send your file as binary.

curl --location --request POST \
"https://$region.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-US" \
--header "Ocp-Apim-Subscription-Key: $key" \
--header "Content-Type: audio/wav" \
--data-binary "@$audio_file"

But I couldn't figure out how to provide the wordLevelTimestampsEnabled=true parameter to get word-level timestamps.

On the other hand, I tried using the V3 API, where I can easily provide the wordLevelTimestampsEnabled=true parameter, but I couldn't figure out how to send binary file data.

curl -L -X POST "https://northeurope.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "Ocp-Apim-Subscription-Key: $key" \
  --data-raw '{
  "contentUrls": [
    "https://url-to-file.dev/test-file.wav"
  ],
  "properties": {
    "diarizationEnabled": false,
    "wordLevelTimestampsEnabled": true,
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked"
  },
  "locale": "pl-PL",
  "displayName": "Transcription using default model for pl-PL"
}'

Is there a way to pass a binary file and also get word level timestamps with wordLevelTimestampsEnabled=true parameter?

Comments (1)

椒妓 2025-02-03 11:37:34


Is there a way to pass a binary file and also get word level timestamps with wordLevelTimestampsEnabled=true parameter?

As suggested by Code Different, converting a comment into a community wiki answer to help community members who might face a similar issue.

As per the documentation, a binary file can't be uploaded directly. You should provide a URL via the contentUrls property.

For example:

{
  "contentUrls": [
    "<URL to an audio file to transcribe>"
  ],
  "properties": {
    "diarizationEnabled": false,
    "wordLevelTimestampsEnabled": true,
    "punctuationMode": "DictatedAndAutomatic",
    "profanityFilterMode": "Masked"
  },
  "locale": "en-US",
  "displayName": "Transcription of file using default model for en-US"
}
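Putting that together programmatically, here is a minimal Python sketch of building and submitting the request. The endpoint, key, and content URL are placeholders you'd replace with your own, and plain `urllib` from the standard library stands in for an HTTP client:

```python
import json
import urllib.request

def build_transcription_request(content_url, locale="en-US"):
    """Build the v3.0 transcription request body with word-level timestamps enabled."""
    return {
        "contentUrls": [content_url],
        "properties": {
            "diarizationEnabled": False,
            "wordLevelTimestampsEnabled": True,
            "punctuationMode": "DictatedAndAutomatic",
            "profanityFilterMode": "Masked",
        },
        "locale": locale,
        "displayName": f"Transcription using default model for {locale}",
    }

def submit_transcription(endpoint, key, body):
    """POST the request; the created transcription's JSON includes its 'self' URL for polling."""
    req = urllib.request.Request(
        f"{endpoint}/speechtotext/v3.0/transcriptions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

body = build_transcription_request("https://url-to-file.dev/test-file.wav", "pl-PL")
# submit_transcription("https://northeurope.api.cognitive.microsoft.com", "<your key>", body)
```

Note the audio must already be reachable at a URL (e.g. an Azure Blob Storage SAS URL); the submit call is left commented out since it needs a real key.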

You can refer to Speech-to-text REST API v3.0, cognitive-services-speech-sdk, and Azure Speech Recognition - use binary / hexadecimal data instead of WAV file path.
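Once the transcription succeeds, the word-level timestamps live in the downloadable result files, not in the creation response. A hedged sketch of pulling them out, assuming the v3.0 result document nests words under recognizedPhrases → nBest → words (verify the field names against your actual result file):

```python
def extract_word_timestamps(result):
    """Flatten word-level timing from a v3.0 transcription result document.

    Assumes each recognized phrase carries an nBest list whose top entry
    has a 'words' array with 'word', 'offset', and 'duration' fields
    (ISO 8601 durations) -- check this against your actual output.
    """
    words = []
    for phrase in result.get("recognizedPhrases", []):
        best = phrase.get("nBest", [{}])[0]
        for w in best.get("words", []):
            words.append((w["word"], w.get("offset"), w.get("duration")))
    return words

# Hand-built sample illustrating the assumed shape:
sample = {
    "recognizedPhrases": [
        {"nBest": [{"words": [
            {"word": "hello", "offset": "PT0.5S", "duration": "PT0.3S"},
            {"word": "world", "offset": "PT0.9S", "duration": "PT0.4S"},
        ]}]}
    ]
}
print(extract_word_timestamps(sample))
# -> [('hello', 'PT0.5S', 'PT0.3S'), ('world', 'PT0.9S', 'PT0.4S')]
```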
