Get all transcription results using the Google Speech-to-Text API

Posted on 2025-01-10 13:35:48, 974 characters, 0 views, 0 comments

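I would like to know whether it is possible to get all the possible transcripts that Google can generate from a given audio file. As you can see, the code below only prints the transcript with the highest-ranked match for each result.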

from google.cloud import speech
import os
import io

# Set this to the path of your service account key file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = ''


# Creates google client
client = speech.SpeechClient()

# Full path of the audio file, Replace with your file name
file_name = os.path.join(os.path.dirname(__file__),"test2.wav")

#Loads the audio file into memory
with io.open(file_name, "rb") as audio_file:
    content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    audio_channel_count=1,
    language_code="en-gb"
)

# Sends the request to google to transcribe the audio
response = client.recognize(request={"config": config, "audio": audio})

print(response.results)

# Reads the response
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))

Comments (1)

笑着哭最痛 2025-01-17 13:35:48

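In your RecognitionConfig(), set max_alternatives. When it is set to a value greater than 1, each result can include the other possible transcriptions in addition to the top one.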

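max_alternatives int

Maximum number of recognition hypotheses to be returned. Specifically,
the maximum number of SpeechRecognitionAlternative messages within
each SpeechRecognitionResult. The server may return fewer than
max_alternatives. Valid values are 0-30. A value of 0
or 1 will return a maximum of one. If omitted, will return a
maximum of one.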

Update your RecognitionConfig() to the code below:

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    audio_channel_count=1,
    language_code="en-gb",
    max_alternatives=10 # place a value between 0 - 30
)
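
The rules quoted from the documentation for max_alternatives can be summed up in a few lines. The sketch below is only an illustration: `effective_max_alternatives` is a hypothetical helper, not part of google-cloud-speech, and rejecting out-of-range values locally is an assumption, since the quoted text does not say how the server treats them.

```python
from typing import Optional


def effective_max_alternatives(requested: Optional[int]) -> int:
    """Upper bound on alternatives per result, following the quoted rules.

    Hypothetical helper for illustration; not part of google-cloud-speech.
    """
    if requested is None:
        return 1  # omitted: at most one alternative is returned
    if not 0 <= requested <= 30:
        # Assumption: fail locally instead of guessing the server's behaviour.
        raise ValueError("max_alternatives must be between 0 and 30")
    return max(requested, 1)  # 0 and 1 both mean at most one alternative


for value in (None, 0, 1, 10, 30):
    print(value, "->", effective_max_alternatives(value))
```

Keep in mind that this is only an upper bound: the server may return fewer alternatives than requested.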

I tested this using the sample audio from the GitHub repo of the Speech API. I used the code below for testing:

from google.cloud import speech
import os
import io

# Creates google client
client = speech.SpeechClient()

# Full path of the audio file, Replace with your file name
file_name = os.path.join(os.path.dirname(__file__),"audio.raw")

#Loads the audio file into memory
with io.open(file_name, "rb") as audio_file:
    content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    audio_channel_count=1,
    language_code="en-us",
    max_alternatives=10 # used 10 for testing
)

# Sends the request to google to transcribe the audio
response = client.recognize(request={"config": config, "audio": audio})

for result in response.results:
    print(result.alternatives)

Output:

[image: screenshot of the printed alternatives; not available]
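
If you want something more readable than printing the raw alternatives, you can walk every alternative of every result and print its transcript and confidence. This is a minimal sketch: `list_alternatives` is a hypothetical helper name, and the stand-in response is built from `SimpleNamespace` objects with invented placeholder values so that it runs without credentials. It relies only on the `results`, `alternatives`, `transcript` and `confidence` fields used above or defined on the alternative message.

```python
from types import SimpleNamespace


def list_alternatives(response):
    """Return (result_index, rank, transcript, confidence) for every alternative."""
    rows = []
    for result_index, result in enumerate(response.results):
        for rank, alternative in enumerate(result.alternatives, start=1):
            rows.append(
                (result_index, rank, alternative.transcript, alternative.confidence)
            )
    return rows


# Stand-in for a real response; transcripts and confidences are invented
# placeholders, not actual API output.
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(
            alternatives=[
                SimpleNamespace(transcript="first hypothesis", confidence=0.9),
                SimpleNamespace(transcript="second hypothesis", confidence=0.0),
            ]
        )
    ]
)

for result_index, rank, transcript, confidence in list_alternatives(fake_response):
    print(f"result {result_index}, alternative {rank}: {transcript!r} ({confidence:.2f})")
```

With a real response, pass the object returned by client.recognize(...) instead of fake_response. As far as I know, the confidence value is usually populated only for the top alternative, and 0.0 means it was not set, so do not read too much into the confidence of the lower-ranked entries.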
