Get all transcription results using the Google Speech-to-Text API

Posted on 2025-01-10 13:35:48

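I would like to know whether it is possible to get all the possible transcripts that Google can generate from a given audio file. As you can see, my code only returns the transcript with the highest match.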

from google.cloud import speech
import os
import io

# Path to the service-account JSON key (left blank in the original post)
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = ''


# Creates google client
client = speech.SpeechClient()

# Full path of the audio file, Replace with your file name
file_name = os.path.join(os.path.dirname(__file__),"test2.wav")

#Loads the audio file into memory
with io.open(file_name, "rb") as audio_file:
    content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    audio_channel_count=1,
    language_code="en-gb"    
)

# Sends the request to google to transcribe the audio
response = client.recognize(request={"config": config, "audio": audio})

print(response.results)

# Reads the response
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))


笑着哭最痛 2025-01-17 13:35:48

In your RecognitionConfig(), set max_alternatives. When it is greater than 1, the response can include the other possible transcriptions.

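max_alternatives int
Maximum number of recognition hypotheses to be returned. Specifically,
the maximum number of SpeechRecognitionAlternative messages within
each SpeechRecognitionResult. The server may return fewer than
max_alternatives. Valid values are 0-30. A value of 0
or 1 will return a maximum of one. If omitted, will return a
maximum of one.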

Update your RecognitionConfig() to the code below:

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    audio_channel_count=1,
    language_code="en-gb",
    max_alternatives=10 # place a value between 0 - 30
)
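The rules in the quoted documentation can be summarised in a small helper. This is only an illustrative sketch: `effective_max_alternatives` is a hypothetical name, not part of google-cloud-speech, and it merely mirrors the documented behaviour (omitted, 0, or 1 means at most one alternative; 2-30 is used as given). Rejecting out-of-range values with `ValueError` is an assumption of this sketch; the quoted text does not say how the API itself reacts to them.

```python
from typing import Optional


def effective_max_alternatives(requested: Optional[int] = None) -> int:
    """Upper bound on alternatives per result, following the quoted rules.

    Hypothetical helper for illustration; not a google-cloud-speech function.
    """
    if requested is None:
        # Omitted: the service returns at most one alternative.
        return 1
    if not 0 <= requested <= 30:
        # Assumption: treat values outside the documented 0-30 range as errors.
        raise ValueError("max_alternatives must be between 0 and 30")
    # 0 and 1 both mean "at most one"; larger values are used as-is.
    return max(requested, 1)


print(effective_max_alternatives())    # omitted
print(effective_max_alternatives(0))   # treated like 1
print(effective_max_alternatives(10))  # up to 10 alternatives per result
```

Keep in mind that this is only an upper bound: the server may return fewer alternatives than requested.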

I tested this using the sample audio from the GitHub repo of the Speech API. I used the code below for testing:

from google.cloud import speech
import os
import io

# Creates google client
client = speech.SpeechClient()

# Full path of the audio file, Replace with your file name
file_name = os.path.join(os.path.dirname(__file__),"audio.raw")

#Loads the audio file into memory
with io.open(file_name, "rb") as audio_file:
    content = audio_file.read()
    audio = speech.RecognitionAudio(content=content)

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    audio_channel_count=1,
    language_code="en-us",
    max_alternatives=10 # used 10 for testing
)

# Sends the request to google to transcribe the audio
response = client.recognize(request={"config": config, "audio": audio})

for result in response.results:
    print(result.alternatives)

Output:
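Each printed entry should be the list of SpeechRecognitionAlternative messages for one result. If a flat, ranked listing is easier to work with, a sketch along the following lines may help. `flatten_alternatives` is a hypothetical helper, and the `fake_response` object with its transcripts and confidence values is invented stand-in data so the snippet runs without credentials; with the real client you would pass the `response` returned by `client.recognize(...)` instead.

```python
from types import SimpleNamespace


def flatten_alternatives(response):
    """Return (result_index, rank, transcript, confidence) for every alternative.

    Hypothetical helper: it only relies on the .results, .alternatives,
    .transcript and .confidence attributes used elsewhere in this thread.
    """
    rows = []
    for result_index, result in enumerate(response.results):
        for rank, alternative in enumerate(result.alternatives, start=1):
            rows.append(
                (result_index, rank, alternative.transcript, alternative.confidence)
            )
    return rows


# Invented stand-in for a real response (NOT actual API output).
fake_response = SimpleNamespace(
    results=[
        SimpleNamespace(
            alternatives=[
                SimpleNamespace(transcript="hello world", confidence=0.9),
                SimpleNamespace(transcript="hello word", confidence=0.0),
            ]
        )
    ]
)

for result_index, rank, transcript, confidence in flatten_alternatives(fake_response):
    print(f"result {result_index}, alternative {rank}: {transcript!r} "
          f"(confidence={confidence})")
```

The API documentation describes a confidence of 0.0 as a sentinel meaning the value was not set, so lower-ranked alternatives may well show 0.0; rank order is probably the more reliable signal.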
