Text-to-speech with Google Cloud in Python

Posted on 2025-01-15 13:44:03

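I would like to calculate the duration of each sentence when converting text to speech with Google Cloud in Python. For example, if three sentences are converted to audio, I want to know when the first sentence starts and ends in the audio, when the second one does, and so on.

Example:

text = 'Hello, World. I can speak any language. I would like to help you.'

Hello, World: starts 00:00, ends 00:03

I can speak any language: starts 00:04, ends 00:09

I would like to help you: starts 00:10, ends 00:13

Is there something for this in Python? Here is the main code: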

"""Synthesizes speech from the input string of text or ssml.

Note: ssml must be well-formed according to:
    https://www.w3.org/TR/speech-synthesis/
"""
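# Uses the google-cloud-texttospeech 1.x interface (`texttospeech.types`,
# `texttospeech.enums`); releases from 2.0 on dropped these submodules.
from google.cloud import texttospeech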

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(text="Hello, World. I can speak any language. I would like to help you.")

# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
    language_code="en-US", ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL
)

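# texttospeech_v1beta1.types.cloud_tts_pb2
# (commented out: `texttospeech_v1beta1` is never imported, so this bare
# expression would raise NameError)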

# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3
)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input_=synthesis_input, voice=voice, audio_config=audio_config
)

# The response's audio_content is binary.
with open("./output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
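
One possible direction, sketched here under assumptions rather than taken from the post: the v1beta1 API can report timestamps for SSML `<mark/>` tags when `enable_time_pointing` is set to `SSML_MARK`. Placing a mark before each sentence then gives each sentence's start time, and the next mark gives its end. The helper names (`split_sentences`, `build_marked_ssml`, `sentence_ranges`, `mmss`, `synthesize_with_timepoints`) and the mark names (`s0`, `s1`, ..., `end`) are hypothetical. The request code assumes google-cloud-texttospeech 2.x and valid credentials, and whether the service returns a timepoint for the trailing `end` mark is an assumption worth checking.

```python
import re
from xml.sax.saxutils import escape


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_marked_ssml(sentences):
    """Put a <mark/> before every sentence and one 'end' mark after the last."""
    body = " ".join(
        '<mark name="s{}"/>{}'.format(i, escape(sentence))
        for i, sentence in enumerate(sentences)
    )
    return '<speak>{} <mark name="end"/></speak>'.format(body)


def sentence_ranges(sentences, timepoints):
    """Pair each sentence with (start, end) in seconds.

    `timepoints` is an iterable of (mark_name, time_seconds) tuples.
    `end` is None when the following mark was not reported.
    """
    times = dict(timepoints)
    ranges = []
    for i, sentence in enumerate(sentences):
        next_mark = "s{}".format(i + 1) if i + 1 < len(sentences) else "end"
        ranges.append((sentence, times["s{}".format(i)], times.get(next_mark)))
    return ranges


def mmss(seconds):
    """Format seconds as MM:SS, truncating fractions of a second."""
    whole = int(seconds)
    return "{:02d}:{:02d}".format(whole // 60, whole % 60)


def synthesize_with_timepoints(ssml):
    """Call the v1beta1 API (assumes google-cloud-texttospeech 2.x is installed).

    Returns the audio bytes and a list of (mark_name, time_seconds) tuples.
    """
    # Imported lazily so the pure-Python helpers above work without the library.
    from google.cloud import texttospeech_v1beta1 as tts

    client = tts.TextToSpeechClient()
    request = tts.SynthesizeSpeechRequest(
        input=tts.SynthesisInput(ssml=ssml),
        voice=tts.VoiceSelectionParams(
            language_code="en-US", ssml_gender=tts.SsmlVoiceGender.NEUTRAL
        ),
        audio_config=tts.AudioConfig(audio_encoding=tts.AudioEncoding.MP3),
        enable_time_pointing=[tts.SynthesizeSpeechRequest.TimepointType.SSML_MARK],
    )
    response = client.synthesize_speech(request=request)
    return response.audio_content, [
        (tp.mark_name, tp.time_seconds) for tp in response.timepoints
    ]
```

A typical flow would be: split the text with `split_sentences`, pass the result of `build_marked_ssml` to `synthesize_with_timepoints`, then feed the returned timepoints to `sentence_ranges` and print each start and end with `mmss`. The regex splitter is deliberately simple and will mis-split abbreviations such as "e.g."; a proper sentence tokenizer could replace it.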
