Encoding mp3 from a PyTTS audio stream

Posted on 2024-08-20 09:34:17


I work on text-to-speech, transforming text into audio mp3 files using Python 2.5.

I use pyTTS as the Python text-to-speech module to transform text into audio .wav files (pyTTS cannot encode to mp3 directly). After that, I encode these wav files to mp3 with the lame command-line encoder.
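The wav-to-mp3 step can be driven from Python with subprocess. A minimal sketch, assuming the lame executable is on PATH; `-b` sets the bitrate in kbps, and `lame_command` and `wav_to_mp3` are hypothetical helper names, not part of any library.

```python
import subprocess


def lame_command(wav_path, mp3_path, bitrate_kbps=128, lame="lame"):
    """Build the argument list for: lame -b <kbps> input.wav output.mp3"""
    return [lame, "-b", str(bitrate_kbps), wav_path, mp3_path]


def wav_to_mp3(wav_path, mp3_path, bitrate_kbps=128):
    """Run lame; raises CalledProcessError if it exits with an error."""
    subprocess.check_call(lame_command(wav_path, mp3_path, bitrate_kbps))


print(lame_command("speech.wav", "speech.mp3"))
# ['lame', '-b', '128', 'speech.wav', 'speech.mp3']
```

Passing a list (rather than a shell string) avoids quoting problems with file names that contain spaces.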

Now, the problem is that I would like to insert, at a particular point in an audio mp3 file (between two words), a particular external sound file (like a warning sound) or, if possible, a generated warning sound.
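For the "generated warning sound" option, a beep can be synthesized directly as raw PCM with the standard library. A minimal sketch: `make_beep` is a hypothetical name, and the 1000 Hz tone, 0.3 s duration and 22050 Hz mono defaults are arbitrary assumptions. In practice they should be set to match the speech stream's format so the two can be joined before encoding.

```python
import math
import struct


def make_beep(freq_hz=1000.0, duration_s=0.3, sample_rate=22050,
              channels=1, amplitude=0.5):
    """Return a sine-wave beep as raw signed 16-bit little-endian PCM."""
    n_frames = int(round(duration_s * sample_rate))
    peak = int(amplitude * 32767)  # stay below the int16 limit
    samples = []
    for i in range(n_frames):
        value = int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        samples.extend([value] * channels)  # same value on every channel
    return struct.pack("<%dh" % len(samples), *samples)


beep = make_beep(duration_s=0.5, sample_rate=8000)
print(len(beep))  # 8000 bytes: 4000 frames x 2 bytes
```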

Questions are:

1) I have seen that PyTTS can save the audio stream to a file or to a memory stream, using two functions:

tts.SpeakToWave(file, text) or tts.SpeakToMemory(text)
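The post doesn't say whether the memory stream's GetData() returns raw PCM or a complete WAV file. Assuming raw PCM, the standard library's wave module can wrap it in a WAV header entirely in memory, which gives the same kind of data SpeakToWave would write to disk. A minimal sketch; `pcm_to_wav_bytes` is a hypothetical name, and the channel count and sample rate must come from the stream's real format, not be guessed.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, channels, sample_rate, sample_width=2):
    """Wrap raw PCM bytes in a WAV header, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    # wave does not close a file object it did not open, so buf is still usable
    return buf.getvalue()


wav_data = pcm_to_wav_bytes(b"\x00\x00" * 100, channels=1, sample_rate=22050)
print(wav_data[:4])  # b'RIFF'
```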

Using the tts.SpeakToMemory(text) function together with PyMedia, I have been able to save an mp3 directly, but the mp3 file, when played, sounds incomprehensible, like Donald Duck! :-)
Here is a snippet of code:

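    import pyTTS
    import pymedia.audio.acodec as acodec

    tts = pyTTS.Create()

    # Encoder settings, hard-coded: 128 kbps, 44.1 kHz, stereo
    params = {'id': acodec.getCodecID('mp3'), 'bitrate': 128000, 'sample_rate': 44100, 'ext': 'mp3', 'channels': 2}

    # Synthesize speech into a memory stream and get its raw bytes
    # (p.Text is the text to speak; p comes from surrounding code not shown)
    m = tts.SpeakToMemory(p.Text)
    soundBytes = m.GetData()

    # Encode the bytes into mp3 frames and write them to disk
    enc = acodec.Encoder(params)
    frames = enc.encode(soundBytes)
    f = open("test.mp3", 'wb')
    for frame in frames:
        f.write(frame)
    f.close()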

Where is the problem?!?
If this approach worked correctly, it would let me skip the wav file conversion step.

2) As a second problem, I need to concatenate the audio mp3 file (obtained from the text-to-speech module) with a particular warning sound.

Obviously, it would be great if I could concatenate the audio memory stream of the text (from the text-to-speech module) with the stream of a warning sound, before encoding the whole audio memory stream into a single mp3 file.

I have also seen that the tkSnack library can concatenate audio, but it is not able to write mp3 files.

I hope I have been clear. :-)

Many thanks for your answers to my questions.

Giulio


Comments (2)

梦太阳 2024-08-27 09:34:17


I don't think PyTTS produces default PCM data (i.e. 44100 Hz, stereo, 16-bit). You should check the format like this:

    memStream = tts.SpeakToMemory("some text")
    format = memStream.Format.GetWaveFormatEx()

...and hand it over to acodec correctly. For that you can use the attributes format.Channels, format.BitsPerSample and format.SamplesPerSec.
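A minimal sketch of that advice, with the pyTTS and pymedia objects replaced by plain values so it runs anywhere. `encoder_params` and `playback_speed_factor` are hypothetical helper names; the dictionary keys are the ones used in the question's snippet. The speed factor illustrates why a mismatch can sound like Donald Duck: if, for example, the stream were 22050 Hz mono (an assumed value, not confirmed for PyTTS) but the encoder is told 44100 Hz stereo, the samples are consumed four times as fast as intended.

```python
def encoder_params(channels, samples_per_sec, codec_id, bitrate=128000):
    """Build pymedia-style encoder params from the actual PCM format.

    Hypothetical helper: in real code, pass format.Channels,
    format.SamplesPerSec and acodec.getCodecID('mp3').
    """
    return {
        'id': codec_id,
        'bitrate': bitrate,
        'sample_rate': samples_per_sec,
        'ext': 'mp3',
        'channels': channels,
    }


def playback_speed_factor(actual_rate, actual_channels,
                          declared_rate, declared_channels):
    """Speed-up when PCM is encoded with the wrong rate/channel count.

    Assumes the same sample width (format.BitsPerSample) on both sides:
    the encoder consumes declared_rate * declared_channels samples per
    second, while the data holds actual_rate * actual_channels.
    """
    return (declared_rate * declared_channels) / (actual_rate * actual_channels)


# Assumed example values, not measured from PyTTS:
print(playback_speed_factor(22050, 1, 44100, 2))  # 4.0
print(playback_speed_factor(44100, 2, 44100, 2))  # 1.0
```

A factor above 1 means faster, higher-pitched playback; building the params from the stream's own format keeps it at 1.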

As to your second question, if the sounds are in the same format, you should be able to simply pass them all to enc.encode, one after another.
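Concatenating uncompressed PCM amounts to joining the bytes once the formats agree. A minimal sketch, assuming each chunk is raw PCM with no WAV header; `concat_pcm` and the `(channels, samples_per_sec, bits_per_sample)` tuple are hypothetical. The joined bytes could go to a single `enc.encode` call, or each chunk could be passed in turn as the answer suggests.

```python
def concat_pcm(*streams):
    """Join raw PCM chunks that share one format.

    Each stream is a (fmt, data) pair where fmt is a hypothetical
    (channels, samples_per_sec, bits_per_sample) tuple, e.g. read from
    GetWaveFormatEx() for the speech and from the warning sound's header.
    """
    if not streams:
        raise ValueError("need at least one stream")
    fmt = streams[0][0]
    channels, _, bits = fmt
    block_align = channels * bits // 8  # bytes per sample frame
    for other_fmt, data in streams:
        if other_fmt != fmt:
            raise ValueError("format mismatch: %r vs %r" % (other_fmt, fmt))
        if len(data) % block_align:
            raise ValueError("chunk is not a whole number of frames")
    return fmt, b"".join(data for _, data in streams)


fmt = (1, 22050, 16)
speech1 = b"\x01\x00" * 4
beep = b"\xff\x7f" * 2
speech2 = b"\x02\x00" * 3
_, pcm = concat_pcm((fmt, speech1), (fmt, beep), (fmt, speech2))
print(len(pcm))  # 18
```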

囍孤女 2024-08-27 09:34:17

Can't provide a definitive answer here, sorry. But there are some things to try: I'd look at the documentation of the pymedia module to check if there are any quality configurations that you can set.

And the other thing is that, unlike wave or raw audio, you won't be able to simply concatenate mp3-encoded audio: whatever solution you reach, you will have to concatenate/mix your sounds while they are uncompressed (unencoded), and afterwards generate the mp3-encoded audio.
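Mixing, as opposed to concatenating, means adding the uncompressed samples together. A minimal sketch, assuming both inputs are signed 16-bit little-endian PCM in the same format; `mix_pcm16` is a hypothetical name. The shorter input is padded with silence, and sums are clipped to the 16-bit range rather than allowed to wrap around.

```python
import sys
from array import array


def mix_pcm16(a, b):
    """Overlay two signed 16-bit little-endian PCM byte strings."""
    x, y = array("h", a), array("h", b)  # "h" = signed 16-bit samples
    if sys.byteorder == "big":  # the PCM data is little-endian
        x.byteswap()
        y.byteswap()
    n = max(len(x), len(y))
    out = array("h", [0]) * n
    for i in range(n):
        s = (x[i] if i < len(x) else 0) + (y[i] if i < len(y) else 0)
        out[i] = max(-32768, min(32767, s))  # clip instead of wrapping
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()
```

Halving each input before adding is a common alternative to clipping when both signals are loud.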

Also, sometimes we just have the feeling that recording a file to disk and reconverting it, instead of doing it in "one step", is awkward, while in practice the software does exactly that behind the scenes, even if we don't specify a file ourselves. If you are on a Unix-like system, you can always create a FIFO special file (with the mkfifo command) and send your .wav data there for encoding in a separate process (using lame): to your programs it will look like you are using an intermediate file, but you actually won't be.
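The FIFO approach can be sketched in Python with os.mkfifo and subprocess. This is a minimal, Unix-only sketch under several assumptions: `encode_via_fifo` is a hypothetical name, lame is assumed to be on PATH and to read a named pipe as its input file, and `wav_bytes` is assumed to be a complete WAV file (header included). If the encoder fails before opening the pipe, the `open()` for writing blocks indefinitely, so a real program would want a timeout or a separate writer thread.

```python
import os
import subprocess
import tempfile


def encode_via_fifo(wav_bytes, mp3_path, encoder=("lame",)):
    """Feed WAV data to an encoder process through a named pipe (Unix only).

    The encoder sees an ordinary input path, but no intermediate file is
    written to disk; the data passes through the pipe's kernel buffer.
    """
    tmpdir = tempfile.mkdtemp()
    fifo_path = os.path.join(tmpdir, "speech.wav")
    os.mkfifo(fifo_path)  # same as the `mkfifo` shell command
    try:
        # Start the encoder first: it opens the FIFO as its input file.
        proc = subprocess.Popen(list(encoder) + [fifo_path, mp3_path])
        # Opening for writing blocks until the encoder opens it for reading.
        with open(fifo_path, "wb") as pipe:
            pipe.write(wav_bytes)
        # Closing the pipe signals end-of-file to the encoder.
        return proc.wait()
    finally:
        os.remove(fifo_path)
        os.rmdir(tmpdir)
```

With the default arguments, `encode_via_fifo(wav_data, "test.mp3")` runs `lame <fifo> test.mp3` and returns lame's exit status.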
