Help with voice communication in Python!

Posted 2024-09-05 01:53:52

I'm currently trying to write a voice chat program in Python. All tips and tricks are welcome.

So far I've found PyAudio, a Python wrapper around PortAudio. I played around with it and got an input stream from my microphone played back through my speakers. Only raw audio, of course.

But I can't send raw data over the network (it's far too big), so I'm looking for a way to encode it. I searched around the 'net and stumbled upon this Speex wrapper for Python. It seemed too good to be true, and believe me, it was.

In PyAudio you can set the size of the chunks you read from the input audio buffer, and in the sample code at that link it's set to 320. Once encoded, each chunk is about 40 bytes, which I'd say is fairly acceptable. Now for the problem.
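To put those numbers in perspective, here is the bandwidth arithmetic as a small sketch. It assumes 16-bit mono PCM, Speex's fixed 20 ms frame (160 samples at 8 kHz narrowband, 320 samples at 16 kHz wideband), and that each ~40-byte encoded chunk covers one such frame; the constant and function names are illustrative. Keep in mind that PyAudio's chunk size (`frames_per_buffer`, and the argument to `Stream.read`) counts sample frames rather than bytes, so 320 frames of `paInt16` mono audio is 640 bytes.

```python
# Bandwidth arithmetic for 16-bit mono PCM vs. one Speex frame per packet.
# Assumes Speex's 20 ms frame length; names are illustrative only.

FRAME_MS = 20          # Speex encodes one 20 ms frame at a time
BYTES_PER_SAMPLE = 2   # 16-bit PCM (pyaudio.paInt16)
CHANNELS = 1           # mono


def pcm_frame_bytes(sample_rate: int) -> int:
    """Raw PCM bytes in one 20 ms frame at the given sample rate."""
    samples = sample_rate * FRAME_MS // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def kbit_per_s(bytes_per_frame: int) -> float:
    """Bit rate when one frame of this size is sent every 20 ms."""
    frames_per_second = 1000 / FRAME_MS
    return bytes_per_frame * 8 * frames_per_second / 1000


if __name__ == "__main__":
    for name, rate in [("narrowband", 8000), ("wideband", 16000)]:
        raw = pcm_frame_bytes(rate)
        print(f"{name}: {raw} bytes/frame raw = {kbit_per_s(raw):.0f} kbit/s")
    # ~40 encoded bytes per frame, the figure observed with the sample code
    print(f"encoded: 40 bytes/frame = {kbit_per_s(40):.0f} kbit/s")
```

Raw narrowband PCM works out to 128 kbit/s (256 kbit/s for wideband), while 40 bytes every 20 ms is 16 kbit/s, roughly an 8:1 reduction for narrowband.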

I started with a sample program that just takes the input stream, encodes the chunks, decodes them and plays them back (nothing goes over the network yet, since I'm only testing). If I leave my computer idle while it runs, it works great, but as soon as I do something else, e.g. start Firefox, the audio input buffer gets clogged up! It keeps growing until everything crashes with an overflow error on the buffer.
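One plausible explanation is that the same loop reads the microphone, encodes, decodes and plays, so whenever the machine gets busy the reads fall behind and PortAudio's input buffer overflows. A common structure is to let one thread do nothing but read, and hand chunks to a second thread through a queue. The sketch below shows that with only the standard library; `read_chunk` and `encode` are hypothetical stand-ins for PyAudio's `stream.read` and the Speex encoder. PyAudio also has a callback mode (the `stream_callback` argument to `PyAudio.open`), and since version 0.2.11 `Stream.read` accepts `exception_on_overflow=False`, which stops it from raising on an input overflow; both are worth checking against the docs for your version.

```python
import queue
import threading


def capture_loop(read_chunk, out_q, stop, max_chunks=None):
    """Producer: only reads audio and queues it, so the next read is never
    delayed by encoding. read_chunk stands in for stream.read(CHUNK)."""
    count = 0
    while not stop.is_set() and (max_chunks is None or count < max_chunks):
        out_q.put(read_chunk())
        count += 1
    out_q.put(None)  # sentinel: tells the consumer no more data is coming


def encode_loop(encode, in_q, results):
    """Consumer: encodes queued chunks at its own pace until the sentinel."""
    while True:
        chunk = in_q.get()
        if chunk is None:
            break
        results.append(encode(chunk))


def run_pipeline(read_chunk, encode, max_chunks, stop=None):
    """Run capture and encoding on separate threads; return encoded chunks."""
    if stop is None:
        stop = threading.Event()
    q = queue.Queue()  # unbounded: absorbs bursts while the encoder lags
    results = []
    producer = threading.Thread(target=capture_loop,
                                args=(read_chunk, q, stop, max_chunks))
    consumer = threading.Thread(target=encode_loop, args=(encode, q, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    # Fake 640-byte reads and a fake "encoder" that keeps 40 bytes.
    out = run_pipeline(lambda: bytes(640), lambda c: c[:40], max_chunks=10)
    print(len(out), "chunks encoded")
```

The unbounded queue only helps with temporary slowdowns; if encoding is permanently slower than real time, a real program would need to bound the queue or drop old chunks.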

OK, so why take only 320 bytes of the stream at a time? I could take 1024 bytes or so, which would ease the pressure on the buffer. BUT: if I give Speex 1024 bytes of data to encode/decode, it either crashes, saying that's too big for its buffer, OR it encodes/decodes it but the sound is very noisy and "choppy", as if it only encoded a tiny part of the 1024-byte chunk and the rest is static. The result sounds like a helicopter, lol.

I did some research, and it seems Speex can only encode 320 bytes of data at a time (640 for wideband). Is that the standard?
How can I fix this problem? How should I structure my program to work with Speex? I could use an intermediate buffer that takes all the data available from the input buffer, splits it into 320-byte pieces and encodes/decodes them. But that takes a bit longer and seems like a very bad solution to the problem.
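The intermediate-buffer idea is a common pattern for fixed-frame codecs, and the extra work is only some byte copying. A minimal sketch, assuming 320-byte frames as described above; `FrameBuffer` is a hypothetical helper, and each frame it returns would be passed to the Speex encoder. It accepts reads of any size (such as 1024 bytes) and carries the leftover bytes over to the next call, since 1024 is not a multiple of 320.

```python
class FrameBuffer:
    """Accumulates arbitrary-sized PCM reads and returns fixed-size frames.
    Bytes that don't fill a whole frame are kept for the next call."""

    def __init__(self, frame_bytes=320):
        self.frame_bytes = frame_bytes
        self._pending = bytearray()

    def push(self, data):
        """Add raw bytes; return a list of complete frames (as bytes)."""
        self._pending.extend(data)
        # Largest multiple of frame_bytes currently available
        cut = len(self._pending) // self.frame_bytes * self.frame_bytes
        frames = [bytes(self._pending[i:i + self.frame_bytes])
                  for i in range(0, cut, self.frame_bytes)]
        del self._pending[:cut]  # keep only the incomplete tail
        return frames

    @property
    def leftover(self):
        """Number of bytes waiting for the next push."""
        return len(self._pending)


if __name__ == "__main__":
    buf = FrameBuffer(320)
    first = buf.push(bytes(1024))   # 3 frames, 64 bytes left over
    second = buf.push(bytes(1024))  # 64 + 1024 = 1088 -> 3 frames, 128 left
    print(len(first), len(second), buf.leftover)
```

The same class handles 640-byte wideband frames by changing `frame_bytes`.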

As far as I know, there's no other encoder for Python that compresses audio enough to send it over the network in acceptably small packets. Or is there? I've been googling for three days now.
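Whichever codec is used, each encoded frame (around 40 bytes here) is small enough to go in its own UDP datagram. A minimal sketch, assuming a made-up packet layout (not RTP or any other standard format): a 2-byte sequence number followed by the encoded payload, so the receiver can notice lost or reordered packets. The layout and function names are hypothetical.

```python
import struct

# Hypothetical header: network byte order, unsigned 16-bit sequence number
HEADER = struct.Struct("!H")


def make_packet(seq, payload):
    """Prefix one encoded frame with a wrapping 16-bit sequence number."""
    return HEADER.pack(seq & 0xFFFF) + payload


def parse_packet(packet):
    """Split a datagram back into (sequence number, encoded frame)."""
    (seq,) = HEADER.unpack_from(packet)
    return seq, packet[HEADER.size:]


def send_frames(sock, addr, frames, start_seq=0):
    """Send each encoded frame as its own UDP datagram via sock.sendto."""
    for i, frame in enumerate(frames):
        sock.sendto(make_packet(start_seq + i, frame), addr)
```

Here `sock` would be a `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`; the receiver calls `parse_packet(sock.recv(2048))` and compares consecutive sequence numbers to detect gaps.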

There is also the pyMedia library, but I don't know whether converting to MP3/Ogg is a good idea for this kind of software.

Thanks in advance for reading this; I hope someone can help me! (:
