Help with voice communication in Python!
I'm currently trying to write a voice chat program in Python. All tips and tricks for doing this are welcome.
So far I've found pyAudio, which is a wrapper around PortAudio. I played around with it and got an input stream from my microphone played back through my speakers. Only raw audio, of course.
But I can't send raw data over the network (because of its size), so I'm looking for a way to encode it. I searched around the net and stumbled upon this speex wrapper for Python. It seemed too good to be true, and believe me, it was.
In pyAudio you can set the size of the chunks you want to take from the input audio buffer, and in the sample code at that link it's set to 320. Once encoded, each chunk is roughly 40 bytes of data, which is fairly acceptable I guess. And now for the problem.
I start a sample program which just takes the input stream, encodes the chunks, decodes them and plays them (not sending anything over the network, since this is only a test). If I let my computer idle and run this program it works great, but as soon as I do something else, e.g. start Firefox, the audio input buffer gets all clogged up! It just grows, and then everything crashes with an overflow error on the buffer.
OK, so why am I only taking 320 bytes of the stream at a time? I could take 1024 bytes or so, and that would ease the pressure on the buffer. BUT if I give speex 1024 bytes of data to encode/decode, it either crashes and says that's too big for its buffer, or it encodes/decodes it but the sound is very noisy and "choppy", as if it only encoded a tiny part of that 1024-byte chunk and the rest is static noise. So the sound is like a helicopter, lol.
I did some research, and it seems that speex can only convert 320 bytes of data at a time (640 for wideband). But is that the standard?
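The 320 and 640 figures follow from speex working on fixed 20 ms frames of 16-bit mono PCM. Below is a minimal sketch of that arithmetic; the helper name `frame_size` is made up for illustration. Note that pyAudio's `stream.read()` counts frames (samples), not bytes, so assuming a 16-bit mono stream, a read of 320 returns 640 bytes.

```python
# Speex encodes fixed 20 ms frames; this sketch assumes 16-bit mono PCM.
FRAME_MS = 20
BYTES_PER_SAMPLE = 2  # 16-bit samples (paInt16 in pyAudio)


def frame_size(sample_rate_hz):
    """Return (samples, bytes) for one 20 ms frame at the given sample rate."""
    samples = sample_rate_hz * FRAME_MS // 1000
    return samples, samples * BYTES_PER_SAMPLE


narrowband = frame_size(8000)    # narrowband speex runs at 8 kHz
wideband = frame_size(16000)     # wideband speex runs at 16 kHz
print(narrowband, wideband)
```

Under these assumptions narrowband comes out at 160 samples (320 bytes) per frame and wideband at 320 samples (640 bytes), which matches the limits described above.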
How can I fix this problem? How should I structure my program to work with speex? I could use an intermediate buffer that takes all the data available from the input buffer, then splits it into 320-byte pieces and encodes/decodes those. But that takes a bit longer and seems like a very bad solution to the problem.
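The intermediate-buffer idea can be sketched as follows. This is only an illustration under assumptions: the class name `Reframer` is hypothetical, the 320-byte frame size is the narrowband figure from the question, and the zero-filled `bytes(1024)` stands in for one large microphone read. The encoder call itself is left out because the speex wrapper's API is not shown in the post.

```python
class Reframer:
    """Collects arbitrarily sized PCM reads and hands out fixed-size frames."""

    def __init__(self, frame_bytes=320):
        self.frame_bytes = frame_bytes
        self._pending = bytearray()

    def push(self, data):
        """Add raw bytes and return the list of complete frames now available."""
        self._pending.extend(data)
        frames = []
        while len(self._pending) >= self.frame_bytes:
            frames.append(bytes(self._pending[:self.frame_bytes]))
            del self._pending[:self.frame_bytes]
        return frames


reframer = Reframer(frame_bytes=320)
big_read = bytes(1024)            # stand-in for one large read from the input stream
frames = reframer.push(big_read)  # each frame would be passed to the encoder in turn
print(len(frames), "full frames,", len(reframer._pending), "bytes carried over")
```

The leftover bytes stay in the buffer and are completed by the next read, so the microphone can be drained in large reads while the encoder still only ever sees 320-byte frames.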
As far as I know, there's no other encoder for Python that compresses audio enough to send it over the network in acceptably small packets, or is there? I've been googling for three days now.
There is also the pyMedia library, but I don't know whether converting to mp3/ogg is a good fit for this kind of software.
Thanks in advance for reading this, I hope someone can help me! (:
Comments (1)
You could try Huffman encoding; it's a pretty neat concept. I don't know how fast you could make it, but I'm sure that if you wrote your own C/C++ module you could make it a lot faster.
Of course, there may already be modules out there that do exactly what you need. I've just never used them, so I'm completely unaware of their existence.
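For reference, here is a rough pure-Python sketch of the Huffman idea using only the standard library. The function name `huffman_code` and the sample data are made up for illustration. Huffman coding is lossless, so on raw audio it will generally not come close to the compression a lossy speech codec such as speex achieves.

```python
import heapq
from collections import Counter


def huffman_code(data):
    """Build a prefix code {byte_value: bitstring} from byte frequencies."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


# Heavily skewed sample data, where frequent bytes get short codes.
sample = bytes([0] * 90 + [1] * 6 + [2] * 3 + [255])
code = huffman_code(sample)
encoded_bits = sum(len(code[b]) for b in sample)
print(encoded_bits, "bits instead of", len(sample) * 8)
```

The gain depends entirely on how skewed the byte distribution is; on data where all byte values are about equally common, the encoded size stays close to the original.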