Speex decoding going wrong

Posted on 2024-10-04 02:07:42

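I'm using speex to encode some audio data, send it over UDP, and decode it on the other side. I ran a few tests with speex and noticed that if I decode a packet straight after encoding it, the decoded data is nowhere near the original data: most of the bytes at the start of the buffer are 0. So when I decode the audio sent over UDP, all I get is noise. This is how I am encoding the audio: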

bool AudioEncoder::encode( float *raw, char *encoded_bits )
{
    for ( size_t i = 0; i < 256; i++ )
        this->_rfdata[i] = raw[i];
    speex_bits_reset(&this->_bits);
    speex_encode(this->_state, this->_rfdata, &this->_bits);
    int bytesWritten = speex_bits_write(&this->_bits, encoded_bits, 512);
    if (bytesWritten)
        return true;
    return false;
}

This is how I am decoding the audio:

float *f = new float[256];
// recvbuf is the buffer I pass to my recv function on the socket
speex_bits_read_from(&this->_bits, recvbuf, 512);
speex_decode(this->state, &this->_bits, f);

I've checked the docs, and most of my code comes from the example encoding/decoding sample on the speex website. I'm not sure what I'm missing here.

时光病人 2024-10-11 02:07:42

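I found out why the encoded data was so different. As Paulo Scardine said, this is lossy compression, so the decoded data will never match the input exactly. On top of that, speex only handles frames of 160 samples (in narrowband mode), so the data coming from portaudio has to be passed to speex in "packets" of 160 samples rather than 256.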
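The re-framing described above can be sketched without touching the codec itself. The following is a minimal illustration, assuming capture blocks of arbitrary size (256 floats in the question) and a codec frame of 160 samples. `FrameChunker` is a hypothetical helper, not part of Speex or PortAudio. In real code the frame size would more likely be queried from the encoder (libspeex exposes it through `speex_encoder_ctl` with `SPEEX_GET_FRAME_SIZE`) than hard-coded.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Speex or PortAudio type): buffers samples that
// arrive in arbitrary-sized blocks and hands them out in fixed-size frames.
class FrameChunker {
public:
    explicit FrameChunker(std::size_t frame_size) : frame_size_(frame_size) {}

    // Append one capture block, e.g. the 256 floats from an audio callback.
    void push(const float* samples, std::size_t count) {
        pending_.insert(pending_.end(), samples, samples + count);
    }

    // Copy the next complete frame into `out`, which must have room for
    // frame_size floats. Returns false if a full frame is not buffered yet.
    bool pop(float* out) {
        if (pending_.size() < frame_size_) return false;
        const auto end = pending_.begin() + static_cast<std::ptrdiff_t>(frame_size_);
        std::copy(pending_.begin(), end, out);
        pending_.erase(pending_.begin(), end);
        return true;
    }

    // Number of samples waiting for the next frame.
    std::size_t buffered() const { return pending_.size(); }

private:
    std::size_t frame_size_;
    std::vector<float> pending_;
};
```

Each frame returned by `pop` would then be handed to `speex_encode` one at a time; leftover samples stay buffered until the next capture block arrives.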


叹沉浮 2024-10-11 02:07:42

Actually, speex introduces an additional delay into the audio data. I found this out by reverse engineering:

narrow band : delay = 200 - framesize + lookahead = 200 - 160 +  40 =  80 samples 

wide band   : delay = 400 - framesize + lookahead = 400 - 320 + 143 = 223 samples

uwide band  : delay = 800 - framesize + lookahead = 800 - 640 + 349 = 509 samples

Since the lookahead is initialized with zeros, the first few samples you observe are "close to zero".

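To get the timing right, you must skip those samples before you reach the actual audio data you fed into the codec. Why that is, I don't know. Probably the author of speex never cared about this, since speex is meant for streaming, not primarily for storing and restoring audio data.
Another workaround (to avoid wasting space) is to feed (framesize - delay) zeros into the codec before feeding your actual audio data, and then drop the entire first speex frame.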
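The padding workaround can be illustrated with plain buffers. This is a sketch only: `pad_for_delay` and `drop_first_frame` are hypothetical helper names, and the codec is assumed to behave like a pure delay of `delay` samples, which is the model this answer describes rather than something confirmed by the Speex documentation. Prepending (framesize - delay) zeros shifts the audio by exactly one frame in total, so discarding the first decoded frame realigns it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: build the encoder input by prepending
// (frame_size - delay) zeros to the real audio.
std::vector<float> pad_for_delay(const std::vector<float>& audio,
                                 std::size_t frame_size,
                                 std::size_t delay) {
    if (delay > frame_size) {
        throw std::invalid_argument("delay must not exceed frame_size");
    }
    std::vector<float> padded(frame_size - delay, 0.0f);
    padded.insert(padded.end(), audio.begin(), audio.end());
    return padded;
}

// Hypothetical helper: discard the first decoded frame, which (under the
// pure-delay assumption) holds only the padding plus the codec delay.
std::vector<float> drop_first_frame(const std::vector<float>& decoded,
                                    std::size_t frame_size) {
    if (decoded.size() <= frame_size) return {};
    return std::vector<float>(
        decoded.begin() + static_cast<std::ptrdiff_t>(frame_size),
        decoded.end());
}
```

For narrowband, with the figures quoted in this answer, that means 160 - 80 = 80 zeros in front of the audio and one dropped 160-sample frame on the decoding side.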

I hope this clarifies everything. If someone familiar with Speex reads this, feel free to correct me if I am wrong.

EDIT: Actually, both the decoder and the encoder have a lookahead. The actual formula for the delay is:

narrow band : delay = decoder_lh + encoder_lh =  40 +  40 =  80 samples 

wide band   : delay = decoder_lh + encoder_lh =  80 + 143 = 223 samples

uwide band  : delay = decoder_lh + encoder_lh = 160 + 349 = 509 samples
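The table above can be turned into a small lookup, together with the "skip the delay" approach. This is a sketch under the assumption that the lookahead figures are the ones reported in this answer (found by reverse engineering, not taken from the Speex documentation). `SpeexBand`, `lookahead_for`, `codec_delay` and `skip_delay` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical enum naming the three Speex modes discussed above.
enum class SpeexBand { Narrow, Wide, UltraWide };

struct Lookahead {
    std::size_t decoder;  // decoder_lh, in samples
    std::size_t encoder;  // encoder_lh, in samples
};

// Lookahead values as reported in this answer (assumed, not verified).
inline Lookahead lookahead_for(SpeexBand band) {
    switch (band) {
        case SpeexBand::Narrow:    return {40, 40};
        case SpeexBand::Wide:      return {80, 143};
        case SpeexBand::UltraWide: return {160, 349};
    }
    return {0, 0};  // unreachable for valid enum values
}

// delay = decoder_lh + encoder_lh
inline std::size_t codec_delay(SpeexBand band) {
    const Lookahead lh = lookahead_for(band);
    return lh.decoder + lh.encoder;
}

// Drop the leading samples that correspond to the codec delay, so the
// remaining samples line up with the audio that was fed to the encoder.
inline std::vector<float> skip_delay(const std::vector<float>& decoded,
                                     SpeexBand band) {
    const std::size_t delay = codec_delay(band);
    if (decoded.size() <= delay) return {};
    return std::vector<float>(
        decoded.begin() + static_cast<std::ptrdiff_t>(delay), decoded.end());
}
```

With a real encoder and decoder, the lookahead values could instead be read at run time; libspeex defines a `SPEEX_GET_LOOKAHEAD` request for `speex_encoder_ctl` and `speex_decoder_ctl`, which avoids hard-coding the numbers.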
