FFmpeg seeking causes audio artifacts

Posted on 2024-12-13 12:56:18


I'm implementing an audio decoder using FFmpeg.
Reading audio and even seeking already work, but I can't figure out how to clear the buffers after seeking, so that there are no artifacts when the app starts reading audio right after the seek.

avcodec_flush_buffers doesn't seem to have any effect on the internal buffers. This happens with every decoder (MP3, AAC, WMA, ...) except PCM/WAV, which doesn't use internal buffers to hold data for decoding because the audio is uncompressed.

The code snippet is simple:

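```c
// with a stream index (not -1), posInTimeFrame is in that stream's time_base units
av_seek_frame(audioFilePack->avContext, audioFilePack->stream, posInTimeFrame, AVSEEK_FLAG_ANY);
// meant to discard the decoder's internal state so no pre-seek audio is output
avcodec_flush_buffers(audioFilePack->avContext->streams[audioFilePack->stream]->codec);
```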

Explanation:

audioFilePack->avContext = the AVFormatContext
audioFilePack->stream = index of the audio stream (also used to read audio packets)
audioFilePack->avContext->streams[audioFilePack->stream]->codec = the AVCodecContext for the codec in use

Any ideas on what I should do so I can seek and get no residual audio?
Thanks!


Comments (2)

我偏爱纯白色 2024-12-20 12:56:18


It's a bug in FFmpeg. The internal buffers aren't being flushed, so when you get a packet/frame after flushing, you're getting pre-seek data. It appears to have been fixed as of 2012-03-16, so you could incorporate the fix yourself or upgrade FFmpeg.

http://permalink.gmane.org/gmane.comp.video.libav.devel/23455

As an update: the bug above is indeed a problem, but there's a second bug specific to AAC.

About five months ago another user found this bug, and it was reported as fixed.
https://ffmpeg.org/trac/ffmpeg/ticket/420

The fix was a flush function, added to aacdec.c, that clears the decoder's internal buffers.
The problem is that aacdec.c defines two decoders and only one of them was given the flush function pointer. If you use the other (more common) decoder, its buffers still won't be cleared.

If you're in a position to build FFmpeg yourself, the fix is to add

```c
.flush = flush,
```

to the end of the AVCodec ff_aac_decoder definition (which is at the bottom of the file).

I'll let the ffmpeg guys know so hopefully it can be included in the main branch.

柒七 2024-12-20 12:56:18


I've never written an audio player with seek capability, but what I suspect is going on is this. Each packet of audio decodes into a snippet of the original sound wave. Normally, these snippets sequentially abut each other and the result is a continuous wave, which one hears as audio with no artifacts. When you seek, you force two snippets from disparate parts of the file to abut each other. This generally introduces a discontinuity into the resulting sound wave, which the ear perceives as a click or pop, or as you call it (I am guessing) an artifact.

Here's a more concrete example. Let's suppose that you have played the first 25 packets of audio before you seek. Let's say packet 25 decodes into a wave whose last sample is 12345. While packet 25 is being rendered to the speaker, you seek to packet 66. Let's say packet 66's first sample is -23456. Thus the digital audio stream jumps from 12345 to -23456 across the seek. This is a huge discontinuity, and will be heard as a pop.
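The size of that jump can be checked with a little arithmetic, assuming 16-bit signed PCM samples (range -32768 to 32767, a full-scale span of 65535); N is a hypothetical symbol for the number of samples in packet 25:

```latex
\Delta = x_{66}[0] - x_{25}[N-1] = -23456 - 12345 = -35801,
\qquad
\frac{|\Delta|}{65535} = \frac{35801}{65535} \approx 0.546
```

So between two consecutive samples the waveform moves by more than half of the full 16-bit range.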

I think one solution is to grab one extra packet before you begin to seek (packet 26 in my example), decode it to an offline buffer, apply a fade-out, and then put it into the playback queue. After you seek to your desired location, take the first packet (66 in my example), decode it to another offline buffer, apply a fade-in, and then put that into the playback queue. This should ensure a smooth waveform and artifact-free seeking.

If you are clever, you can make the fade-out and fade-in as short or long as you want. I think only a few milliseconds ought to be enough to prevent artifacts. You could even apply a cross-fade from the old and new packets. It might also be sufficient to merely note the last sample value in the last packet before the seek, and gradually step it down to zero over a few samples, rather than pulling it to zero immediately. This might be easier than decoding an extra packet.

This is my guess about how this problem could be addressed. This is clearly a solved problem, so I encourage you to also look at open-source audio players and see how they implement seeking. Programs like Audacity, Totem, Banshee, RhythmBox, Amarok, or VLC, or frameworks like GStreamer, might be good examples to learn from. If you find they employ notable techniques, please report on them here. I think people will want to learn what they are. Good luck!
