Transcoding from H264 to H264 after upload to an online service causes audio/video sync problems

Posted 2025-01-04 14:11:39

Our application generates a MOV file that contains a series of static images, each shown for around half a second. The video has a frame rate of 10 fps and is encoded with the avc1 (H264) codec. The audio always starts with around half a second of silence and is encoded with the mp4a (MPEG-4 AAC-LC) codec. After upload to the online service, a transcode to H264 occurs (presumably with different settings), and the audio ends up about half a second ahead of the video, i.e. it appears the silence at the start has been trimmed from the audio while the video has been left untouched. This also occurs with WMV files we generate. Any ideas about issues with our source video, or something in the transcode, that could cause this?

Comments (1)

青萝楚歌 2025-01-11 14:11:39

I can't answer this definitively, because I don't know exactly what Facebook does.

There are, however, two possibilities:

  1. The padded audio might have a timestamp discontinuity, i.e. the stream may start at some timestamp (during the silence), and when the real audio starts the timestamp jumps. The transcoder could be smart enough to throw that portion away.

  2. The other possibility is that when you add the silence, you are not writing timestamps at all until some later point. The transcoder could simply be dropping audio frames until it sees the first valid timestamp that makes sense. Many real-time transcoders/decoders that expect data from a live stream behave this way.
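To check which of the two cases applies, one could dump the audio packet timestamps from the source file (for example with a tool such as ffprobe) and scan them. The following is only a rough sketch of that scan: the `Packet` class, the `inspect_audio_timestamps` function, and the 5 ms tolerance are hypothetical names and values chosen for illustration, not part of any library, and it assumes the timestamps have already been converted to seconds.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Packet:
    """One audio packet (hypothetical type); pts is None when no timestamp was written."""
    pts: Optional[float]  # presentation time in seconds
    duration: float       # packet duration in seconds


def inspect_audio_timestamps(
    packets: List[Packet], tolerance: float = 0.005
) -> Tuple[int, Optional[float], List[Tuple[float, float]]]:
    """Return (untimed_leading_packets, first_pts, gaps).

    gaps holds (expected_pts, actual_pts) pairs wherever a packet starts more
    than `tolerance` seconds away from where the previous timed packet ended.
    """
    # Case 2: count leading packets that carry no timestamp at all.
    untimed = 0
    for p in packets:
        if p.pts is not None:
            break
        untimed += 1

    timed = [p for p in packets[untimed:] if p.pts is not None]
    first_pts = timed[0].pts if timed else None

    # Case 1: look for jumps between consecutive timed packets.
    gaps = []
    for prev, cur in zip(timed, timed[1:]):
        expected = prev.pts + prev.duration
        if abs(cur.pts - expected) > tolerance:
            gaps.append((expected, cur.pts))
    return untimed, first_pts, gaps
```

A non-empty `gaps` list would point toward the first possibility, while a non-zero count of untimed leading packets (or a `first_pts` well above zero) would point toward the second.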

If you can provide more detail along these lines, you'll get a more accurate answer.
