libavcodec: how do I transcode video with a different frame rate?

Posted 2024-10-02 12:09:50

I'm grabbing video frames from the camera via v4l, and I need to transcode them to MPEG-4 so I can then stream them over RTP.

Everything actually "works", but there is something I don't understand about the re-encoding: the input stream produces 15 fps while the output is 25 fps, and every input frame is converted into one single video object sequence (I verified this with a simple check on the output bitstream). I guess the receiver is parsing the MPEG-4 bitstream correctly but the RTP packetization is somehow wrong. How am I supposed to split the encoded bitstream into one or more AVPackets? Maybe I'm missing the obvious and just need to look for B/P-frame markers, but I don't think I'm using the encoding API correctly.

Here is an excerpt of my code, based on the available ffmpeg samples:

// input frame
AVFrame *picture;
// input frame color-space converted
AVFrame *planar;
// input format context, video4linux2
AVFormatContext *iFmtCtx;
// output codec context, mpeg4
AVCodecContext *oCtx;
// [ init everything ]
// ...
oCtx->time_base.num = 1;
oCtx->time_base.den = 25;
oCtx->gop_size = 10;
oCtx->max_b_frames = 1;
oCtx->bit_rate = 384000;
oCtx->pix_fmt = PIX_FMT_YUV420P;

for(;;)
{
  // read frame
  rdRes = av_read_frame( iFmtCtx, &pkt );
  if ( rdRes >= 0 && pkt.size > 0 )
  {
    // decode it
    iCdcCtx->reordered_opaque = pkt.pts;
    int decodeRes = avcodec_decode_video2( iCdcCtx, picture, &gotPicture, &pkt );
    if ( decodeRes >= 0 && gotPicture )
    {
      // scale / convert color space
      avpicture_fill((AVPicture *)planar, planarBuf.get(), oCtx->pix_fmt, oCtx->width, oCtx->height);
      sws_scale(sws, picture->data, picture->linesize, 0, iCdcCtx->height, planar->data, planar->linesize);
      // encode
      ByteArray encBuf( 65536 );
      int encSize = avcodec_encode_video( oCtx, encBuf.get(), encBuf.size(), planar );
      // this happens every GOP end
      while( encSize == 0 )
        encSize = avcodec_encode_video( oCtx, encBuf.get(), encBuf.size(), 0 );
      // send the transcoded bitstream with the result PTS
      if ( encSize > 0 )
        enqueueFrame( oCtx->coded_frame->pts, encBuf.get(), encSize );
    }
  }
}
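
For reference, the enqueueFrame call above is where the data would normally be handed to the RTP layer as an AVPacket. A minimal sketch of wrapping one encoder output buffer into an AVPacket and passing it to an RTP muxer, using the same FFmpeg API generation as the excerpt; the output AVFormatContext *oFmtCtx (opened with the "rtp" format) and the AVStream *videoStream are assumptions that do not appear in the code above:

// hypothetical output context and stream, not part of the original excerpt:
// AVFormatContext *oFmtCtx;  AVStream *videoStream;
AVPacket opkt;
av_init_packet( &opkt );
opkt.data = encBuf.get();
opkt.size = encSize;
// rescale the encoder PTS (time base 1/25) to the stream time base
// (typically 1/90000 for RTP)
opkt.pts = av_rescale_q( oCtx->coded_frame->pts, oCtx->time_base, videoStream->time_base );
// note: with max_b_frames > 0 a correct dts differs from pts; it is left unset here
opkt.stream_index = videoStream->index;
if ( oCtx->coded_frame->key_frame )
  opkt.flags |= AV_PKT_FLAG_KEY;
// the mpeg4 RTP packetizer (RFC 3016) splits this into RTP payloads itself
av_interleaved_write_frame( oFmtCtx, &opkt );

The point is that each successful call to avcodec_encode_video returns at most one access unit, so one AVPacket per returned buffer is enough; splitting it further into RTP-sized pieces is the muxer's (or your own packetizer's) job.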


Comments (1)

知你几分 2024-10-09 12:09:50

The simplest solution would be to use two threads. The first thread does everything outlined in your question (decoding, scaling / color-space conversion, encoding). Partially transcoded frames are written to an intermediate queue shared with the second thread. In this particular case (converting from a lower to a higher frame rate) the maximum length of that queue is one frame. The second thread reads frames from the input queue in a loop, like this:

// requires <windows.h> / <mmsystem.h> and winmm.lib for timeBeginPeriod/timeGetTime
void FpsConverter::ThreadProc()
{
    // ask for 1 ms timer resolution so Sleep() is accurate enough for frame pacing
    timeBeginPeriod(1);
    DWORD start_time = timeGetTime();
    int frame_counter = 0;
    while (!shouldFinish()) {
        Frame *frame = NULL;
        // take the latest frame from the shared queue (repeated if no new one arrived)
        ReadInputFrame(frame);
        // hand it to the output side at the target rate
        WriteToOutputQueue(frame);
        DWORD time_end = timeGetTime();
        // frame_time is the output frame interval in ms (40 ms for 25 fps)
        DWORD next_frame_time = start_time + ++frame_counter * frame_time;
        // use a signed difference: an unsigned DWORD subtraction would wrap around
        // when we are already late and the loop would stall
        int time_to_sleep = (int)(next_frame_time - time_end);
        if (time_to_sleep > 0) {
            Sleep(time_to_sleep);
        }
    }
    timeEndPeriod(1);
}
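
ReadInputFrame and WriteToOutputQueue are not shown in the answer; a minimal sketch of the one-frame intermediate queue they imply, assuming C++11 (std::mutex / std::condition_variable), with the Frame type and all names as placeholders:

#include <mutex>
#include <condition_variable>

struct Frame;  // whatever the transcoding thread produces

// One-slot queue: the producer overwrites the slot, the consumer always
// takes the most recent frame and simply reuses it when no new frame has
// arrived yet -- that is what turns 15 fps input into 25 fps output.
class FrameSlot
{
public:
    void put(Frame *f)
    {
        std::lock_guard<std::mutex> lock(m_);
        latest_ = f;
        hasFrame_ = true;
        cv_.notify_one();
    }

    // Blocks only until the very first frame arrives; after that it returns
    // immediately with the latest frame (possibly the same one as before).
    Frame *get()
    {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return hasFrame_; });
        return latest_;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    Frame *latest_ = nullptr;
    bool hasFrame_ = false;
};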

When CPU power is sufficient and higher fidelity and smoothness are required, you can compute an output frame not just from one input frame but from several, using some form of interpolation (similar to techniques used in MPEG codecs). The closer an output frame's timestamp is to an input frame's timestamp, the more weight you should assign to that particular input frame.
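
As an illustration of the weighting (not code from the answer), a blend of the two input frames that bracket an output timestamp could look like this for one 8-bit plane; timestamps are in seconds and the function name is a placeholder:

#include <cstdint>
#include <cstddef>

// Blend one plane of the two input frames that bracket the output timestamp.
// Requires t_prev <= t_out <= t_next; the closer t_out is to an input frame,
// the larger that frame's weight.
static void blend_plane(const std::uint8_t *prev, const std::uint8_t *next,
                        std::uint8_t *out, std::size_t n,
                        double t_prev, double t_next, double t_out)
{
    double w_next = (t_out - t_prev) / (t_next - t_prev);  // in [0, 1]
    double w_prev = 1.0 - w_next;
    for (std::size_t i = 0; i < n; ++i)
        out[i] = (std::uint8_t)(w_prev * prev[i] + w_next * next[i] + 0.5);
}

Plain blending produces cross-fade ghosting on fast motion; the MPEG-like techniques the answer alludes to add motion compensation before the blend, which is considerably more work.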

