Problems decoding H264 video over RTP with ffmpeg (libavcodec)
I set profile_idc, level_idc, extradata and extradata_size of the AVCodecContext from the profile-level-id and sprop-parameter-sets of the SDP.
I handle the decoding of Coded Slice, SPS, PPS and NAL_IDR_SLICE packets separately:
Init:
uint8_t start_sequence[] = {0, 0, 1};   // Annex B start code
int size = recv(id_de_la_socket, (char*)rtpReceive, 65535, 0);   // one RTP packet
Coded Slice:
char *z = new char[size - 16 + sizeof(start_sequence)];
memcpy(z, &start_sequence, sizeof(start_sequence));                // prepend 00 00 01
memcpy(z + sizeof(start_sequence), rtpReceive + 16, size - 16);    // copy the payload after the RTP header
ConsumedBytes = avcodec_decode_video(codecContext, pFrame, &GotPicture, (uint8_t*)z, size - 16 + sizeof(start_sequence));
delete[] z;
Result: ConsumedBytes >0 and GotPicture >0 (often)
SPS and PPS:
Identical code.
Result: ConsumedBytes > 0 and GotPicture = 0
I think this is normal.
When I find a new SPS/PPS pair, I update extradata and extradata_size with the payloads of these packets and their sizes.
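Roughly, the update looks like this (simplified; sps/pps here are the raw parameter-set payloads and the names are just for illustration):

#include <cstring>
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/mem.h>
}

// Copy SPS and PPS into extradata in Annex B form: start code + SPS + start code + PPS
static void update_extradata(AVCodecContext *ctx,
                             const uint8_t *sps, int sps_len,
                             const uint8_t *pps, int pps_len)
{
    static const uint8_t sc[3] = {0, 0, 1};
    int total = 3 + sps_len + 3 + pps_len;

    av_free(ctx->extradata);
    ctx->extradata = (uint8_t*)av_mallocz(total + FF_INPUT_BUFFER_PADDING_SIZE);  // AV_INPUT_BUFFER_PADDING_SIZE in newer ffmpeg
    uint8_t *p = ctx->extradata;
    memcpy(p, sc, 3);          p += 3;
    memcpy(p, sps, sps_len);   p += sps_len;
    memcpy(p, sc, 3);          p += 3;
    memcpy(p, pps, pps_len);
    ctx->extradata_size = total;
}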
NAL_IDR_SLICE:
The NAL unit type is 28, so the IDR frames are fragmented; therefore I tried two methods of decoding:
1) I prefix the first fragment (without the RTP header) with the sequence 0x000001 and send it to avcodec_decode_video. Then I send the rest of the fragments to the same function.
2) I prefix the first fragment (without the RTP header) with the sequence 0x000001 and concatenate the rest of the fragments to it. I send this buffer to the decoder.
In both cases I get no error (ConsumedBytes > 0), but no frame is detected (GotPicture = 0)...
What is the problem?
3 Answers
In RTP, all H264 I-frames (IDRs) are usually fragmented. When you receive RTP you first must skip the header (usually the first 12 bytes) to get to the NAL unit (first payload byte). If the NAL type is 28 (0x1C), it means the following payload represents one fragment of an H264 IDR (I-frame) and you need to collect all of the fragments to reconstruct the IDR.
Fragmentation occurs because of the limited MTU and the much larger IDR. One fragment can look like this:
Fragment that has START BIT = 1:
  first payload byte (FU indicator):  [ F | NRI (3 NAL UNIT BITS) | fragment_type = 28 ]
  second payload byte (FU header):    [ start_bit = 1 | end_bit | reserved | 5 NAL UNIT BITS ]
  remaining bytes: IDR fragment data
Other fragments:
  the same two leading bytes, but with start_bit = 0 (and end_bit = 1 only in the last fragment), followed by more IDR fragment data.
To reconstruct the IDR you must collect this info: fragment_type (the low 5 bits of the first payload byte), start_bit and end_bit (the two high bits of the second payload byte).
If fragment_type == 28, then the payload following it is one fragment of an IDR. Next check whether start_bit is set; if it is, that fragment is the first one in the sequence. You use it to reconstruct the IDR's NAL byte by taking the first 3 bits from the first payload byte (3 NAL UNIT BITS) and combining them with the last 5 bits from the second payload byte (5 NAL UNIT BITS), so you get a byte like this: [3 NAL UNIT BITS | 5 NAL UNIT BITS]. Write that NAL byte first into a clean buffer, followed by all the remaining bytes of that fragment. Remember to skip the two leading payload bytes of each fragment, since they only identify the fragment and are not part of the IDR.
If start_bit and end_bit are 0, just write the payload (skipping the two identifying bytes) to the buffer.
If start_bit is 0 and end_bit is 1, it is the last fragment: write its payload (again skipping the identifying bytes) to the buffer, and now you have your IDR reconstructed.
If you need some code, just ask in a comment and I'll post it, but I think it's pretty clear how to do it... =)
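In case it helps, here is a rough sketch of the reassembly (it assumes a 12-byte RTP header with no CSRC list or extensions, and uses a std::vector as the output buffer; the names are just for illustration):

#include <cstdint>
#include <vector>

static const int RTP_HEADER_SIZE = 12;   // assumption: no CSRC entries, no header extension

// Appends one FU-A fragment to 'idr'. Returns true when the end bit is set,
// i.e. the IDR is complete and can be handed to the decoder.
bool append_fragment(const uint8_t *rtp, int len, std::vector<uint8_t> &idr)
{
    const uint8_t *payload = rtp + RTP_HEADER_SIZE;
    int payload_len = len - RTP_HEADER_SIZE;
    if (payload_len < 2)
        return false;

    uint8_t fu_indicator  = payload[0];
    uint8_t fu_header     = payload[1];
    uint8_t fragment_type = fu_indicator & 0x1F;   // 28 for FU-A
    bool start_bit = (fu_header & 0x80) != 0;
    bool end_bit   = (fu_header & 0x40) != 0;

    if (fragment_type != 28)
        return false;                              // not a fragmented NAL

    if (start_bit) {
        idr.clear();
        // Annex B start code so the decoder can find the NAL
        const uint8_t sc[4] = {0x00, 0x00, 0x00, 0x01};
        idr.insert(idr.end(), sc, sc + 4);
        // Rebuild the NAL header: 3 high bits from the FU indicator,
        // 5 NAL type bits from the FU header
        idr.push_back((fu_indicator & 0xE0) | (fu_header & 0x1F));
    }
    // Skip FU indicator + FU header, copy the rest of the fragment
    idr.insert(idr.end(), payload + 2, payload + payload_len);

    return end_bit;
}

Each time it returns true you have a complete IDR (already prefixed with a start code) that you can pass to the decoder.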
CONCERNING THE DECODING
It crossed my mind today why you might get an error when decoding the IDR (I presume you have reconstructed it correctly). How are you building your AVC Decoder Configuration Record? Does the lib you use have that automated? If not, and you haven't heard of this, continue reading...
The AVCDCR is specified to allow decoders to quickly parse all the data they need to decode an H264 (AVC) video stream: the profile, compatibility (IOP) and level bytes plus the SPS and PPS parameter sets.
All this data is sent in the RTSP session, in the SDP, under the fields profile-level-id and sprop-parameter-sets.
DECODING PROFILE-LEVEL-ID
The profile-level-id string is divided into 3 substrings, each 2 characters long:
[PROFILE IDC][PROFILE IOP][LEVEL IDC]
Each substring represents one byte in base16! So, if the Profile IDC substring is 28, it is actually 40 in base10. Later you will use the base10 values to construct the AVC Decoder Configuration Record.
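For example, a quick sketch (it assumes the value has already been pulled out of the SDP as a 6-character string such as "42E01E"):

#include <cstdint>
#include <cstdlib>
#include <string>

struct ProfileLevelId { uint8_t profile_idc, profile_iop, level_idc; };

// Parse the 6 hex characters of profile-level-id into its three bytes.
ProfileLevelId parse_profile_level_id(const std::string &s)   // e.g. "42E01E"
{
    ProfileLevelId id;
    id.profile_idc = (uint8_t)strtol(s.substr(0, 2).c_str(), nullptr, 16);
    id.profile_iop = (uint8_t)strtol(s.substr(2, 2).c_str(), nullptr, 16);
    id.level_idc   = (uint8_t)strtol(s.substr(4, 2).c_str(), nullptr, 16);
    return id;
}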
DECODING SPROP-PARAMETER-SETS
Sprops are usually 2 strings (there could be more) that are comma separated and base64 encoded! You could decode both of them, but there is no need to. Your job here is just to convert them from base64 strings into byte arrays for later use. Now you have 2 byte arrays: the first array is the SPS, the second one is the PPS.
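Since you are already using ffmpeg, libavutil's base64 helper can do the conversion; for example (a sketch, with the comma splitting kept deliberately simple):

#include <cstdint>
#include <string>
#include <vector>
extern "C" {
#include <libavutil/base64.h>
}

// Split sprop-parameter-sets on ',' and base64-decode each part.
// Typically sets[0] ends up being the SPS and sets[1] the PPS.
std::vector<std::vector<uint8_t>> decode_sprop(const std::string &sprop)
{
    std::vector<std::vector<uint8_t>> sets;
    size_t start = 0;
    while (start <= sprop.size()) {
        size_t end = sprop.find(',', start);
        std::string b64 = (end == std::string::npos) ? sprop.substr(start)
                                                     : sprop.substr(start, end - start);

        std::vector<uint8_t> out(3 * b64.size() / 4 + 4);
        int n = av_base64_decode(out.data(), b64.c_str(), (int)out.size());
        if (n > 0) {
            out.resize(n);
            sets.push_back(out);
        }

        if (end == std::string::npos)
            break;
        start = end + 1;
    }
    return sets;
}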
BUILDING THE AVCDCR
Now you have all you need to build the AVCDCR. Start by making a new, clean buffer, then write these things into it in the order explained here (a sketch of the whole thing follows the list):
1 - Byte with value 1, representing the version
2 - Profile IDC byte
3 - Profile IOP byte
4 - Level IDC byte
5 - Byte with value 0xFF (google the AVC Decoder Configuration Record to see what this is)
6 - Byte with value 0xE1
7 - Short with the length of the SPS array
8 - SPS byte array
9 - Byte with the number of PPS arrays (you could have more of them in sprop-parameter-sets)
10 - Short with the length of the following PPS array
11 - PPS array
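Put together, a sketch of the whole record (assuming a single SPS and a single PPS, and reusing the base10 profile/level values decoded above):

#include <cstdint>
#include <vector>

// Build the AVC Decoder Configuration Record from one SPS and one PPS.
// The lengths are written big-endian as 16-bit shorts.
std::vector<uint8_t> build_avcdcr(uint8_t profile_idc, uint8_t profile_iop, uint8_t level_idc,
                                  const std::vector<uint8_t> &sps,
                                  const std::vector<uint8_t> &pps)
{
    std::vector<uint8_t> rec;
    rec.push_back(1);                                   // 1  - version
    rec.push_back(profile_idc);                         // 2  - profile IDC
    rec.push_back(profile_iop);                         // 3  - profile IOP
    rec.push_back(level_idc);                           // 4  - level IDC
    rec.push_back(0xFF);                                // 5  - reserved bits + NAL length size
    rec.push_back(0xE1);                                // 6  - reserved bits + number of SPS (1)
    rec.push_back((uint8_t)(sps.size() >> 8));          // 7  - SPS length (short)
    rec.push_back((uint8_t)(sps.size() & 0xFF));
    rec.insert(rec.end(), sps.begin(), sps.end());      // 8  - SPS byte array
    rec.push_back(1);                                   // 9  - number of PPS arrays
    rec.push_back((uint8_t)(pps.size() >> 8));          // 10 - PPS length (short)
    rec.push_back((uint8_t)(pps.size() & 0xFF));
    rec.insert(rec.end(), pps.begin(), pps.end());      // 11 - PPS array
    return rec;
}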
DECODING VIDEO STREAM
Now you have a byte array that tells the decoder how to decode the H264 video stream. I believe you need this if your lib doesn't build it itself from the SDP...
I don't know about the rest of your implementation, but it seems likely the 'fragments' you are receiving are NAL units. Therefore each one may need the NALU start code (00 00 01 or 00 00 00 01) prepended when you reconstruct the bitstream before sending it to ffmpeg.
At any rate, you might find the RFC for H264 RTP packetization useful:
http://www.rfc-editor.org/rfc/rfc3984.txt
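For illustration, prepending the start code while copying each NAL into the bitstream buffer could look something like this (std::vector is used purely as an example buffer):

#include <cstdint>
#include <vector>

// Write an Annex B start code followed by the NAL unit into the bitstream buffer.
void append_nal(std::vector<uint8_t> &bitstream, const uint8_t *nal, size_t nal_len)
{
    static const uint8_t start_code[4] = {0x00, 0x00, 0x00, 0x01};
    bitstream.insert(bitstream.end(), start_code, start_code + 4);
    bitstream.insert(bitstream.end(), nal, nal + nal_len);
}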
Hope this helps!
I have an implementation of this at https://net7mma.codeplex.com/ for C#, but the process is the same everywhere.
Here is the relevant code
There are also implementations of various other RFCs which help get the media playing in a MediaElement or in other software, or just save it to disk.
Writing to a container format is underway.