The "codecs" parameter in common media types - Web media technologies
At a fundamental level, you can specify the type of a media file using a simple MIME type, such as video/mp4 or audio/mpeg. However, many media types, especially those that support video tracks, can benefit from the ability to describe the format of the data within them more precisely. For instance, describing a video in an MPEG-4 file with the MIME type video/mp4 says nothing about what format the actual media within takes.
For that reason, the codecs parameter can be added to the MIME type describing media content. With it, container-specific information can be provided. This information may include things like the profile of the video codec, the type used for the audio tracks, and so forth.
This guide briefly examines the syntax of the media type codecs parameter and how it's used with the MIME type string to provide details about the contents of audio or video media beyond indicating the container type.
General syntax
A basic MIME media type is expressed by stating the type of media (audio, video, etc.), then a slash character (/), then the container format used to contain the media:
audio/mpeg
- An audio file using the MPEG file type, such as an MP3.
video/ogg
- A video file using the Ogg file type.
video/mp4
- A video file using the MPEG-4 file type.
video/quicktime
- A video file in Apple's QuickTime format. As noted elsewhere, this format was once commonly used on the web but no longer is, since it required a plugin to use.
However, each of these MIME types is vague. All of these file types support a variety of codecs, and those codecs may have any number of profiles, levels, and other configuration factors. For this reason, you can add the codecs parameter to the media type.
To do so, append a semicolon (;) followed by codecs= and then the string describing the format of the contents of the file. Some media types only let you specify the names of the codecs to use, while others allow you to specify various constraints on those codecs as well. You can specify multiple codecs by separating them with commas.
audio/ogg; codecs=vorbis
- An Ogg file containing a Vorbis audio track.
video/webm; codecs="vp8, vorbis"
- A WebM file containing VP8 video and/or Vorbis audio.
video/mp4; codecs="avc1.4d002a"
- An MPEG-4 file containing AVC (H.264) video, Main Profile, Level 4.2.
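The assembly rule described above (container type, semicolon, codecs=, comma-separated list) can be sketched as a small helper. The function name buildMediaType is hypothetical and not part of any browser API; this is only an illustration of the string format.

```javascript
// Hypothetical helper (not a browser API): joins a container MIME type
// and a list of codec strings into one media type string.
function buildMediaType(container, codecs = []) {
  if (codecs.length === 0) {
    return container; // nothing to describe, so no codecs parameter
  }
  // The value is always quoted here; quoting is required as soon as the
  // list contains a comma, and harmless for a single codec.
  return `${container}; codecs="${codecs.join(", ")}"`;
}

console.log(buildMediaType("video/webm", ["vp8", "vorbis"]));
console.log(buildMediaType("video/mp4", ["avc1.4d002a"]));
```

Quoting unconditionally keeps the helper simple; the unquoted form, as in audio/ogg; codecs=vorbis, is equally valid for a single codec name.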
As is the case with any MIME type parameter, codecs must be changed to codecs* (note the asterisk character, *) if any of the properties of the codec use special characters which must be percent-encoded per RFC 2231, section 4: MIME Parameter Value and Encoded Word Extensions. You can use the JavaScript encodeURI() function to encode the parameter list; similarly, you can use decodeURI() to decode a previously encoded parameter list.
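As a minimal sketch of the round trip mentioned above, the snippet below percent-encodes a codec list and decodes it again. It shows only the encoding step; a complete RFC 2231 extended value also carries a character set and language prefix, which this example does not attempt to build.

```javascript
// Round trip of a codec list through encodeURI()/decodeURI().
// Only the percent-encoding step is shown; assembling a full codecs*
// parameter is outside the scope of this sketch.
const codecList = "vp8, vorbis";
const encoded = encodeURI(codecList); // the space becomes %20
const decoded = decodeURI(encoded);

console.log(encoded);
console.log(decoded === codecList);
```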
When the codecs parameter is used, the specified list of codecs must include every codec used for the contents of the file. The list may also contain codecs not present in the file.
Codec options by container
The containers below support extended codec options in their codecs parameters:
- AV1
- ISO Base Media File Format: MP4, QuickTime, and 3GP
- WebM
MP4, QuickTime, and 3GP are covered by a single section because those media types are all based on ISO Base Media File Format (ISO BMFF), so they share the same syntax.
AV1
The syntax of the codecs parameter for AV1 is defined in the AV1 Codec ISO Media File Format Binding specification, section 5: Codecs Parameter String.
av01.P.LLT.DD[.M[.CCC[.cp[.tc[.mc[.F]]]]]]
This codec parameter string's components are described in more detail in the table below. Each component is a fixed number of characters long; if the value is less than that length, it must be padded with leading zeros.
Component | Details |
---|---|
P | The one-digit profile number: 0 (Main), 1 (High), or 2 (Professional). |
LL | The two-digit level number, which is converted to the X.Y level format, where X = 2 + (LL >> 2) and Y = LL & 3. See Appendix A, section 3 in the AV1 Specification for details. |
T | The one-character tier indicator. For the Main tier (seq_tier equals 0), this character is the letter M. For the High tier (seq_tier is 1), this character is the letter H. The High tier is only available for level 4.0 and up. |
DD | The two-digit component bit depth. This value must be one of 8, 10, or 12; which values are valid varies depending on the profile and other properties. |
M | The one-digit monochrome flag; if this is 0, the video includes the U and V planes in addition to the Y plane. Otherwise, the video data is entirely in the Y plane and is therefore monochromatic. See YUV in Digital video concepts for details on how the YUV color system works. The default value is 0 (not monochrome). |
CCC | The three-digit chroma subsampling indicator. The first digit is subsampling_x and the second is subsampling_y. If both are 1, the third digit is the value of chroma_sample_position; otherwise, the third digit is always 0. For example, 100 indicates 4:2:2 chroma subsampling. The default value is 110 (4:2:0 chroma subsampling). |
cp | The two-digit color_primaries value indicates the color system used by the media. For example, BT.2020/BT.2100 color, as used for HDR video, is 09. The information for this, and for each of the remaining components, is found in the Color config semantics section of the AV1 specification. The default value is 01 (ITU-R BT.709). |
tc | The two-digit transfer_characteristics value. This value defines the function used to map the gamma (delightfully called the "opto-electrical transfer function" in technical parlance) from the source to the display. For example, 10-bit BT.2020 is 14. The default value is 01 (ITU-R BT.709). |
mc | The two-digit matrix_coefficients constant selects the matrix coefficients used to convert the red, blue, and green channels into luma and chroma signals. For example, the standard coefficients used for BT.709 are indicated using the value 01. The default value is 01 (ITU-R BT.709). |
F | A one-digit flag indicating whether the color should be allowed to use the full range of possible values (1), or should be constrained to those values considered legal for the specified color configuration (that is, the studio swing representation). The default is 0 (use the studio swing representation). |
All fields from M (monochrome flag) onward are optional; you may stop including fields at any point, but you can't leave out a field and then include a later one. The default values are included in the table above. Some example AV1 codec strings:
av01.2.15M.10.0.100.09.16.09.0
- AV1 Professional Profile, level 5.3, Main tier, 10 bits per color component, 4:2:2 chroma subsampling using ITU-R BT.2100 color primaries, transfer characteristics, and YCbCr color matrix. The studio swing representation is indicated.
av01.0.15M.10
- AV1 Main Profile, level 5.3, Main tier, 10 bits per color component. The remaining properties are taken from the defaults: 4:2:0 chroma subsampling, BT.709 color primaries, transfer characteristics, and matrix coefficients. Studio swing representation.
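To make the field layout and the level formula concrete, here is a rough sketch of a parser for AV1 codec strings. The function name parseAv1Codec and the shape of the returned object are assumptions made for illustration. The defaults follow the ones stated in this section, and the profile names for 0 and 2 come from the examples above; the name for profile 1 (High) is taken from the AV1 specification.

```javascript
// Profile names: 0 and 2 appear in the examples above; 1 (High) is
// taken from the AV1 specification.
const AV1_PROFILES = { 0: "Main", 1: "High", 2: "Professional" };

// Hypothetical parser for av01.P.LLT.DD[.M[.CCC[.cp[.tc[.mc[.F]]]]]]
function parseAv1Codec(codec) {
  const parts = codec.split(".");
  if (parts[0] !== "av01" || parts.length < 4 || parts.length > 10) {
    throw new Error(`Not an AV1 codec string: ${codec}`);
  }
  // Omitted trailing fields fall back to the documented defaults.
  const [, p, levelTier, dd, m = "0", ccc = "110",
         cp = "01", tc = "01", mc = "01", f = "0"] = parts;
  const ll = parseInt(levelTier.slice(0, 2), 10);
  return {
    profile: AV1_PROFILES[Number(p)],
    level: `${2 + (ll >> 2)}.${ll & 3}`, // X = 2 + (LL >> 2), Y = LL & 3
    tier: levelTier.slice(2) === "H" ? "High" : "Main",
    bitDepth: Number(dd),
    monochrome: m === "1",
    chromaSubsampling: ccc,
    colorPrimaries: cp,
    transferCharacteristics: tc,
    matrixCoefficients: mc,
    fullRange: f === "1",
  };
}

console.log(parseAv1Codec("av01.0.15M.10"));
```

For the short example string, the level field 15 works out to 2 + (15 >> 2) = 5 and 15 & 3 = 3, that is, level 5.3, matching the description above.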
ISO Base Media File Format: MP4, QuickTime, and 3GP
All media types based upon the ISO Base Media File Format (ISO BMFF) share the same syntax for the codecs parameter. These media types include MPEG-4 (and, in fact, the QuickTime file format upon which MPEG-4 is based) as well as 3GP. Both video and audio tracks can be described using the codecs parameter with the following MIME types:
MIME type | Description |
---|---|
audio/3gpp | 3GP audio (RFC 3839: MIME Type Registrations for 3rd generation Partnership Project (3GP) Multimedia files) |
video/3gpp | 3GP video (RFC 3839: MIME Type Registrations for 3rd generation Partnership Project (3GP) Multimedia files) |
audio/3gp2 | 3GP2 audio (RFC 4393: MIME Type Registrations for 3GPP2 Multimedia files) |
video/3gp2 | 3GP2 video (RFC 4393: MIME Type Registrations for 3GPP2 Multimedia files) |
audio/mp4 | MP4 audio (RFC 4337: MIME Type Registration for MPEG-4) |
video/mp4 | MP4 video (RFC 4337: MIME Type Registration for MPEG-4) |
application/mp4 | Non-audiovisual media encapsulated in MPEG-4 |
Each codec described by the codecs parameter can be specified either as the container's name (3gp, mp4, quicktime, etc.) or as the container name plus additional parameters to specify the codec and its configuration. Each entry in the codec list may contain some number of components, separated by periods (.).
The syntax for the value of codecs varies by codec; however, it always starts with the codec's four-character identifier, followed by a period separator (.) and then the Object Type Indication (OTI) value for the specific data format. For most codecs, the OTI is a two-digit hexadecimal number; however, it's six hexadecimal digits for AVC (H.264).
Thus, the syntaxes for each of the supported codecs look like this:
cccc[.pp]* (Generic ISO BMFF)
- Where cccc is the four-character ID for the codec and pp is where zero or more two-character encoded property values go.
mp4a.oo[.A] (MPEG-4 audio)
- Where oo is the Object Type Indication value describing the contents of the media more precisely and A is the one- or two-digit Audio Object Type. The possible values for the OTI can be found on the MP4 Registration Authority web site's Object Types page. For example, Opus audio in an MP4 file is mp4a.ad. For further details, see MPEG-4 audio.
mp4v.oo[.V] (MPEG-4 video)
- Here, oo is again the OTI describing the contents more precisely, while V is the one-digit video OTI.
avc1.PPCCLL (AVC video)
- PPCCLL is six hexadecimal digits specifying the profile number (PP), constraint set flags (CC), and level (LL), as in avc1.4d002a. See AVC profiles for the possible values of PP.
  The constraint set flags byte consists of one-bit Boolean flags, with the most significant bit being referred to as flag 0 (or constraint_set0_flag, in some resources), and each successive bit being numbered one higher. Flags 0 through 5 are defined; the remaining two bits must be zero. The meanings of the flags vary depending on the profile being used.
  The level is a fixed-point number, so a value of 14 (decimal 20) means level 2.0 while a value of 3D (decimal 61) means level 6.1. Generally speaking, the higher the level number, the more bandwidth the stream will use and the larger the maximum supported video dimensions.
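The byte layout of an AVC entry such as avc1.4d002a (the example used earlier in this guide) can be illustrated with a short decoder. This is a sketch only: parseAvc1Codec and its return shape are hypothetical, and it assumes the identifier is followed directly by the six hexadecimal digits.

```javascript
// Hypothetical decoder for AVC codec strings of the form avc1.PPCCLL.
function parseAvc1Codec(codec) {
  const match = /^avc1\.([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(codec);
  if (!match) {
    throw new Error(`Not an avc1.PPCCLL string: ${codec}`);
  }
  const [profile, constraints, level] = match
    .slice(1)
    .map((hex) => parseInt(hex, 16));

  // Flag 0 is the most significant bit of the constraints byte.
  const flags = [];
  for (let i = 0; i < 8; i++) {
    flags.push(((constraints >> (7 - i)) & 1) === 1);
  }

  return {
    profile,      // for example 0x4d for Main Profile
    constraints,  // raw constraint set flags byte
    flags,        // flags[0] is constraint_set0_flag, and so on
    // The level byte is the level times ten: 0x2a is 42, or level 4.2.
    level: `${Math.floor(level / 10)}.${level % 10}`,
  };
}

console.log(parseAvc1Codec("avc1.4d002a"));
```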
AVC profiles
The following are the AVC profiles and their profile numbers for use in the codecs parameter, as well as the value to specify for the constraints component, CC.
Profile | Number (Hex) | Constraints byte |
---|---|---|
Constrained Baseline Profile (CBP) CBP is primarily a solution for scenarios in which resources are constrained, or resource use needs to be controlled to minimize the odds of the media performing poorly. | 42 | 40 |
Baseline Profile (BP) Similar to CBP but with more data loss protections and recovery capabilities. This is not as widely used as it was before CBP was introduced. All CBP streams are considered to also be BP streams. | 42 | 00 |
Extended Profile (XP) Designed for streaming video over the network, with high compression capability and further improvements to data robustness and stream switching. | 58 | 00 |
Main Profile (MP) The profile used for standard-definition digital television being broadcast in MPEG-4 format. Not used for high-definition television broadcasts. This profile's importance has faded since the introduction of the High Profile—which was added for HDTV use—in 2004. | 4D | 00 |
High Profile (HiP) Currently, HiP is the primary profile used for broadcast and disc-based HD video; it's used both for HD TV broadcasts and for Blu-ray video. | 64 | 00 |
Progressive High Profile (PHiP) Essentially High Profile without support for field coding. | 64 | 08 |
Constrained High Profile PHiP, but without support for bi-predictive slices ("B-slices"). | 64 | 0C |
High 10 Profile (Hi10P) High Profile, but with support for up to 10 bits per color component. | 6E | 00 |
High 4:2:2 Profile (Hi422P) Expands upon Hi10P by adding support for 4:2:2 chroma subsampling along with up to 10 bits per color component. | 7A | 00 |
High 4:4:4 Predictive Profile (Hi444PP) In addition to the capabilities included in Hi422P, Hi444PP adds support for 4:4:4 chroma subsampling (in which no color information is discarded). Also includes support for up to 14 bits per color sample and efficient lossless region coding, along with the option to encode each frame as three separate color planes (that is, each color's data is stored as if it were a single monochrome frame). | F4 | 00 |
High 10 Intra Profile High 10 constrained to all-intra-frame use. Primarily used for professional apps. | 6E | 10 |
High 4:2:2 Intra Profile The Hi422 Profile with all-intra-frame use. | 7A | 10 |
High 4:4:4 Intra Profile The High 4:4:4 Profile constrained to use only intra frames. | F4 | 10 |
CAVLC 4:4:4 Intra Profile The High 4:4:4 Profile constrained to all-intra use, and to using only CAVLC entropy coding. | 44 | 00 |
Scalable Baseline Profile Intended for use with video conferencing as well as surveillance and mobile uses, the SVC Baseline Profile is based on AVC's Constrained Baseline profile. The base layer within the stream is provided at a high quality level, with some number of secondary substreams that offer alternative forms of the same video for use in various constrained environments. These may include any combination of reduced resolution, reduced frame rate, or increased compression levels. | 53 | 00 |
Scalable Constrained Baseline Profile Primarily used for real-time communication applications. Not yet supported by WebRTC, but an extension to the WebRTC API to allow SVC is in development. | 53 | 04 |
Scalable High Profile Meant mostly for use in broadcast and streaming applications. The base (or highest quality) layer must conform to the AVC High Profile. | 56 | 00 |
Scalable Constrained High Profile A subset of the Scalable High Profile designed mainly for real-time communications. | 56 | 04 |
Scalable High Intra Profile Primarily useful only for production applications, this profile supports only all-intra usage. | 56 | 20 |
Stereo High Profile The Stereo High Profile provides stereoscopic video using two renderings of the scene (left eye and right eye). Otherwise, provides the same features as the High profile. | 80 | 00 |
Multiview High Profile Supports two or more views using both temporal and MVC inter-view prediction. Does not support field pictures or macroblock-adaptive frame-field coding. | 76 | 00 |
Multiview Depth High Profile Based on the High Profile, to which the main substream must adhere. The remaining substreams must match the Stereo High Profile. | 8A | 00 |
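Going the other way, a codec string can be assembled from a row of the table and a level. The sketch below copies only three rows (Constrained Baseline, Main, and High) to keep it short; the names AVC_PROFILES and avcCodecString are made up for this illustration.

```javascript
// A few rows copied from the table above: profile number and
// constraints byte. The object and function names are hypothetical.
const AVC_PROFILES = {
  "Constrained Baseline": { number: 0x42, constraints: 0x40 },
  "Main": { number: 0x4d, constraints: 0x00 },
  "High": { number: 0x64, constraints: 0x00 },
};

// Formats a number as two lowercase hexadecimal digits.
const hexByte = (n) => n.toString(16).padStart(2, "0");

// Builds avc1.PPCCLL from a profile name and a level such as "4.2".
function avcCodecString(profileName, level) {
  const profile = AVC_PROFILES[profileName];
  if (!profile) {
    throw new Error(`Profile not in this sketch's table: ${profileName}`);
  }
  const [major, minor = "0"] = String(level).split(".");
  const levelValue = Number(major) * 10 + Number(minor); // 4.2 becomes 42
  return `avc1.${hexByte(profile.number)}${hexByte(profile.constraints)}${hexByte(levelValue)}`;
}

console.log(avcCodecString("Main", "4.2"));
console.log(avcCodecString("High", "4.0"));
```

The level is passed as a string so that a value like "4.0" keeps its tenths digit.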
MPEG-4 audio
When the value of an entry in the codecs list begins with mp4a, the syntax of the value should be:
mp4a.oo[.A]
Here, oo is the two-digit hexadecimal Object Type Indication which specifies the codec class being used for the media. The OTIs are assigned by the MP4 Registration Authority, which maintains a list of the possible OTI values. A special value is 40; this indicates that the media is MPEG-4 audio (ISO/IEC 14496 Part 3). To be more specific still, a third component, the Audio Object Type, is added for OTI 40 to narrow the type down to a specific subtype of MPEG-4.
The Audio Object Type is specified as a one- or two-digit decimal value (unlike most other values in the codecs parameter, which use hexadecimal). For example, MPEG-4's AAC-LC has an audio object type number of 2, so the full codecs value representing AAC-LC is mp4a.40.2.
Likewise, ER AAC LC, whose Audio Object Type is 17, can be represented using the full codecs value mp4a.40.17. Single-digit values can be given either as one digit (which is the best choice, since it will be the most broadly compatible) or with a leading zero padding it to two digits, such as mp4a.40.02.
Note: The specification originally mandated that the Audio Object Type number in the third component be only one decimal digit. However, amendments to the specification over time extended the range of these values well beyond one decimal digit, so now the third parameter may be either one or two digits. Padding values below 10 with a leading 0 is optional. Older implementations of MPEG-4 codecs may not support two-digit values, however, so using a single digit when possible will maximize compatibility.
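The mix of a hexadecimal OTI and a decimal Audio Object Type is easy to get wrong, so a small sketch may help. The parser below is hypothetical (parseMp4aCodec is not a standard function); it accepts both the one-digit and the zero-padded form of the third component.

```javascript
// Hypothetical parser for mp4a.oo[.A] entries.
function parseMp4aCodec(codec) {
  const match = /^mp4a\.([0-9a-f]{2})(?:\.(\d{1,2}))?$/i.exec(codec);
  if (!match) {
    throw new Error(`Not an mp4a codec string: ${codec}`);
  }
  return {
    // The Object Type Indication is hexadecimal: "40" is 0x40.
    oti: parseInt(match[1], 16),
    // The Audio Object Type is decimal; "2" and "02" both mean 2.
    audioObjectType: match[2] === undefined ? null : parseInt(match[2], 10),
  };
}

console.log(parseMp4aCodec("mp4a.40.2"));
console.log(parseMp4aCodec("mp4a.40.02"));
console.log(parseMp4aCodec("mp4a.40.17"));
```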
The Audio Object Types are defined in ISO/IEC 14496-3 subpart 1, section 1.5.1. The table below provides a basic list of the Audio Object Types and, for the more common object types, a list of the profiles supporting them. Refer to the specification if you need to know more about the inner workings of any given MPEG-4 audio type.
ID | Audio Object Type | Profile support |
---|---|---|
0 | NULL | |
1 | AAC Main | Main |
2 | AAC LC (Low Complexity) | Main, Scalable, HQ, LD v2, AAC, HE-AAC, HE-AAC v2 |
3 | AAC SSR (Scalable Sampling Rate) | Main |
4 | AAC LTP (Long Term Prediction) | Main, Scalable, HQ |
5 | SBR (Spectral Band Replication) | HE-AAC, HE-AAC v2 |
6 | AAC Scalable | Main, Scalable, HQ |
7 | TwinVQ (Coding for ultra-low bit rates) | Main, Scalable |
8 | CELP (Code-Excited Linear Prediction) | Main, Scalable, Speech, HQ, LD |
9 | HVXC (Harmonic Vector Excitation Coding) | Main, Scalable, Speech, LD |
10 – 11 | Reserved | |
12 | TTSI (Text to Speech Interface) | Main, Scalable, Speech, Synthetic, LD |
13 | Main Synthetic | Main, Synthetic |
14 | Wavetable Synthesis | |
15 | General MIDI | |
16 | Algorithmic Synthesis and Audio Effects | |
17 | ER AAC LC (Error Resilient AAC Low-Complexity) | HQ, Mobile Internetworking |
18 | Reserved | |
19 | ER AAC LTP (Error Resilient AAC Long Term Prediction) | HQ |
20 | ER AAC Scalable (Error Resilient AAC Scalable) | Mobile Internetworking |
21 | ER TwinVQ (Error Resilient TwinVQ) | Mobile Internetworking |
22 | ER BSAC (Error Resilient Bit-Sliced Arithmetic Coding) | Mobile Internetworking |
23 | ER AAC LD (Error Resilient AAC Low-Delay; used for two-way communication) | LD, Mobile Internetworking |
24 | ER CELP (Error Resilient Code-Excited Linear Prediction) | HQ, LD |
25 | ER HVXC (Error Resilient Harmonic Vector Excitation Coding) | LD |
26 | ER HILN (Error Resilient Harmonic and Individual Line plus Noise) | |
27 | ER Parametric (Error Resilient Parametric) | |
28 | SSC (Sinusoidal Coding) | |
29 | PS (Parametric Stereo) | HE-AAC v2 |
30 | MPEG Surround | |
31 | Escape | |
32 | MPEG-1 Layer-1 | |
33 | MPEG-1 Layer-2 (MP2) | |
34 | MPEG-1 Layer-3 (MP3) | |
35 | DST (Direct Stream Transfer) | |
36 | ALS (Audio Lossless) | |
37 | SLS (Scalable Lossless) | |
38 | SLS Non-core (Scalable Lossless Non-core) | |
39 | ER AAC ELD (Error Resilient AAC Enhanced Low Delay) | |
40 | SMR Simple (Symbolic Music Representation Simple) | |
41 | SMR Main (Symbolic Music Representation Main) | |
42 | Reserved | |
43 | SAOC (Spatial Audio Object Coding)[1] | |
44 | LD MPEG Surround (Low Delay MPEG Surround)[1] | |
45 and up | Reserved |
[1] SAOC and LD MPEG Surround are defined in ISO/IEC 14496-3:2009/Amd.2:2010(E).
WebM
The basic form for a WebM codecs parameter is to list one or more of the four WebM codecs by name, separated by commas. The table below shows some examples:
MIME type | Description |
---|---|
video/webm;codecs="vp8" | A WebM video with VP8 video in it; no audio is specified. |
video/webm;codecs="vp9" | A WebM video with VP9 video in it. |
audio/webm;codecs="vorbis" | Vorbis audio in a WebM container. |
audio/webm;codecs="opus" | Opus audio in a WebM container. |
video/webm;codecs="vp8,vorbis" | A WebM container with VP8 video and Vorbis audio. |
video/webm;codecs="vp9,opus" | A WebM container with VP9 video and Opus audio. |
The strings vp8.0 and vp9.0 also work, but are not recommended.
ISO Base Media File Format syntax
As part of a move toward a standardized and powerful format for the codecs parameter, WebM is moving toward describing video content using a syntax based on that defined by the ISO Base Media File Format. This syntax is defined in VP Codec ISO Media File Format Binding, in the section Codecs Parameter String. The audio codec continues to be indicated as either vorbis or opus.
In this format, the codecs parameter's value begins with a four-character code identifying the codec being used in the container, which is then followed by a series of period-separated (.) two-digit values.
cccc.PP.LL.DD.CC[.cp[.tc[.mc[.FF]]]]
The first five components are required; everything from cp (color primaries) onward is optional, and you can stop including components at any point from then onward. Each of these components is described in the following table. Following the table are some examples.
Component | Details |
---|---|
cccc | A four-character code indicating which of the possible codecs is being described. Potential values are vp08 (VP8) and vp09 (VP9). |
PP | The two-digit profile number, padded with leading zeroes if necessary to be exactly two digits. |
LL | The two-digit level number. The level number is a fixed-point notation, where the first digit is the ones digit, and the second digit represents tenths. For example, level 3 is 30 and level 6.1 is 61. |
DD | The bit depth of the luma and color component values; permitted values are 8, 10, and 12, written as two digits (for example, 08). |
CC | A two-digit value indicating which chroma subsampling format to use: 00 or 01 (4:2:0, differing in where the chroma samples are positioned), 02 (4:2:2), or 03 (4:4:4). See Chroma subsampling in Digital video concepts for additional information about this topic and others. |
cp | A two-digit integer specifying which of the color primaries from Section 8.1 of the ISO/IEC 23001-8:2016 standard is used. This component, and every component after it, is optional. For example, 01 is BT.709 and 09 is BT.2020; see the standard for the full list of values. |
tc | A two-digit integer indicating the transferCharacteristics for the video. This value is from Section 8.2 of ISO/IEC 23001-8:2016, and indicates the transfer characteristics to be used when adapting the decoded color to the render target. |
mc | The two-digit value for the matrixCoefficients property. This value comes from the table in Section 8.3 of the ISO/IEC 23001-8:2016 specification. This value indicates which set of coefficients to use when mapping from the native red, blue, and green primaries to the luma and chroma signals. These coefficients are in turn used with the equations found in that same section. |
FF | Indicates whether to restrict the black level and color range of each color component to the legal range. For 8-bit color samples, the legal range is 16-235. A value of 00 indicates that these limitations should be enforced, while a value of 01 allows the full range of possible values for each component, even if the resulting color is out of bounds for the color system. |
WebM media type examples
video/webm;codecs="vp08.00.41.08,vorbis"
- VP8 video, profile 0, level 4.1, using 8-bit YUV with 4:2:0 chroma subsampling, using BT.709 color primaries, transfer function, and matrix coefficients, with the luminance and chroma values encoded within the legal ("studio") range. The audio is Vorbis.
video/webm;codecs="vp09.02.10.10.01.09.16.09.01,opus"
- VP9 video, profile 2 level 1.0, with 10-bit YUV content using 4:2:0 chroma subsampling, BT.2020 primaries, ST 2084 EOTF (HDR SMPTE), BT.2020 non-constant luminance color matrix, and full-range chroma and luma encoding. The audio is in Opus format.
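The video entries in these examples can be taken apart with a few lines of code. The following sketch is not an official parser: parseVpCodec is a hypothetical name, and because the first example above carries only four components, the sketch treats everything after DD as optional and reports omitted components as null rather than guessing defaults. That leniency is an assumption of this example.

```javascript
// Hypothetical parser for cccc.PP.LL.DD[.CC[.cp[.tc[.mc[.FF]]]]] entries.
function parseVpCodec(codec) {
  const [fourcc, ...fields] = codec.split(".");
  if (!["vp08", "vp09"].includes(fourcc)) {
    throw new Error(`Unknown four-character code: ${fourcc}`);
  }
  const allTwoDigits = fields.every((field) => /^\d{2}$/.test(field));
  if (fields.length < 3 || fields.length > 8 || !allTwoDigits) {
    throw new Error(`Malformed codec string: ${codec}`);
  }
  // Components that were left out are reported as null.
  const [pp, ll, dd, cc = null, cp = null, tc = null, mc = null, ff = null] = fields;
  return {
    codec: fourcc,
    profile: Number(pp),
    level: `${ll[0]}.${ll[1]}`, // first digit is ones, second is tenths
    bitDepth: Number(dd),
    chromaSubsampling: cc,
    colorPrimaries: cp,
    transferCharacteristics: tc,
    matrixCoefficients: mc,
    fullRange: ff === null ? null : ff === "01",
  };
}

// The codecs value is a comma-separated list; the video entry comes first here.
const [videoEntry] = "vp09.02.10.10.01.09.16.09.01,opus".split(",");
console.log(parseVpCodec(videoEntry));
```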
Using the codecs parameter
You can use the codecs parameter in a few situations. Firstly, you can use it with the <source> element when creating an <audio> or <video> element, in order to establish a group of options for the browser to choose from when selecting the format of the media to present to the user in the element.
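As an illustration of the first case, the sketch below generates <source> markup for a list of alternatives. The file names and the sourceTags helper are invented for this example; in a real page you would more likely write the markup by hand or create the elements through the DOM.

```javascript
// Hypothetical helper that renders one <source> tag per alternative.
// The type attribute is wrapped in single quotes because the codecs
// value itself contains double quotes.
function sourceTags(sources) {
  return sources
    .map(({ src, type }) => `<source src="${src}" type='${type}'>`)
    .join("\n");
}

// File names below are placeholders.
const markup = sourceTags([
  { src: "clip.webm", type: 'video/webm; codecs="vp9, opus"' },
  { src: "clip.mp4", type: 'video/mp4; codecs="avc1.4d002a"' },
]);

console.log(`<video controls>\n${markup}\n</video>`);
```

The browser considers the sources in order, so the preferred format goes first.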
You can also use the codecs parameter when specifying a MIME media type to the MediaSource.isTypeSupported() method; this method returns a Boolean which indicates whether or not the media is likely to work on the current device.
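A common pattern is to walk a list of candidate types and keep the first one the device reports as supported. The sketch below assumes a hypothetical helper named firstSupportedType; it takes the support check as an optional argument so it can also be exercised outside a browser, where MediaSource does not exist.

```javascript
// Hypothetical helper: returns the first candidate media type that the
// support check accepts, or null if none is accepted.
function firstSupportedType(candidates, isSupported) {
  const check =
    isSupported ??
    ((type) =>
      // In a browser this defers to MediaSource.isTypeSupported();
      // elsewhere MediaSource is undefined and nothing is "supported".
      typeof MediaSource !== "undefined" && MediaSource.isTypeSupported(type));
  return candidates.find((type) => check(type)) ?? null;
}

const candidates = [
  'video/webm; codecs="vp09.02.10.10.01.09.16.09.01,opus"',
  'video/mp4; codecs="avc1.4d002a"',
];

// Stub check used only for this demonstration: pretend MP4 is supported.
const mp4Only = (type) => type.startsWith("video/mp4");
console.log(firstSupportedType(candidates, mp4Only));
```

In a browser, calling firstSupportedType(candidates) with no second argument uses the real method.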