
Where should I start learning about audio or video codecs?

Posted 2024-08-26 06:59:52


信仰 2024-09-02 06:59:52


Your title asks about A/V compression, but the rest of your comments talk about parsing the media file and identifying its codec. Those are very different tasks: they are specified and implemented by different organizations, handled by different APIs in most multimedia libraries, and, above all, require very different skill sets.

A/V file formats aren't too different from any other file format, and file formats in turn are just formal grammars. Parsing, validation, and the resulting object graphs are conceptually no different from those of any other grammar; in practice, they tend to be far simpler than the grammars you encounter in a standard CS curriculum (compilers, finite automata). The AVI file format is kind of antiquated at this point, but I'd still recommend starting there because:

many of today's more complex formats resemble AVI in whole or in part, or at minimum assume you're familiar with its basic structures
AVI is a member of a larger family of RIFF-based multimedia formats, which you'll find in many other places, such as WAV files
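To make the RIFF structure concrete, here is a minimal Python sketch of a chunk walker. It assumes only the basic RIFF layout: a 4-byte ASCII chunk ID, a little-endian 32-bit payload size, the payload padded to an even length, and `RIFF`/`LIST` chunks that begin with a 4-byte form type followed by nested chunks. `walk_riff` and `chunk` are hypothetical helpers, and the input is a tiny synthetic WAV built in memory rather than a real file. Pointed at the bytes of an AVI file you would expect a `RIFF:AVI ` chunk containing `LIST:hdrl`, `LIST:movi` and `idx1`; the sketch does no validation of truncated or malformed files.

```python
import struct
from typing import Iterator, Optional, Tuple


def walk_riff(data: bytes, offset: int = 0, end: Optional[int] = None,
              depth: int = 0) -> Iterator[Tuple[int, str, int]]:
    """Yield (depth, label, payload_size) for each chunk, descending into RIFF/LIST."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        # Chunk header: 4-byte ASCII id followed by a little-endian uint32 payload size
        fourcc, size = struct.unpack_from("<4sI", data, offset)
        tag = fourcc.decode("ascii", errors="replace")
        body = offset + 8
        if tag in ("RIFF", "LIST"):
            # Container chunks begin with a 4-byte form/list type, then sub-chunks
            form = data[body:body + 4].decode("ascii", errors="replace")
            yield depth, f"{tag}:{form}", size
            yield from walk_riff(data, body + 4, min(body + size, end), depth + 1)
        else:
            yield depth, tag, size
        # Payloads are word-aligned: an odd size is followed by one pad byte
        offset = body + size + (size & 1)


def chunk(fourcc: bytes, payload: bytes) -> bytes:
    """Hypothetical helper: build one chunk, appending a pad byte for odd sizes."""
    pad = b"\x00" if len(payload) % 2 else b""
    return fourcc + struct.pack("<I", len(payload)) + payload + pad


# Tiny synthetic WAV: RIFF:WAVE holding 'fmt ', 'data' and a LIST:INFO chunk
fmt = chunk(b"fmt ", struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16))  # mono 16-bit PCM
pcm = chunk(b"data", b"\x00\x00" * 4)  # four silent samples
info = chunk(b"LIST", b"INFO" + chunk(b"INAM", b"hi\x00"))  # odd-sized title string
wav = chunk(b"RIFF", b"WAVE" + fmt + pcm + info)

for depth, label, size in walk_riff(wav):
    print("  " * depth + f"{label} ({size} bytes)")
```

The same loop handles every RIFF-based format because the walker never needs to understand a chunk's payload, only its header; interpreting `fmt ` or `strh` contents is a separate, per-format step.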

Codecs, meanwhile, are some of the most complex algorithms you're likely to find in "consumer" software. They draw heavily on advances from both the academic community and the R&D arms of large corporations (including their vast patent libraries). To be proficient in codecs you need to know at least the basics of:

information theory
common entropy coding algorithms
Fourier analysis (and as much other DSP as possible)
psychoacoustic/psychovisual modeling
practical limitations imposed by the production/broadcast lifecycle, legacy video equipment & standards, and pesky old physics, including:
- interlacing
- fixed colorspaces
- lens optics
practical limitations imposed by today's CPU architectures, especially:
- SIMD optimization
- cache line aliasing, prefetching, etc.
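As a taste of the first two items on that list, the sketch below computes the Shannon entropy of a symbol string (the lower bound, in bits per symbol, for a prefix code that codes each symbol independently) and builds Huffman code lengths. Huffman coding is one common entropy coder (baseline JPEG and MP3 both use it); many newer codecs use arithmetic coding instead, such as CABAC in H.264. The function names and the `abracadabra` input are illustrative assumptions; a real codec would entropy-code quantised coefficients or motion data, not text.

```python
import heapq
import math
from collections import Counter
from typing import Dict


def shannon_entropy(symbols: str) -> float:
    """Entropy in bits/symbol of the empirical symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def huffman_code_lengths(symbols: str) -> Dict[str, int]:
    """Return the Huffman code length in bits for each distinct symbol."""
    counts = Counter(symbols)
    if len(counts) == 1:  # a lone symbol still needs one bit
        return {s: 1 for s in counts}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in subtree})
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two lightest subtrees
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Every symbol in the merged subtree moves one level deeper
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "abracadabra"
lengths = huffman_code_lengths(text)
freq = Counter(text)
avg = sum(freq[s] * bits for s, bits in lengths.items()) / len(text)
print(f"entropy {shannon_entropy(text):.3f} bits/symbol, Huffman {avg:.3f} bits/symbol")
```

A Huffman code's average length always lands between the entropy H and H + 1 bits per symbol; here it is 23/11 ≈ 2.09 bits against an entropy of about 2.04.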

If you already have a decent background (e.g., you've taken one or two undergraduate-level "math for engineers"-type classes), then I say dive right in. Many of the best A/V codecs are open source:

x264 (MPEG-4 Part 10, aka AVC/H.264)
LAME (MPEG-1 Layer III, aka MP3)
Xvid (MPEG-4 Part 2, the same format as DivX and many others)
Vorbis (alternative, patent-free audio codec)
Dirac (alternative, patent-free video codec based on a wavelet transform)


小草泠泠 2024-09-02 06:59:52


In general, video compression is concerned with throwing away as much information as possible whilst having a minimal effect on the viewing experience for an end user. For example, using subsampled YUV (such as 4:2:0) instead of RGB cuts the raw video size in half straight away. This is possible because the human eye is less sensitive to colour than it is to brightness. In YUV, the Y value is brightness (luma), and the U and V values represent colour (chroma). Therefore, you can throw away some of the colour information, which reduces the file size without the viewer noticing any difference.
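The arithmetic behind "cuts the video size in half" can be sketched as follows, assuming 4:2:0 subsampling (one Cb and one Cr sample shared by each 2x2 block of pixels) and 8-bit samples; the answer itself does not name a subsampling scheme. The conversion uses the full-range BT.601 coefficients from JFIF; broadcast video usually uses limited-range variants, so treat the exact constants as an illustrative choice. The function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Plane = List[List[float]]


def rgb_to_ycbcr(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Full-range BT.601 RGB -> YCbCr, as used by JFIF (assumed variant)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def to_yuv420(img: List[List[Pixel]]) -> Tuple[Plane, Plane, Plane]:
    """Keep luma at full resolution; average each chroma plane over 2x2 blocks."""
    h, w = len(img), len(img[0])
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even dimensions"
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in img]
    luma = [[p[0] for p in row] for row in ycc]

    def subsample(ch: int) -> Plane:
        return [[sum(ycc[y + dy][x + dx][ch] for dy in (0, 1) for dx in (0, 1)) / 4
                 for x in range(0, w, 2)]
                for y in range(0, h, 2)]

    return luma, subsample(1), subsample(2)


def samples_per_frame(w: int, h: int) -> Tuple[int, int]:
    """8-bit samples per frame: RGB (3 per pixel) vs YUV 4:2:0."""
    rgb = 3 * w * h
    yuv420 = w * h + 2 * (w // 2) * (h // 2)  # full Y plus two quarter-size chroma planes
    return rgb, yuv420


rgb, yuv = samples_per_frame(1920, 1080)
print(rgb, yuv, yuv / rgb)  # 6220800 3110400 0.5

# A 2x2 patch of "red jumper" keeps 4 luma samples but only 1 Cb and 1 Cr sample
luma, cb, cr = to_yuv420([[(200, 30, 30), (198, 32, 31)], [(201, 29, 30), (199, 31, 29)]])
print(len(luma) * len(luma[0]), len(cb) * len(cb[0]), len(cr) * len(cr[0]))  # 4 1 1
```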

After that, most compression techniques take advantage of 2 redundancies in particular. The first is temporal redundancy and the second is spatial redundancy.

Temporal redundancy refers to the fact that successive frames in a video sequence are very similar. Typically a video runs at around 20-30 frames per second, and not much changes in 1/30 of a second. Take any DVD and pause it, then step forward one frame and note how similar the two images are. So, instead of encoding each frame independently, MPEG-4 (and other compression standards) encode most frames as the difference from reference frames, using motion estimation to find how blocks of pixels have moved between frames; only periodic key frames are encoded on their own.
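Motion estimation is commonly done by block matching: for each block of the current frame, search a small window in the reference frame for the displacement that minimises some error measure. The sketch below is the simplest textbook form, an exhaustive search minimising the sum of absolute differences (SAD). The 4x4 block size, ±2 pixel search range, synthetic frames and function names are all illustrative assumptions; production encoders such as x264 work on 16x16 macroblocks with sub-partitions and use faster search patterns plus sub-pixel refinement.

```python
from typing import List, Tuple

Frame = List[List[int]]


def sad(prev: Frame, cur: Frame, bx: int, by: int, dx: int, dy: int, bs: int) -> int:
    """Sum of absolute differences: cur's block at (bx, by) vs prev's block shifted by (dx, dy)."""
    return sum(abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
               for y in range(bs) for x in range(bs))


def best_motion_vector(prev: Frame, cur: Frame, bx: int, by: int,
                       bs: int = 4, search: int = 2) -> Tuple[int, int, int]:
    """Exhaustive block-matching search within +/-search pixels; returns (dx, dy, sad)."""
    h, w = len(prev), len(prev[0])
    best = (0, 0, sad(prev, cur, bx, by, 0, 0, bs))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block would fall outside the frame
            if not (0 <= bx + dx and bx + dx + bs <= w and 0 <= by + dy and by + dy + bs <= h):
                continue
            cost = sad(prev, cur, bx, by, dx, dy, bs)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best


def frame_with_square(x0: int, y0: int, size: int = 12) -> Frame:
    """Synthetic frame: a bright 4x4 square on a black background."""
    return [[200 if x0 <= x < x0 + 4 and y0 <= y < y0 + 4 else 0 for x in range(size)]
            for y in range(size)]


prev = frame_with_square(4, 4)
cur = frame_with_square(6, 4)  # same square, moved 2 pixels to the right
print(best_motion_vector(prev, cur, bx=6, by=4))  # (-2, 0, 0)
```

An encoder would then send the vector (-2, 0) plus the residual (all zeros in this contrived case) instead of the block's raw pixels.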

Spatial redundancy takes advantage of the fact that colour across an image generally varies quite slowly, i.e. it is mostly low-frequency. By this I mean that neighbouring pixels tend to have similar colours. For example, in an image of you wearing a red jumper, all of the pixels that represent your jumper would have very similar colours. The DCT can be used to transform the pixel values into the frequency domain, where much of the high-frequency information can be quantised coarsely or thrown away. Then, when the inverse DCT is performed (during decoding), the image is reconstructed without the discarded high-frequency detail.
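The effect of discarding high-frequency DCT coefficients can be sketched with a naive 8x8 orthonormal DCT-II. As a crude stand-in for real quantisation (JPEG and MPEG divide coefficients by a quantisation table and round them), the hypothetical `drop_high_frequencies` simply zeroes every coefficient with u + v ≥ 4, keeping 10 of the 64. A smoothly shaded block, like the red jumper, should survive almost unchanged, while a hard black/white edge loses noticeably more; the test blocks and threshold are illustrative choices, and the O(N⁴) loops are written for clarity, not speed.

```python
import math
from typing import List

Block = List[List[float]]
N = 8  # JPEG and MPEG-2 / MPEG-4 Part 2 use 8x8 DCT blocks


def _scale(k: int) -> float:
    """Normalisation factor that makes the DCT orthonormal."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(k: int, n: int) -> float:
    return math.cos((2 * n + 1) * k * math.pi / (2 * N))


def dct2(block: Block) -> Block:
    """2-D DCT-II: coeffs[v][u], v = vertical and u = horizontal frequency."""
    return [[_scale(u) * _scale(v) * sum(block[y][x] * _basis(u, x) * _basis(v, y)
                                         for y in range(N) for x in range(N))
             for u in range(N)]
            for v in range(N)]


def idct2(coeffs: Block) -> Block:
    """Inverse of dct2 (a 2-D DCT-III)."""
    return [[sum(_scale(u) * _scale(v) * coeffs[v][u] * _basis(u, x) * _basis(v, y)
                 for v in range(N) for u in range(N))
             for x in range(N)]
            for y in range(N)]


def drop_high_frequencies(coeffs: Block, keep: int) -> Block:
    """Zero every coefficient with u + v >= keep (crude stand-in for quantisation)."""
    return [[c if u + v < keep else 0.0 for u, c in enumerate(row)]
            for v, row in enumerate(coeffs)]


def rms_error(a: Block, b: Block) -> float:
    return math.sqrt(sum((a[y][x] - b[y][x]) ** 2 for y in range(N) for x in range(N)) / (N * N))


smooth = [[100 + 4 * x + 2 * y for x in range(N)] for y in range(N)]  # gently shaded, like the jumper
edge = [[0 if x < N // 2 else 255 for x in range(N)] for y in range(N)]  # hard black/white edge

for name, block in (("smooth", smooth), ("edge", edge)):
    approx = idct2(drop_high_frequencies(dct2(block), keep=4))  # keeps 10 of 64 coefficients
    print(f"{name}: RMS error {rms_error(block, approx):.2f}")
```

Because the transform is orthonormal, the squared reconstruction error equals the energy of the zeroed coefficients, which is why the smooth block (little high-frequency energy) comes back nearly intact while the edge does not.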

To view the effects of throwing away this information, open MS Paint and draw a series of overlapping horizontal and vertical black lines. Save the image as a JPEG (which also uses the DCT for compression). Now zoom in on the pattern and notice how the edges of the lines are no longer as sharp and are kinda blurry. This is because some information (the sharp transition from black to white, which is high-frequency content) has been thrown away during compression. Read this for an explanation with nice pictures.

For further reading, this book is quite good, if a little heavy on the maths.


瞳孔里扚悲伤 2024-09-02 06:59:52

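I learned a lot about the MPEG4 format by studying an MPEG4 decoder. For both video and audio there are many different reference encoders and decoders (and open-source implementations of them). So read up, starting with Wikipedia: it has good general summaries and links to follow (to the specifications, if you're lucky enough that they're "open specs"). Then hit the source.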

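There are lots of different encodings (many involving some form of compression, lossy or lossless), and the whole problem is often made more complicated by also having to deal with frame containers and "sub-formats".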
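To show what a frame container can look like beyond RIFF, the sketch below walks the box (atom) structure of MP4 files, which follow the ISO base media file format (MPEG-4 Part 12). Each box starts with a big-endian 32-bit size that includes the header, then a 4-character type; a size of 1 means a 64-bit size follows, and 0 means the box extends to the end of the data. The `CONTAINERS` set is a partial, assumed list of box types that hold only child boxes (boxes such as `meta` or `stsd` carry extra fields before their children and are not descended into), and `walk_boxes`/`box` are hypothetical helpers run on a synthetic file with zero-filled placeholder payloads.

```python
import struct
from typing import Iterator, Optional, Tuple

# Assumed, partial list of box types whose payload is just a sequence of child boxes
CONTAINERS = {b"moov", b"trak", b"mdia", b"minf", b"stbl", b"edts", b"dinf", b"udta"}


def walk_boxes(data: bytes, offset: int = 0, end: Optional[int] = None,
               depth: int = 0) -> Iterator[Tuple[int, str, int]]:
    """Yield (depth, box_type, total_size) for each ISO BMFF box."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        # Box header: big-endian uint32 size (header included) + 4-char type
        size, btype = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            if offset + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box extends to the end of the enclosing data
            size = end - offset
        if size < header:  # malformed size: stop instead of looping forever
            break
        yield depth, btype.decode("ascii", errors="replace"), size
        if btype in CONTAINERS:
            yield from walk_boxes(data, offset + header, min(offset + size, end), depth + 1)
        offset += size


def box(btype: bytes, payload: bytes = b"") -> bytes:
    """Hypothetical helper: build a box with a 32-bit size."""
    return struct.pack(">I", 8 + len(payload)) + btype + payload


# Synthetic file with zero-filled placeholder payloads (not a playable MP4)
mp4 = (box(b"ftyp", b"isom" + struct.pack(">I", 512) + b"isomiso2")
       + box(b"moov", box(b"mvhd", bytes(100)) + box(b"trak", box(b"tkhd", bytes(84))))
       + box(b"mdat", bytes(16)))

for depth, btype, size in walk_boxes(mp4):
    print("  " * depth + f"{btype} ({size} bytes)")
```

Compared with RIFF, the differences are small but easy to trip over: big-endian sizes, sizes that include the header, and no pad bytes. The codec itself is identified further down, inside the sample description of each track.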

Have fun.

Dirac: http://diracvideo.org/specifications/
MPEG-4: http://en.wikipedia.org/wiki/MPEG-4
JPEG: http://jpeg.org/public/jfif.pdf


她如夕阳 2024-09-02 06:59:52

Try starting here:

Getting Started with Windows Media Encoder

http://www.microsoft.com/windows/windowsmedia/howto/articles/introencoding.aspx

More information at codecpage.com


About the author: 兮子
