DCT Compression - Block Size, Choosing Coefficients

Posted 2024-08-15 16:37:41

I'm trying to understand the effect of block size and the best strategy for choosing coefficients in DCT compression.
Basically, I want to ask what I wrote here:

Video Compression: What is discrete cosine transform?

Let's assume the most primitive compression: split the image into blocks, perform a DCT on each block, and zero out some of the coefficients.

To my understanding, the smaller the block, the better.
Smaller blocks mean the pixels are more correlated, hence the energy in the DCT spectrum is more "compact". This should be more pronounced in fast-varying (high-frequency) images.

Let's say we zero out a certain percentage of the coefficients. What would result in the best image quality, small or large blocks?
Let's say we keep 10%, 25%, 50%, or 75%; would you say the answer differs for different percentages?

Another issue is how to choose the coefficients you leave untouched.
Let's say I have to make the decision based on location and not energy.
Would you take a square from the top-left corner?
I've averaged many blocks in the DCT spectrum and concluded that the best would be taking a triangle from the top-left corner. What do you think?
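To make the two location-based choices concrete, here is a minimal sketch of both selections as boolean masks over an n x n block of DCT coefficients. The function names and the parameter k are illustrative assumptions, not part of any library.

```python
def square_mask(n, k):
    """Keep the k x k square in the top-left corner (u < k and v < k)."""
    return [[u < k and v < k for v in range(n)] for u in range(n)]


def triangle_mask(n, k):
    """Keep the top-left triangle: every position with u + v < k."""
    return [[u + v < k for v in range(n)] for u in range(n)]


def count_kept(mask):
    """Number of coefficients the mask leaves untouched."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    # Two selections of nearly equal size on an 8x8 block.
    print(count_kept(square_mask(8, 4)), count_kept(triangle_mask(8, 5)))
```

On an 8x8 block, a 4x4 square keeps 16 coefficients and a triangle with u + v < 5 keeps 15, so the two can be compared at nearly the same budget.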

Hopefully we'll have an effective discussion.

奶气 2024-08-22 16:37:41

The essence of your question seems to be about image quality. A considerable literature has been produced on the subject, and the upshot is that image quality is a hard thing to determine.

Standard mathematical error measures like the signal-to-noise ratio (SNR) and mean-squared error (MSE) can give a quantitative answer, but it is well known that these don't correlate well with subjective viewer opinions, which must be our final authority. No other methods, even those founded on psycho-visual models of the viewer (e.g., S.A. Karunasekera and N.G. Kingsbury, "A distortion measure for blocking artifacts in images based on human visual sensitivity," IEEE Trans. on Image Proc. vol. 4, no. 6, June 1995, pp. 713-724; and M. Miyahara, K. Kotani, and V.R. Algazi, "Objective picture quality scale (PQS) for image coding," IEEE Trans. on Comm. vol. 46, no. 9, Sept. 1998, pp. 1215-1226), have proven themselves to be better than SNR.
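For reference, the quantitative measures mentioned above are easy to compute. The sketch below assumes images are lists of rows of numbers and an 8-bit peak value of 255. It reports PSNR, a common variant of SNR that is swapped in here for convenience; neither number says anything about perceived quality.

```python
import math


def mse(a, b):
    """Mean-squared error between two equally sized images (lists of rows)."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)
```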

Moreover, when you vary the type of imagery (line drawing, cartoon, photo, portrait, etc.), certain types of compression distortion become more evident. Mosquito noise might be objectionable in one image, while staircase noise might be the culprit in another.

In short, there is no pat answer to your question, "what would result in best image quality?"

That being said, we can say some things about the DCT that are relevant. The coefficients in the DCT of a block go from low variation to high variation in a zig-zag pattern from the top-left corner [(0,0)->(0,1)->(1,0)->(2,0)->(1,1)->(0,2)->etc.], which your triangle selection mirrors. The closer a coefficient is to the top-left corner, the smoother the information it carries [in fact, the (0,0) DCT value is proportional to the average of the whole block], and the farther from that corner you get, the more "high-frequency" detail you'll get. The closer a coefficient is to the top row or left column of the block, the more it represents purely horizontal or vertical detail, and the closer it is to the diagonal of the block, the more diagonal detail it represents.
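The zig-zag order quoted above can be generated by walking the anti-diagonals of the block and alternating direction. This is a minimal sketch; the function name is made up for illustration, and positions are (row, column) pairs as in the bracketed sequence.

```python
def zigzag(n):
    """Zig-zag scan order for an n x n block as a list of (row, col) pairs."""
    order = []
    for s in range(2 * n - 1):  # s = row + col indexes one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diagonal = [(r, s - r) for r in rows]
        if s % 2 == 0:
            diagonal.reverse()  # even diagonals run from bottom-left to top-right
        order.extend(diagonal)
    return order


if __name__ == "__main__":
    print(zigzag(8)[:6])
```

Keeping the first m entries of this order gives a roughly triangular region in the top-left corner.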

In brief, lossy compression usually entails throwing away some of the "details" that may not be perceptible to the eye. (Throwing away the "smoother" DCT values results in severe distortion.) The more DCT values you throw away, the greater your compression ratio will be, but also the greater the distortion you'll induce.

As for block size, it all depends. The more variance and detail there is in a block, the more you'll lose by throwing away coefficients. Some compression algorithms adaptively use different block sizes within the same image so that high-detail regions receive more and smaller blocks and smooth regions receive fewer and larger blocks.
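Since the answer depends on the image, the simplest thing may be to run the experiment. Below is a rough sketch of the primitive scheme from the question: tile the image, take an orthonormal DCT-II of each block, keep a fixed fraction of coefficients in low-frequency-first order (sorted by u + v, a stand-in for the zig-zag scan), and invert. Everything here is an assumption made for illustration: the helper names, the pure-Python DCT, and the synthetic test image, which is not representative of real photographs. MSE is used only because it is easy to compute.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def compress(image, b, fraction):
    """Zero all but the lowest-frequency `fraction` of each b x b block's DCT."""
    size = len(image)  # assumes a square image whose side is a multiple of b
    c = dct_matrix(b)
    ct = [list(col) for col in zip(*c)]
    order = sorted(((u, v) for u in range(b) for v in range(b)),
                   key=lambda p: (p[0] + p[1], p[0]))
    keep = set(order[:round(fraction * b * b)])
    out = [[0.0] * size for _ in range(size)]
    for r0 in range(0, size, b):
        for c0 in range(0, size, b):
            block = [row[c0:c0 + b] for row in image[r0:r0 + b]]
            coeffs = matmul(matmul(c, block), ct)    # forward 2-D DCT
            kept = [[coeffs[u][v] if (u, v) in keep else 0.0
                     for v in range(b)] for u in range(b)]
            rec = matmul(matmul(ct, kept), c)        # inverse 2-D DCT
            for i in range(b):
                out[r0 + i][c0:c0 + b] = rec[i]
    return out


def mse(a, b):
    """Mean-squared error between two equally sized images."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def test_image(size=32):
    """Smooth synthetic pattern, used only to exercise the code."""
    return [[128.0 + 100.0 * math.sin(x / 3.0) * math.cos(y / 5.0)
             for x in range(size)] for y in range(size)]


if __name__ == "__main__":
    img = test_image()
    for fraction in (0.10, 0.25, 0.50, 0.75):
        for b in (4, 8, 16):
            err = mse(img, compress(img, b, fraction))
            print(f"keep {fraction:.0%}, block {b}x{b}: MSE = {err:.3f}")
```

Swapping in real images, and looking at the results rather than only at the MSE, is what would actually answer the question for a given kind of content.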

For algorithms that use a single block size, 8x8, 16x16, and 32x32 are common choices; JPEG and MPEG-2, for example, apply the DCT to 8x8 blocks. The processing required is less than with adaptive block sizes, but the quality will generally be lower as well.
