What type of files will cause a correct LZW compression implementation to fail?
I'm using a simple and correct LZW compression program, and I've found that for some .gif and .bmp files the program outputs a file larger than the original. Could anyone explain what factors would cause this outcome?
I assume that the original file is too random, but I'm not exactly sure how to show this.
1 Answer
That is not a failure. It would be a failure if the program were not lossless.
It is mathematically necessary that if some sequences are compressed losslessly, then there must exist some sequences that are expanded losslessly. This follows from a counting (pigeonhole) argument: there are 2^n distinct inputs of n bits but fewer than 2^n outputs shorter than n bits, so no invertible mapping can shorten every input.
As to why your particular sequences are expanded as opposed to compressed, it is simply that they do not have sufficient redundancy for LZW to take advantage of. (GIF files, notably, are already LZW-compressed internally, so a second LZW pass sees data that looks nearly random.) LZW is an obsolete and relatively ineffective compression algorithm compared to more modern ones, where "more modern" might mean 30 years old instead of 40.
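To see the redundancy effect concretely, here is a minimal toy LZW encoder sketch (not your program; it emits raw dictionary codes rather than packed bits) comparing highly repetitive input against random input:

```python
# Toy LZW encoder: emits a list of dictionary codes. It illustrates why
# low-redundancy input produces more codes (and thus more output bits)
# than repetitive input does.
import os

def lzw_encode(data: bytes) -> list[int]:
    # Start with a dictionary of all 256 single-byte strings.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate                 # extend the current match
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code   # learn the new string
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes

redundant = b"ab" * 1000        # highly repetitive: long matches, few codes
random_ish = os.urandom(2000)   # no structure: nearly one code per byte

for name, data in [("redundant", redundant), ("random", random_ish)]:
    codes = lzw_encode(data)
    print(f"{name}: {len(data)} bytes -> {len(codes)} codes")
```

On the random input it emits close to one code per byte, and since codes above 255 need more than 8 bits each once packed, the output is necessarily larger than the input.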
Also, LZW is one of the worst in terms of how much it expands incompressible data. You might consider prepending your result with one bit that indicates whether the subsequent data is compressed or not; then you won't take the hit of LZW's expansion ratio.
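A sketch of that escape-flag idea follows, using a whole flag byte instead of a single bit to keep the stream byte-aligned, and with zlib standing in for whatever compressor you actually use:

```python
# Escape-flag wrapper: one leading byte says whether the payload is
# compressed (1) or stored verbatim (0), so the worst-case overhead on
# incompressible data is a single byte rather than LZW's expansion ratio.
import zlib

COMPRESSED, STORED = b"\x01", b"\x00"

def pack(data: bytes) -> bytes:
    compressed = zlib.compress(data)
    if len(compressed) < len(data):
        return COMPRESSED + compressed
    return STORED + data            # incompressible: store it as-is

def unpack(blob: bytes) -> bytes:
    flag, payload = blob[:1], blob[1:]
    return zlib.decompress(payload) if flag == COMPRESSED else payload

data = bytes(range(256)) * 4        # example input
assert unpack(pack(data)) == data
```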
You can use other compression algorithms, e.g. those implemented in gzip and xz, to get a feel for how inherently compressible your data is.
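One quick way to do that from Python's standard library, which binds the same algorithms used by gzip (zlib/DEFLATE) and xz (LZMA); the filename here is just a placeholder:

```python
# Rough compressibility check: if neither algorithm shrinks the data much,
# the file has little redundancy for LZW to exploit either.
import lzma, zlib

with open("input.bmp", "rb") as f:  # placeholder filename
    data = f.read()

for name, packed in [("zlib", zlib.compress(data, 9)),
                     ("xz", lzma.compress(data))]:
    print(f"{name}: {len(data)} -> {len(packed)} bytes "
          f"({len(packed) / len(data):.0%})")
```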