What compression format should we use? Should we lay DEFLATE (.zip) to rest?
With most Linux distributions dropping gzip and bzip2 in favor of LZMA2 for compressing their packages, and with open source implementations available for many platforms, I wonder: shouldn't we lay DEFLATE and the .zip format (which unfortunately got bastardized over and over) to rest, and move on to other, modern ways of distributing our (source) packages?
GNU tar supports the J switch, which uses xz (another LZMA2 compressor) as a filter:
$ tar cJf foo.tar.xz foo/
However, I tend to use 7z (p7zip implementation) and its friend 7za under Linux for creating archives. I still follow the "avoid tar bombs" paradigm when creating archives, meaning there is a single directory in the archive, so extracting from the command line does not spill files into the current directory (this is the standard modus operandi on Linux with things like tar, but it seems to be much less common under Windows).
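The tar-bomb rule can also be checked programmatically. The following is only a minimal sketch, assuming Python's standard tarfile module is an acceptable stand-in for the tar command; the helper names make_tar_xz, top_level_entries and demo are hypothetical and not part of any tool mentioned in the post.

```python
import os
import tarfile
import tempfile


def make_tar_xz(src_dir, archive_path):
    """Pack src_dir into an xz-compressed tarball under one top-level directory."""
    top = os.path.basename(os.path.normpath(src_dir))
    # Mode "w:xz" applies the xz (LZMA2) filter, much like the J switch of GNU tar.
    with tarfile.open(archive_path, "w:xz") as tar:
        tar.add(src_dir, arcname=top)
    return top


def top_level_entries(archive_path):
    """Return the set of first path components found in the archive."""
    with tarfile.open(archive_path, "r:*") as tar:
        return {name.split("/", 1)[0] for name in tar.getnames()}


def demo():
    """Build a throwaway foo/ tree, archive it, and report its top-level entries."""
    with tempfile.TemporaryDirectory() as work:
        src = os.path.join(work, "foo")
        os.makedirs(os.path.join(src, "sub"))
        for rel in ("README", os.path.join("sub", "main.c")):
            with open(os.path.join(src, rel), "w") as fh:
                fh.write("example\n")
        archive = os.path.join(work, "foo.tar.xz")
        make_tar_xz(src, archive)
        return top_level_entries(archive)


if __name__ == "__main__":
    print(demo())
```

An archive that yields more than one top-level entry would spill files into the current directory on extraction, so a packaging script could refuse to publish it.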
Anyway, judging by its use in packages (Fedora RPMs and Ubuntu DEBs, for instance) and as a filter for tools like tar, LZMA2 seems to be the "next best thing" coming into use after bzip2. It has a great compression ratio (beats bzip2 by far at standard settings) and is very fast at it, too (compression is slightly slower than gzip).
I did some benchmarking myself, but I'd like to put the spotlight on some more extensive benchmarks:
- Rating based benchmark at compressionratings.com
- Efficiency based benchmark at maximumcompression.com
Now, you'll notice that 7-zip, which is the reference implementation, does not appear in first place. However, FreeArc uses its own .arc format, which is not really cross-platform capable and is not compatible with the old ARC from the '80s. nanozip isn't open source, which is kind of a letdown, but it's the algorithm that counts, not the archiver!
Anyway, now that performance with 7-zip and its derivative implementations (xz) is no longer an issue, and the compression ratio speaks for itself, I feel like distributing my source packages as .7z or .tar.xz archives. However, I have two hurdles in front of me that I don't seem able to clear:
Advocates of WinRAR.
Don't get me wrong, I hold no grudge against WinRAR or its users; it's just that I can't really make RARs on Linux, and there's no need to, since we have free LZMA2 tools. And as I said, since it became an integral part of distribution packages, LZMA2 is available on any modern distribution. Since it takes about the same time to make a .7z as a .rar, and LZMA2 files are generally smaller, I don't see why not to use 7-zip.
tar archives have to be gzip or bzip2, no exceptions.
This is a hard one. Why are so many people impressed with gzip? Even bzip2 doesn't see much usage most of the time. Granted, gzip is fast, a good point when it comes to on-demand compression such as in web servers, or when creating large mirror backups. But what about distributing software? LZMA2 is very asymmetrical. While compression takes its time, decompression is blazingly fast.
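The asymmetry claim is easy to measure on your own data. This is a rough sketch only, assuming Python's standard lzma module (which writes .xz streams at its default preset); lzma_timings and sample_data are hypothetical helper names, and the synthetic input is merely a stand-in for a real source package.

```python
import lzma
import time


def lzma_timings(data):
    """Return (compressed_size, compress_seconds, decompress_seconds) for xz."""
    t0 = time.perf_counter()
    packed = lzma.compress(data)  # default preset, .xz container (LZMA2)
    t1 = time.perf_counter()
    restored = lzma.decompress(packed)
    t2 = time.perf_counter()
    assert restored == data  # the round trip must be lossless
    return len(packed), t1 - t0, t2 - t1


def sample_data(lines=100_000):
    """Build repetitive, source-like text; a stand-in for a real package."""
    return b"".join(b"int value_%d = %d;\n" % (i, i * 7) for i in range(lines))


if __name__ == "__main__":
    data = sample_data()
    size, pack_s, unpack_s = lzma_timings(data)
    print(f"{len(data)} -> {size} bytes")
    print(f"compress {pack_s:.3f}s, decompress {unpack_s:.3f}s")
```

For distribution the decompression figure matters most, since an archive is packed once and unpacked by every downloader. Timings vary with hardware and input, so treat any single run as indicative at best.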
OK, now here comes my question:
Since LZMA2 is arguably the next better compression algorithm, why are people not jumping on the bandwagon? Why do people still use WinRAR, which is proprietary, has a worse compression ratio, and is not ported to Linux (except unrar, but you obviously can't create archives with that)? Why are tarballs still mostly gzipped?
Is there no way to convince people to move on to a newer, reliable archiving format that's not only cross-platform but also free? When I give someone a file ending in .7z, they tend not to know what to do with it. Will this ever change?
Oh, and here's the little benchmark I did myself. I used the default settings everywhere:
11837440 GNUtar_TAR.tar
10657984 Arc_ARC.arc
9632524 PA2010_TAR_BZip2.tar.bz2
9536967 PA2010_LHA_Frozen5.lzh
9510148 PA2010_ZIP_BZip2.zipx
9490211 GNUtar_TAR.tar.bz2
9467242 PA2010_LHA_Frozen6.lzh
9463630 7-zip_ZIP_BZip2.zip
9437520 7-zip_7-ZIP_BZip2.7z
9398798 Arj_ARJ.arj
9373435 GNUtar_TAR.tar.gz
9370456 PA2010_BlackHole_Deflate.bh
9369621 Lha_LHA_Frozen6.lzh
9367712 PA2010_ZIP_Deflate.zip
9364237 PA2010_TAR_gzip.tar.gz
9360248 PA2010_Cabinet_MsZip.cab
9303923 7-zip_ZIP_Deflate.zip
9215279 7-zip_ZIP_Deflate64.zip
9189365 PA2010_ZIP_PPMd.zipx
9060663 PA2010_7-ZIP_PPMd.7z
8931280 PA2010_Cabinet_LZX.cab
8847427 7-zip_7-ZIP_PPMd.7z
8803350 PA2010_ZIP_Optimized.zipx
8803350 PA2010_ZIP_Wavpack.zipx
8802850 PA2010_ZIP_LZMA.zipx
5812491 FreeArc_7-ZIP.arc
5789853 7-zip_7-ZIP_LZMA.7z
5789853 PA2010_7-ZIP_LZMA.7z
5789024 GNUtar_TAR.tar.xz
5782637 FreeArc_UHARC.arc
5770969 FreeArc_CCM.arc
5739697 Fp8_5.fp8
5718865 Fp8_8.fp8
5685234 Paq8px_5.paq8px
5677662 Paq8kx_5.paq8kx
5644422 Paq8px_8.paq8px
5609608 Paq8kx_8.paq8kx
(Size in bytes; filename: Archiver_Format_Algorithm.Extension)
The set of files consists of disk images which contain a DOS installation:
1474979 disk01.144
1474979 disk02.144
1474979 disk03.144
1474979 disk04.144
1474979 disk05.144
1474979 ldisk01.144
1474979 ldisk02.144
1474979 ldisk03.144
24325 diskcopy.com
(Size in Bytes)
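A small part of such a size comparison can be repeated with nothing but a Python interpreter. The sketch below is an assumption-laden approximation: it covers only the four tar variants that Python's standard tarfile module can write, uses that module's default settings rather than those of the archivers listed above, and tar_sizes is a hypothetical helper name.

```python
import io
import tarfile


def tar_sizes(files):
    """Map archive suffix to tarball size in bytes for each tar filter.

    files is a dict of {member name: bytes}; everything stays in memory.
    """
    modes = {".tar": "w:", ".tar.gz": "w:gz", ".tar.bz2": "w:bz2", ".tar.xz": "w:xz"}
    sizes = {}
    for suffix, mode in modes.items():
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode=mode) as tar:
            for name, payload in files.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
        sizes[suffix] = len(buf.getvalue())
    return sizes


if __name__ == "__main__":
    # Placeholder payload; substitute the real file set you want to measure.
    files = {"set/sample.txt": b"hello world\n" * 50_000}
    for suffix, size in sorted(tar_sizes(files).items(), key=lambda kv: -kv[1]):
        print(size, "python_TAR" + suffix)
```

The output is sorted largest first to match the listing above. Results depend heavily on the input, so numbers from a placeholder payload say nothing about the disk images used in the post.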
Comments (1)
It used to be that bz2 wasn't an option for tarballs. Then someone added an option to GNU Tar to create and read bz2 archives, and pretty soon the format began to spread. So the answer is:
If you believe in LZMA, then submit patches to the Free Software Foundation (with all appropriate paperwork) and you'll make the world that much better a place.