Sending a highly compressed text file over the network
I have a text file that I want to send over the network; this file could vary in size from as low as 1KB to 500KB.

What algorithms/techniques could I use to tightly compress this file before sending it, so that as few bytes as possible are sent over the network and the compression ratio is high?
3 Answers
For compression, I'd consider gzip, bzip2 and LZMA (this is not an exhaustive list, but these are IMO the most famous).

Then, I'd look for some benchmarks on the net and try to gather metrics for various file types (text, binary, mixed) and sizes (small, big, huge). Even if you're mostly interested in the compression ratio, you might want to look at: the compression ratio, the compression time, the memory footprint, the decompression time.

According to A Quick Benchmark: Gzip vs. Bzip2 vs. LZMA:

This is confirmed in LZMA - better than bzip2:

So, for the compression of text files, the same site reports:

Finally, here is another resource with graphical results: Compression Tools: lzma, bzip2 & gzip

I'd really recommend performing your own benchmark (as you'll be compressing text only, and very small to small files) to get real metrics in your environment, but my bet is that LZMA won't provide a significant advantage on small text files, so bzip2 would be a decent choice (even if the time and memory overhead of LZMA might be low on small files). If you plan to perform the compression from Java, you'll find an LZMA implementation here, a bzip2 implementation here (coming from Apache Ant AFAIK), gzip being included in the JDK. If you don't want to or can't rely on a third-party library, use gzip.
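Since gzip ships with the JDK, here is a minimal sketch of that no-dependency route using `java.util.zip`; the class and method names below are my own illustrative choices, not something from the answer or a library:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipExample {

    // Compress a byte array with gzip before sending it over the network.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        }
        return bos.toByteArray();
    }

    // Decompress on the receiving side.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed));
             ByteArrayOutputStream bos = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
            return bos.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "some text payload to send".getBytes(StandardCharsets.UTF_8);
        byte[] packed = gzip(original);
        System.out.printf("original=%d bytes, gzipped=%d bytes%n", original.length, packed.length);
        System.out.println("round trip: " + new String(gunzip(packed), StandardCharsets.UTF_8));
    }
}
```

A bzip2 or LZMA round trip would look much the same, just with the stream classes of whichever third-party library you pick.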
The answer depends on the content. GZip is included in the JDK. Tests on random strings seem to average a 33% reduction in size.
[edit: content, not context]
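If you want to sanity-check that kind of figure on your own payloads, a rough sketch follows (JDK gzip again; the random lowercase string is only a stand-in for your real text, and the exact reduction will vary with content and length):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class RatioCheck {

    // gzip a byte array and return the compressed size in bytes.
    static int gzippedSize(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        }
        return bos.size();
    }

    public static void main(String[] args) throws IOException {
        // Build a random 10 KB string; substitute a sample of your real file here.
        Random rnd = new Random(42);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);
        int packed = gzippedSize(data);
        System.out.printf("original=%d, gzipped=%d, reduction=%.1f%%%n",
                data.length, packed, 100.0 * (data.length - packed) / data.length);
    }
}
```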
It depends. Can you control the network packet size? Are you going to bundle them if more than one will fit in a packet? Are you limited by CPU on either end? Not really the question, but still related, since at times it can take longer to compress and decompress than to just send the bytes.