Compression with the best ratio in Python?
Which compression method in Python has the best compression ratio?
Is the commonly used zlib.compress() the best, or are there better options? I need the best compression ratio possible.
I am compressing strings and sending them over UDP. A typical string is about 1,700,000 bytes.
Comments (3)
I'm sure there are more obscure formats with better compression, but lzma is the best of those that are well supported. There are some Python bindings here.
EDIT
Don't pick a format without testing; some algorithms do better than others depending on the data set.
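One way to act on the "test first" advice is to run each candidate codec over a representative sample and compare sizes. The following is only a minimal sketch: it assumes Python 3.3 or later (where `lzma` ships in the standard library), the helper name `compare_compressors` is made up for this illustration, and the sample payload is a placeholder rather than real data.

```python
import bz2
import lzma
import zlib


def compare_compressors(data: bytes) -> dict:
    """Return the compressed size in bytes for each stdlib codec at its highest setting."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }


if __name__ == "__main__":
    # Placeholder payload; substitute a representative sample of the real strings.
    sample = b"sensor=42;status=ok;temperature=21.5\n" * 20000
    results = compare_compressors(sample)
    # Print the smallest output first.
    for name, size in sorted(results.items(), key=lambda item: item[1]):
        print(f"{name:5s} {size:8d} bytes  ratio {len(sample) / size:.1f}x")
```

The ranking on synthetic, highly repetitive input like this says little about real data, so the numbers only mean something when the sample resembles what will actually be sent.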
If you are willing to trade performance for better compression, then the bz2 library usually gives better results than the gz (zlib) library.
There are other compression libraries, like xz (LZMA2), that might give even better results. They were not in the core distribution of Python when this was written, but the standard library has included an lzma module since Python 3.3.
Python Doc for BZ2 class
EDIT: Depending on the type of image, you might not get much additional compression. Most image formats are already compressed, unless the image is raw, BMP, or uncompressed TIFF. Testing several compression types is highly recommended.
EDIT2: If you do decide to do image compression, ImageMagick has Python bindings and supports many image conversion types.
Image Magick
Image Formats Supported
The best compression algorithm definitely depends on the kind of data you are dealing with. Unless you are working with essentially random data, such as a list of random numbers (in which case no compression algorithm will gain much), knowing the kind of data usually lets you apply much better algorithms than general-purpose ones (see the other answers for good, ready-to-use general compression algorithms).
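The point about random data is easy to check. This is a small sketch that uses `os.urandom` as a stand-in for incompressible input and a made-up CSV-like string as the structured case; both payloads are illustrative assumptions, not data from the question.

```python
import os
import zlib

# Highly regular text: the same record repeated many times.
structured = b"timestamp,value\n" + b"2020-01-01,1.0\n" * 5000
# Random bytes of the same length: there are no patterns for the compressor to exploit.
random_bytes = os.urandom(len(structured))

structured_size = len(zlib.compress(structured, 9))
random_size = len(zlib.compress(random_bytes, 9))

print(f"structured: {len(structured)} -> {structured_size} bytes")
print(f"random:     {len(random_bytes)} -> {random_size} bytes")
```

The structured input shrinks dramatically, while the random input stays about the same size or grows slightly because of the container overhead.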
If you are dealing with images, you should definitely choose a lossy (i.e. pixel-aware) compression format in preference to any lossless one. That will give you much better results. Recompressing an already lossy-compressed image with a lossless format is a waste of time.
I would look through PIL to see what is available. Something like converting the image to JPEG before sending, with a quality setting that matches the quality you need, should be very effective.
You should also be very cautious when using UDP: it can lose packets, and most compression formats are very sensitive to missing parts of the data. That can be managed at the application level.
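As one possible way to handle loss at the application level, the payload can be split into chunks that are compressed independently, so a lost datagram costs only its own chunk instead of corrupting the whole stream. The sketch below is an assumption-laden illustration, not a protocol from this thread: the 8-byte header layout, the 1024-byte chunk size, and the function names `make_datagrams` and `reassemble` are all hypothetical.

```python
import struct
import zlib

# Hypothetical header: chunk index and total chunk count, both unsigned 32-bit, network order.
HEADER = struct.Struct("!II")
# Assumed chunk size: small enough that even an incompressible chunk plus the header
# stays below a typical 1500-byte Ethernet MTU.
CHUNK_SIZE = 1024


def make_datagrams(payload: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split the payload and compress every chunk on its own."""
    chunks = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
    return [
        HEADER.pack(index, len(chunks)) + zlib.compress(chunk, 9)
        for index, chunk in enumerate(chunks)
    ]


def reassemble(datagrams) -> tuple:
    """Return (chunks keyed by index, sorted list of missing indices)."""
    received = {}
    total = 0
    for datagram in datagrams:
        index, total = HEADER.unpack_from(datagram)
        received[index] = zlib.decompress(datagram[HEADER.size:])
    missing = [i for i in range(total) if i not in received]
    return received, missing
```

Each element returned by `make_datagrams` could be passed to `socket.sendto`, and the receiver could use the `missing` list to request retransmission. The trade-off is that compressing small chunks independently gives a noticeably worse ratio than compressing the whole string at once, so this only makes sense if surviving packet loss matters more than the last few percent of compression.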