Okay so I have some data streams compressed by python's (2.6) zlib.compress() function. When I try to decompress them, some of them won't decompress (zlib error -5, which seems to be a "buffer error", no idea what to make of that). At first, I thought I was done, but I realized that all the ones I couldn't decompress started with 0x78DA (the working ones were 0x789C), and I looked around and it seems to be a different kind of zlib compression -- the magic number changes depending on the compression used. What can I use to decompress the files? Am I hosed?

  FLEVEL (Compression level)
     These flags are available for use by specific compression
     methods.  The "deflate" method (CM = 8) sets these flags as follows:

        0 - compressor used fastest algorithm
        1 - compressor used fast algorithm
        2 - compressor used default algorithm
        3 - compressor used maximum compression, slowest algorithm

     The information in FLEVEL is not needed for decompression; it
     is there to indicate if recompression might be worthwhile.

The "OK" streams use 2 and the "bad" ones use 3, so that difference in itself is not the problem.
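To see this for yourself, here is a minimal sketch that decodes the two header bytes according to the RFC 1950 layout. It is written for Python 3 (where indexing bytes yields integers), and the helper name `describe_zlib_header` is made up for illustration.

```python
def describe_zlib_header(stream):
    """Decode the two-byte zlib header (CMF, FLG) laid out in RFC 1950."""
    cmf, flg = stream[0], stream[1]
    return {
        "method": cmf & 0x0F,            # CM: 8 means "deflate"
        "window_bits": (cmf >> 4) + 8,   # CINFO is log2(window size) - 8
        "flevel": flg >> 6,              # FLEVEL: the top two bits of FLG
        "has_dict": bool(flg & 0x20),    # FDICT: preset dictionary flag
        # FCHECK is chosen so that the 16-bit header is a multiple of 31
        "check_ok": (cmf * 256 + flg) % 31 == 0,
    }

ok_header = describe_zlib_header(bytes.fromhex("789c"))
bad_header = describe_zlib_header(bytes.fromhex("78da"))
print(ok_header["flevel"], bad_header["flevel"])
```

Both headers are valid deflate headers with a 32K window; only FLEVEL differs, and the decompressor ignores it.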

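To get any further, you might consider supplying the following information for each of the compression and the (attempted) decompression: what platform, what version of Python, what version of the zlib library, and what actual code was used to call the zlib module. Also supply the full traceback and error message from the failing decompression attempts. Have you tried decompressing the failing files with any other zlib-reading software? With what results? Please clarify what you have to work with: does "Am I hosed?" mean that you don't have access to the original data? How did it get from a stream to a file? What guarantee do you have that the data was not mangled in transmission?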


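You are using Windows. Windows distinguishes between binary mode and text mode when reading and writing files. When reading in text mode, Python 2.x changes '\r\n' to '\n', and when writing it changes '\n' to '\r\n'. This is not a good idea when dealing with non-text data. Worse, when reading in text mode, '\x1a' (aka Ctrl-Z) is treated as end-of-file.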


# other superstructure left as an exercise
import zlib
str_object1 = open('my_log_file', 'rb').read()   # read in BINARY mode
str_object2 = zlib.compress(str_object1, 9)
f = open('compressed_file', 'wb')                # write in BINARY mode
f.write(str_object2)
f.close()


# decompress it again, also in binary mode
str_object1 = open('compressed_file', 'rb').read()
str_object2 = zlib.decompress(str_object1)
f = open('my_recovered_log_file', 'wb')
f.write(str_object2)
f.close()

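Aside: it is better to use the gzip module, which saves you from having to think about nasties like text mode, at the cost of a few bytes for the extra header info.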
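As a rough illustration of that aside, here is a minimal sketch of a gzip round trip, written for Python 3. The helper name `gzip_roundtrip` and the sample payload are invented for the example; the temporary file only stands in for a real log archive.

```python
import gzip
import os
import tempfile

def gzip_roundtrip(payload):
    """Write payload to a .gz file, read it back, and return the recovered bytes."""
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    try:
        with gzip.open(path, "wb") as f:   # binary mode, so no newline translation
            f.write(payload)
        with gzip.open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

# A payload containing CR LF, a bare LF and a Ctrl-Z byte
payload = b"line one\r\nline two\n\x1a binary tail\n"
recovered = gzip_roundtrip(payload)
print(recovered == payload)
```

The resulting files are in the gzip format rather than raw zlib streams, so general-purpose gzip tools should be able to read them too.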




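If your original file was read in text mode and contains any '\x1a' bytes, all data after the first one is lost, but in that case decompression should not fail (IOW, this scenario doesn't match your symptoms).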

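If a Ctrl-Z is generated by zlib itself, it will cause an early EOF when you attempt to decompress, which should of course cause an exception. In this case you may be able to gingerly reverse the process by reading the compressed file in binary mode and then substituting '\n' for '\r\n' [i.e. simulating text mode without the Ctrl-Z -> EOF gimmick]. Decompress the result. Edit: write the result out in TEXT mode. End edit.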
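The damage and its reversal can be simulated without touching any files. The following is only a sketch, written for Python 3: the function names are invented, the translation is imitated with `bytes.replace`, and the demo input is found by searching for a compressed stream that happens to contain an LF byte but no genuine CR LF pair.

```python
import zlib

def text_mode_write(data):
    """Imitate a Windows text-mode write: every LF byte becomes CR LF."""
    return data.replace(b"\n", b"\r\n")

def undo_text_mode_write(data):
    """Best-effort reversal: every CR LF pair becomes a single LF."""
    return data.replace(b"\r\n", b"\n")

def find_demo_stream():
    """Search for a compressed stream containing LF but no genuine CR LF."""
    for i in range(10000):
        original = (b"log entry %d\n" % i) * 40
        packed = zlib.compress(original, 9)
        if b"\n" in packed and b"\r\n" not in packed:
            return original, packed
    raise RuntimeError("no suitable demo stream found")

original, packed = find_demo_stream()
damaged = text_mode_write(packed)

try:
    zlib.decompress(damaged)
    damaged_ok = True
except zlib.error:
    damaged_ok = False      # the inserted CR bytes break the stream

repaired = undo_text_mode_write(damaged)
recovered = zlib.decompress(repaired)
print(damaged_ok, recovered == original)
```

Note the limitation: if the compressed stream already contained a real '\r\n' pair before it was written, the reversal turns that pair into '\n' as well and the repaired stream is still corrupt. That is why the reversal only works under some conditions.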

Update 2: I can reproduce your symptoms, with any level from 1 to 9, using the following script:

import zlib, sys
fn = sys.argv[1]
level = int(sys.argv[2])
s1 = open(fn).read() # TEXT mode
s2 = zlib.compress(s1, level)
f = open(fn + '-ct', 'w') # TEXT mode
f.write(s2)
f.close()
# try to decompress in text mode
s1 = open(fn + '-ct').read() # TEXT mode
s2 = zlib.decompress(s1) # error -5
f = open(fn + '-dtt', 'w')
f.write(s2)
f.close()



# recovery script: undoes each step of the script above, in reverse order
import zlib, sys
fn = sys.argv[1]
# (1) reverse the text-mode write
# can't use text-mode read as it will stop at Ctrl-Z
s1 = open(fn, 'rb').read() # BINARY mode
s1 = s1.replace('\r\n', '\n')
# (2) reverse the compression
s2 = zlib.decompress(s1)
# (3) reverse the text mode read
f = open(fn + '-fixed', 'w') # TEXT mode
f.write(s2)
f.close()

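Note: if there is a '\x1a' (aka Ctrl-Z) byte in the original file, and the file is read in text mode, that byte and all following bytes will not be included in the compressed file, and thus cannot be recovered. For a text file (e.g. source code), this is no loss at all. For a binary file, you are most likely hosed.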
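To make the size of that loss concrete, here is a small Python 3 simulation of what a Python 2 text-mode read on Windows does to the bytes. The function `text_mode_read` is an invented stand-in for the real file read, and the sample bytes are made up.

```python
import zlib

def text_mode_read(data):
    """Imitate a text-mode read: stop at the first Ctrl-Z, then CR LF -> LF."""
    visible = data.split(b"\x1a", 1)[0]
    return visible.replace(b"\r\n", b"\n")

raw = b"header\r\nrecord 1\x1arecord 2\r\n"
seen = text_mode_read(raw)           # what the compressor actually receives
restored = zlib.decompress(zlib.compress(seen, 9))
print(seen)
print(restored == seen, restored == raw)
```

The compression round trip itself is lossless, but it can only return what was read: everything from the Ctrl-Z byte onward never reached zlib.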

Update 3 [following the late revelation that an encryption/decryption layer is involved in the problem]:

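The "Error -5" message indicates that the data you are trying to decompress has been mangled since it was compressed. If that was not caused by using text mode on the files, suspicion obviously(?) falls on your decryption and encryption wrappers. If you want help, you need to divulge the source of those wrappers. In fact, what you should try to do is (as I did) put together a small script that reproduces the problem on more than one input file. Secondly (as I did), see whether you can reverse the process, and under what conditions. If you want help with the second stage, you need to divulge the problem-reproduction script.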
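A reproduction script along those lines might start out like the sketch below (Python 3). Everything named here is hypothetical: `first_broken_stage` is an invented helper, `xor_wrapper` is a trivial stand-in for the real encryption wrapper, and `lossy_decrypt` is a deliberately buggy stand-in that shows what a failing layer looks like. The real wrappers would be substituted in their place.

```python
import zlib

def first_broken_stage(payload, encrypt, decrypt):
    """Return the name of the first stage that fails to round-trip, or None."""
    packed = zlib.compress(payload, 9)
    if decrypt(encrypt(packed)) != packed:
        return "encrypt/decrypt"
    if zlib.decompress(packed) != payload:
        return "compress/decompress"
    return None

def xor_wrapper(data, key=0x5A):
    """Hypothetical stand-in cipher: XOR with one byte, its own inverse."""
    return bytes(b ^ key for b in data)

def lossy_decrypt(data):
    """Deliberately buggy stand-in: decodes correctly but drops the last byte."""
    return xor_wrapper(data)[:-1]

samples = [b"short", b"log line\n" * 200, bytes(range(256)) * 4]
good = [first_broken_stage(s, xor_wrapper, xor_wrapper) for s in samples]
bad = [first_broken_stage(s, xor_wrapper, lossy_decrypt) for s in samples]
print(good, bad)
```

Checking the wrapper round trip separately from the zlib round trip shows which layer mangles the bytes, and running it over several inputs shows whether the failure depends on the data.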


I was looking for

python -c 'import sys,zlib;sys.stdout.write(zlib.decompress('

I wrote it myself, based on the answers to "zlib decompression in python".

Okay, sorry I wasn't clear enough. This is win32, Python 2.6.2. I'm afraid I can't find the zlib file, but it's whatever is included in the win32 binary release. And I don't have access to the original data: I've been compressing my log files, and I'd like to get them back. As for other software, I naively tried 7-Zip, but of course it failed, because the data is zlib, not gzip (I couldn't find any software that decompresses zlib streams directly). I can't give an exact copy of the traceback now, but it was (traced back to zlib.decompress(data)) zlib.error: Error: -3. Also, to be clear, these are static files, not streams as I made it sound earlier (so no transmission errors). And again, I'm afraid I don't have the code, but I know I used zlib.compress(data, 9) (i.e. the highest compression level, although interestingly not all of the zlib output starts with 78da, as you might expect given that setting) and just zlib.decompress().

Okay, sorry about my last post; I didn't have everything to hand. I can't edit that post because I didn't use OpenID. Anyway, here is some data:


Traceback (most recent call last):
  File "<my file>", line 5, in <module>
zlib.error: Error -5 while decompressing data


#here you can assume the data is the data to be compressed/stored
data = encrypt(zlib.compress(data,9)) #a short wrapper around PyCrypto AES encryption
f = open("somefile", 'wb')
f.write(data)
f.close()


f = open("somefile", 'rb')
data = f.read()
f.close()

zlib.decompress(decrypt(data)) #this yields the traceback shown above

