zlib decompression in Python
Okay so I have some data streams compressed by python's (2.6) zlib.compress() function. When I try to decompress them, some of them won't decompress (zlib error -5, which seems to be a "buffer error", no idea what to make of that). At first, I thought I was done, but I realized that all the ones I couldn't decompress started with 0x78DA (the working ones were 0x789C), and I looked around and it seems to be a different kind of zlib compression -- the magic number changes depending on the compression used. What can I use to decompress the files? Am I hosed?
I was looking for
python -c 'import sys,zlib;sys.stdout.write(zlib.decompress(sys.stdin.read()))'
I wrote it myself, based on the answers to "zlib decompression in Python".
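On Python 3, `sys.stdin.read()` returns text rather than bytes, so the one-liner needs the binary buffers instead. A minimal sketch, assuming Python 3.3 or later; the function name `decompress_stream` and the chunk size are made up for illustration:

```python
import zlib


def decompress_stream(src, dst, chunk_size=64 * 1024):
    """Copy a zlib stream from binary file object src to dst, decompressed."""
    decomp = zlib.decompressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decomp.decompress(chunk))
    # flush() returns any output still buffered at the end of the input.
    dst.write(decomp.flush())
    if not decomp.eof:
        # The end-of-stream marker was never seen: the input was cut short.
        raise zlib.error("incomplete or truncated zlib stream")
```

As a filter it would be called as `decompress_stream(sys.stdin.buffer, sys.stdout.buffer)` after `import sys`.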
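Okay, sorry I wasn't clear enough. This is win32, Python 2.6.2. I'm afraid I can't find the zlib file, but it's whatever is included in the win32 binary release. I don't have access to the original data -- I've been compressing my log files, and I'd like to get them back. As for other software, I naively tried 7zip, but of course it failed, because this is zlib, not gzip (I couldn't get any software to decompress zlib streams directly). I can't give a copy of the traceback right now, but it was (traced back to zlib.decompress(data)) zlib.error: Error: -3. Also, to be clear, these are static files, not streams as I made it sound earlier (so no transmission errors). Again, I'm afraid I don't have the code, but I know I used zlib.compress(data, 9) (i.e. the highest compression level -- although, interestingly, it seems that not all of the zlib output is 78da, as you might expect since I put it on the highest level) and just zlib.decompress().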
Okay, sorry about my last post, I didn't have everything at hand. I can't edit my post because I didn't use OpenID. Anyway, here's some data:
1) Decompression traceback:
Traceback (most recent call last):
  File "<my file>", line 5, in <module>
    zlib.decompress(data)
zlib.error: Error -5 while decompressing data
2) Compression code:
#here you can assume the data is the data to be compressed/stored
data = encrypt(zlib.compress(data,9)) #a short wrapper around PyCrypto AES encryption
f = open("somefile", 'wb')
f.write(data)
f.close()
3) Decompression code:
f = open("somefile", 'rb')
data = f.read()
f.close()
zlib.decompress(decrypt(data)) #this yields the error in (1)
According to RFC 1950, the difference between the "OK" 0x789C and the "bad" 0x78DA is in the FLEVEL bit-field:
"OK" uses 2, "bad" uses 3. So that difference in itself is not a problem.
To get any further, you might consider supplying the following information for each of compressing and (attempted) decompressing: what platform, what version of Python, what version of the zlib library, what was the actual code used to call the zlib module. Also supply the full traceback and error message from the failing decompression attempts. Have you tried to decompress the failing files with any other zlib-reading software? With what results? Please clarify what you have to work with: Does "Am I hosed?" mean that you don't have access to the original data? How did it get from a stream to a file? What guarantee do you have that the data was not mangled in transmission?
UPDATE Some observations based on partial clarifications published in your self-answer:
You are using Windows. Windows distinguishes between binary mode and text mode when reading and writing files. When reading in text mode, Python 2.x changes '\r\n' to '\n', and changes '\n' to '\r\n' when writing. This is not a good idea when dealing with non-text data. Worse, when reading in text mode, '\x1a' aka Ctrl-Z is treated as end-of-file.
To compress a file:
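A minimal sketch of what this could look like, assuming Python 3; the function name and the paths passed to it are placeholders, not taken from the original answer:

```python
import zlib


def compress_file(src_path, dst_path, level=9):
    """Compress the file at src_path into a zlib stream stored at dst_path."""
    with open(src_path, "rb") as src:   # binary mode: no newline translation
        raw = src.read()
    with open(dst_path, "wb") as dst:   # binary mode: bytes are written verbatim
        dst.write(zlib.compress(raw, level))
```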
To decompress a file:
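Again as a minimal sketch, assuming Python 3 and placeholder names that are not from the original answer:

```python
import zlib


def decompress_file(src_path, dst_path):
    """Decompress the zlib stream stored at src_path into dst_path."""
    with open(src_path, "rb") as src:   # binary mode: read the bytes exactly as stored
        packed = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(zlib.decompress(packed))
```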
Aside: Better to use the gzip module, which saves you having to think about nasties like text mode, at the cost of a few bytes for the extra header info.
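For illustration, a minimal sketch of the gzip route, assuming Python 3; the helper names `write_gzip` and `read_gzip` are made up:

```python
import gzip


def write_gzip(path, data):
    """Store bytes in a gzip file; gzip.open handles the underlying file itself."""
    with gzip.open(path, "wb") as f:
        f.write(data)


def read_gzip(path):
    """Return the decompressed bytes stored in a gzip file."""
    with gzip.open(path, "rb") as f:
        return f.read()
```

A side benefit is that the result is an ordinary .gz file, so gzip-aware tools can open it without any Python code.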
If you have been using 'rb' and 'wb' in your compression code but not in your decompression code [unlikely?], you are not hosed, you just need to flesh out the above decompression code and go for it.
Note carefully the use of "may", "should", etc in the following untested ideas.
If you have not been using 'rb' and 'wb' in your compression code, the probability that you have hosed yourself is rather high.
If there were any instances of '\x1a' in your original file, any data after the first such is lost -- but in that case it shouldn't fail on decompression (IOW this scenario doesn't match your symptoms).
If a Ctrl-Z was generated by zlib itself, this should cause an early EOF upon attempted decompression, which should of course cause an exception. In this case you may be able to gingerly reverse the process by reading the compressed file in binary mode and then replacing '\r\n' with '\n' [i.e. simulate text mode without the Ctrl-Z -> EOF gimmick]. Decompress the result. Edit: Write the result out in TEXT mode.
UPDATE 2 I can reproduce your symptoms -- with ANY level 1 to 9 -- with the following script:
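A rough sketch of the idea, not the original script: it assumes Python 3 and simulates Windows text-mode file handling in memory (the two `text_mode_*` helpers are hypothetical stand-ins for `open(..., 'w')` and `open(..., 'r')` under Python 2.x on Windows), so it behaves the same on any platform:

```python
import random
import zlib


def text_mode_write(data):
    """Simulate a Windows text-mode write: every LF becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


def text_mode_read(on_disk):
    """Simulate a Windows text-mode read under Python 2.x:
    stop at the first Ctrl-Z, then turn CR LF back into LF."""
    eof = on_disk.find(b"\x1a")
    if eof != -1:
        on_disk = on_disk[:eof]
    return on_disk.replace(b"\r\n", b"\n")


# Deterministic stand-in for a reasonably large text file (80000 bytes).
rng = random.Random(0)
alphabet = b"abcdefghijklmnopqrstuvwxyz \n"
original = bytes(rng.choice(alphabet) for _ in range(80000))

on_disk = text_mode_write(zlib.compress(original, 9))  # as if written with 'w'
read_back = text_mode_read(on_disk)                    # as if read with 'r'

try:
    zlib.decompress(read_back)
    print("decompressed without error")
except zlib.error as exc:
    print("decompression failed:", exc)
```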
Note: you will need to use a reasonably large text file (I used an 80kb source file) to ensure that the decompression result will contain a '\x1a'.
I can recover with this script:
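A rough sketch of the recovery step, not the original script: it assumes Python 3 and simulates the text-mode write in memory, and `undo_text_mode_write` is a made-up name. With a real file, `on_disk` would instead come from reading the damaged file with `open(path, 'rb')`.

```python
import random
import zlib


def undo_text_mode_write(on_disk):
    """Reverse the LF -> CR LF expansion done by a text-mode write.

    Every LF written in text mode is preceded by an inserted CR, so
    collapsing each CR LF pair back to LF restores the original bytes.
    """
    return on_disk.replace(b"\r\n", b"\n")


# Deterministic stand-in for the original text file.
rng = random.Random(0)
alphabet = b"abcdefghijklmnopqrstuvwxyz \n"
original = bytes(rng.choice(alphabet) for _ in range(80000))

# What a text-mode write of the compressed data would leave on disk (simulated).
on_disk = zlib.compress(original, 9).replace(b"\n", b"\r\n")

# Recovery: take the bytes as read in BINARY mode, undo the expansion, decompress.
recovered = zlib.decompress(undo_text_mode_write(on_disk))
print(recovered == original)
```

If the original file was also read in text mode before compression, writing the recovered data out in text mode restores its '\r\n' line endings.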
NOTE: If there is a '\x1a' aka Ctrl-Z byte in the original file, and the file is read in text mode, that byte and all following bytes will NOT be included in the compressed file, and thus can NOT be recovered. For a text file (e.g. source code), this is no loss at all. For a binary file, you are most likely hosed.
Update 3 [following late revelation that there's an encryption/decryption layer involved in the problem]:
The "Error -5" message indicates that the data that you are trying to decompress has been mangled since it was compressed. If it's not caused by using text mode on the files, suspicion obviously(?) falls on your decryption and encryption wrappers. If you want help, you need to divulge the source of those wrappers. In fact what you should try to do is (like I did) put together a small script that reproduces the problem on more than one input file. Secondly (like I did) see whether, and under what conditions, you can reverse the process. If you want help with the second stage, you need to divulge the problem-reproduction script.
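One possible shape for such a script is to round-trip each layer separately and report the first one that does not return its input. This is only a sketch, assuming Python 3; `encrypt` and `decrypt` are parameters standing in for the poster's wrappers, whose source is not shown, and `first_broken_stage` is a made-up name:

```python
import zlib


def first_broken_stage(data, encrypt, decrypt, level=9):
    """Return the name of the first stage that fails to round-trip, or None."""
    packed = zlib.compress(data, level)
    if zlib.decompress(packed) != data:
        return "zlib"
    # The wrappers must return exactly the bytes they were given.
    if decrypt(encrypt(packed)) != packed:
        return "encrypt/decrypt"
    return None


# Identity functions stand in for the real wrappers here.
print(first_broken_stage(b"some log text\n" * 100, lambda b: b, lambda b: b))
```

Running it over several input files with the real wrappers passed in would show whether the mangling happens in the encryption layer or somewhere else.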