zlib.error: Error -3 while decompressing: incorrect header check
I have a gzip file and I am trying to read it via Python as below:
import zlib
do = zlib.decompressobj(16+zlib.MAX_WBITS)
fh = open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
data = do.decompress(cdata)
It throws this error:
zlib.error: Error -3 while decompressing: incorrect header check
How can I overcome it?
You have this error:
zlib.error: Error -3 while decompressing: incorrect header check
This is most likely because you are trying to check headers that are not there, e.g. your data follows RFC 1951 (deflate compressed format) rather than RFC 1950 (zlib compressed format) or RFC 1952 (gzip compressed format).
Choosing windowBits
But zlib can decompress all those formats:
for deflate format, use wbits = -zlib.MAX_WBITS
for zlib format, use wbits = zlib.MAX_WBITS
for gzip format, use wbits = zlib.MAX_WBITS | 16
See the documentation at http://www.zlib.net/manual.html#Advanced (section inflateInit2).
Examples
The examples start from some test data, then run an obvious test for zlib, a test for deflate and a test for gzip, and show that the gzip data is also compatible with the gzip module.
Automatic header detection (zlib or gzip)
Adding 32 to windowBits will trigger header detection.
Using gzip instead
Or you can ignore zlib and use the gzip module directly; but please remember that under the hood, gzip uses zlib.
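The original examples were code that is not preserved here. The following is a minimal sketch of the same checks, assuming an arbitrary test payload; the names DATA, WBITS, compress, decompress and decompress_auto are illustrative and not taken from the answer. It round-trips each format with its windowBits value, uses 32 + zlib.MAX_WBITS for automatic detection, and reads the gzip output with the gzip module.

```python
import gzip
import zlib

# Illustrative test data (not from the original answer).
DATA = b"hello, incorrect header check " * 10

# windowBits for each container format, as listed in the answer.
WBITS = {
    "deflate": -zlib.MAX_WBITS,   # RFC 1951: raw stream, no header or trailer
    "zlib": zlib.MAX_WBITS,       # RFC 1950: 2-byte header, Adler-32 trailer
    "gzip": zlib.MAX_WBITS | 16,  # RFC 1952: gzip header, CRC-32 trailer
}

def compress(data: bytes, fmt: str) -> bytes:
    """Compress data into the requested container format."""
    c = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, WBITS[fmt])
    return c.compress(data) + c.flush()

def decompress(blob: bytes, fmt: str) -> bytes:
    """Decompress with the windowBits value that matches the format."""
    return zlib.decompress(blob, WBITS[fmt])

def decompress_auto(blob: bytes) -> bytes:
    """32 + MAX_WBITS: zlib detects a zlib or gzip header (not raw deflate)."""
    return zlib.decompress(blob, 32 + zlib.MAX_WBITS)

if __name__ == "__main__":
    for fmt in WBITS:
        assert decompress(compress(DATA, fmt), fmt) == DATA
    gz = compress(DATA, "gzip")
    assert gzip.decompress(gz) == DATA           # the gzip module reads it too
    assert decompress_auto(gz) == DATA
    assert decompress_auto(compress(DATA, "zlib")) == DATA
    try:
        zlib.decompress(gz)                      # default wbits expects a zlib header
    except zlib.error as exc:
        print(exc)                               # ... incorrect header check
```

Note that decompress_auto cannot recognise raw deflate data, because that format has no header to detect; for it, -zlib.MAX_WBITS has to be passed explicitly.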
Update: dnozay's answer explains the problem and should be the accepted answer.
Try the gzip module; the code below is straight from the Python docs.
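The documentation snippet itself is not preserved here. The sketch below follows the gzip.open pattern that the Python documentation uses for reading a compressed file; read_gzip_file is a hypothetical helper name, and 'abc.gz' in the usage line is the file from the question.

```python
import gzip

def read_gzip_file(path: str) -> bytes:
    """Read and decompress an entire .gz file (hypothetical helper)."""
    # gzip.open parses the gzip header and trailer itself,
    # so no windowBits value has to be chosen by hand.
    with gzip.open(path, "rb") as f:
        return f.read()
```

Usage: data = read_gzip_file('abc.gz'). If the file is not actually gzip-compressed, reading it raises an OSError (gzip.BadGzipFile on Python 3.8+) instead of returning data.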
I just solved the "incorrect header check" problem when uncompressing gzipped data.
You need to set -WindowBits => WANT_GZIP in your call to inflateInit2 (use the "2" version).
Yes, this can be very frustrating. A shallow reading of the documentation presents zlib as an API for gzip compression, but by default (when not using the gz* methods) it neither creates nor decompresses the gzip format. You have to pass this not-very-prominently documented flag.
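Which flag to pass depends on the data. As a rough Python sketch (an assumption; this answer itself uses a different zlib binding), the first two bytes can be used to guess a wbits value: gzip streams start with the magic bytes 1f 8b (RFC 1952), a zlib header has compression method 8 in the low nibble of its first byte and a 16-bit value divisible by 31 (RFC 1950), and raw deflate has no header at all. guess_wbits and smart_decompress are hypothetical helper names.

```python
import zlib

def guess_wbits(blob: bytes) -> int:
    """Heuristically pick a wbits value from the first two bytes (a sketch)."""
    if blob[:2] == b"\x1f\x8b":                  # RFC 1952 gzip magic number
        return zlib.MAX_WBITS | 16
    if len(blob) >= 2:
        cmf, flg = blob[0], blob[1]
        # RFC 1950: CM must be 8 (deflate) and CMF*256 + FLG a multiple of 31
        if cmf & 0x0F == 8 and (cmf * 256 + flg) % 31 == 0:
            return zlib.MAX_WBITS
    return -zlib.MAX_WBITS                        # no header: assume raw deflate

def smart_decompress(blob: bytes) -> bytes:
    """Decompress using the guessed wbits value."""
    return zlib.decompress(blob, guess_wbits(blob))
```

This is only a heuristic: data that is not compressed at all also falls through to the raw-deflate branch and then fails to decompress.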
This does not answer the original question, but it may help someone else who ends up here.
The zlib.error: Error -3 while decompressing: incorrect header check also occurs in the example below:
The example is a minimal reproduction of something I encountered in some legacy Django code, where Base64-encoded bytes (from an HTTP POST) were being stored in a Django CharField (instead of a BinaryField).
When reading a CharField value from the database, str() is called on the value without an explicit encoding, as can be seen in the Django source.
The str() documentation says that passing a bytes object to str() without the encoding or errors arguments returns the informal string representation of the object, i.e. something like "b'...'".
So, in the example, we are inadvertently Base64-decoding "b'eJxLTEpOSQUABcgB8A=='" instead of b'eJxLTEpOSQUABcgB8A=='.
The zlib decompression in the example would succeed if an explicit encoding were used, e.g. str(b64_encoded_bytes, 'utf-8').
NOTE specific to Django:
What's especially tricky: this issue only arises when retrieving a value from the database. See, for example, the test below, which passes (in Django 3.0.3):
where MyModel is the model that stores the value in a CharField.
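The pitfall described in this answer can be reproduced without Django. The sketch below assumes the Base64 value quoted in the answer, b'eJxLTEpOSQUABcgB8A==' (the zlib-compressed bytes of b'abcde'); to_text and from_text are hypothetical helpers that mimic storing the value as text and reading it back.

```python
import base64
import zlib

# Base64 text from the answer: zlib-compressed bytes, Base64-encoded.
ENCODED = b"eJxLTEpOSQUABcgB8A=="

def to_text(b64_bytes: bytes, explicit_encoding: bool) -> str:
    """Turn the bytes into text the way the field value ends up (hypothetical)."""
    if explicit_encoding:
        return str(b64_bytes, "utf-8")   # 'eJxLTEpOSQUABcgB8A=='
    return str(b64_bytes)                # "b'eJxLTEpOSQUABcgB8A=='" (the repr)

def from_text(text: str) -> bytes:
    """Base64-decode the stored text and decompress it (hypothetical)."""
    # b64decode drops the quote characters but keeps the leading "b",
    # so the decoded bytes no longer start with a valid zlib header.
    return zlib.decompress(base64.b64decode(text))

print(from_text(to_text(ENCODED, explicit_encoding=True)))   # b'abcde'
try:
    from_text(to_text(ENCODED, explicit_encoding=False))
except zlib.error as exc:
    print(exc)   # ... incorrect header check
```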
To decompress incomplete gzipped bytes that are in memory, the answer by dnozay is useful, but it misses the zlib.decompressobj call, which I found to be necessary:
Note that zlib.MAX_WBITS | 16 is 15 | 16, which is 31. For some background about wbits, see zlib.decompress.
Credit: the answer by Yann Vernier, which notes the zlib.decompressobj call.
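The code from this answer is not preserved here. A minimal sketch of the decompressobj approach follows, assuming the truncated gzip bytes are already in memory; decompress_partial_gzip is a hypothetical name. Unlike zlib.decompress, which raises an error when the stream ends early, a decompress object returns whatever output it has produced so far.

```python
import zlib

def decompress_partial_gzip(blob: bytes) -> bytes:
    """Return as much plaintext as possible from possibly truncated gzip bytes."""
    # wbits = zlib.MAX_WBITS | 16 (i.e. 31) tells zlib to expect a gzip header.
    d = zlib.decompressobj(zlib.MAX_WBITS | 16)
    # A decompress object returns whatever it could decode so far;
    # it does not require the end of the stream to be present.
    return d.decompress(blob)
```

For a truncated blob, the result is a prefix of the full plaintext rather than an exception.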
Funnily enough, I had that error when trying to work with the Stack Overflow API using Python.
I managed to get it working with the GzipFile object from the gzip module, roughly like this:
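The snippet is not preserved here. A rough sketch of the GzipFile approach follows, assuming raw_body already holds the gzip-compressed bytes of an HTTP response (how they are fetched is left out); gunzip_response_body is a hypothetical helper. GzipFile reads from a file object, so the bytes are wrapped in io.BytesIO.

```python
import gzip
import io

def gunzip_response_body(raw_body: bytes) -> bytes:
    """Decompress a gzip-compressed HTTP body held in memory (hypothetical helper)."""
    # GzipFile reads from a file object, so wrap the raw bytes in BytesIO.
    with gzip.GzipFile(fileobj=io.BytesIO(raw_body)) as gz:
        return gz.read()
```

gzip.decompress(raw_body) does the same thing in a single call.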
My case was to decompress email messages that are stored in a Bullhorn database. The snippet is the following:
If you use Node.js, try the fflate package; it worked for me for gzip.
Just add headers 'Accept-Encoding': 'identity'
https://github.com/requests/requests/issues/3849
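The linked issue concerns the requests library, where the same dictionary would be passed as headers=... to requests.get. To keep the sketch dependency-free, it swaps in the standard library's urllib.request; build_uncompressed_request is a hypothetical helper, and the URL in the usage comment is a placeholder. Accept-Encoding: identity asks the server to send the body without any content coding, so, if the server honours it, there is no gzip or deflate header for zlib to check.

```python
import urllib.request

def build_uncompressed_request(url: str) -> urllib.request.Request:
    """Build a GET request asking the server not to compress the body (hypothetical helper)."""
    # "identity" means no content coding, so a compliant server sends plain bytes.
    return urllib.request.Request(url, headers={"Accept-Encoding": "identity"})

# Usage (performs a network request; the URL is a placeholder):
# with urllib.request.urlopen(build_uncompressed_request("https://example.com/")) as resp:
#     body = resp.read()
```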