zlib.error: Error -3 while decompressing: incorrect header check

Posted 2024-09-06 22:21:21, 331 words, 7 views, 0 comments

I have a gzip file and I am trying to read it via Python as below:

import zlib

do = zlib.decompressobj(16+zlib.MAX_WBITS)
fh = open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
data = do.decompress(cdata)

It throws this error:

zlib.error: Error -3 while decompressing: incorrect header check

How can I overcome it?

Comments (9)

我是有多爱你 2024-09-13 22:21:21

You have this error:

zlib.error: Error -3 while decompressing: incorrect header check

This is most likely because you are trying to check for headers that are not there, e.g. your data follows RFC 1951 (deflate compressed format) rather than RFC 1950 (zlib compressed format) or RFC 1952 (gzip compressed format).
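The following quick diagnostic is not part of the original answer. The first two bytes usually tell the three formats apart:

- gzip data starts with the magic bytes 1f 8b.
- A zlib header has compression method 8 in the low nibble of its first byte, and the two header bytes form a multiple of 31.

The helper name guess_format is hypothetical, and the check is only a heuristic. Raw deflate has no header, so anything matching neither pattern is reported as 'deflate', even if it is not compressed at all.

```python
def guess_format(data: bytes) -> str:
    """Heuristically guess the container format of compressed bytes.

    Returns 'gzip', 'zlib' or 'deflate'. 'deflate' only means that no gzip
    or zlib header was found: the data may be raw deflate or not compressed.
    """
    if data[:2] == b"\x1f\x8b":
        # RFC 1952 gzip magic number
        return "gzip"
    if len(data) >= 2:
        cmf, flg = data[0], data[1]
        # RFC 1950: CM (low nibble of CMF) is 8 for deflate, and
        # CMF * 256 + FLG must be a multiple of 31 (the FCHECK bits)
        if cmf & 0x0F == 8 and (cmf * 256 + flg) % 31 == 0:
            return "zlib"
    return "deflate"
```

Suppose guess_format(open('abc.gz', 'rb').read(2)) does not return 'gzip'. Then the file is most likely not gzip despite its extension, and the question's 16 + zlib.MAX_WBITS setting will keep failing on it.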

choosing windowBits

zlib can decompress all of these formats:

  • to (de-)compress deflate format, use wbits = -zlib.MAX_WBITS
  • to (de-)compress zlib format, use wbits = zlib.MAX_WBITS
  • to (de-)compress gzip format, use wbits = zlib.MAX_WBITS | 16

See the documentation at http://www.zlib.net/manual.html#Advanced (section inflateInit2).
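Applied to the question's code, the fix comes down to the wbits argument passed to decompressobj. This is a minimal sketch. WBITS and read_compressed are hypothetical names. The 'auto' entry adds 32, which asks zlib to detect a zlib or gzip header by itself.

```python
import zlib

# wbits value for each format; 'auto' (+ 32) detects a zlib or gzip header
WBITS = {
    "deflate": -zlib.MAX_WBITS,   # -15: raw RFC 1951 stream, no header or trailer
    "zlib": zlib.MAX_WBITS,       # 15: RFC 1950 header and Adler-32 trailer
    "gzip": zlib.MAX_WBITS | 16,  # 31: RFC 1952 header and CRC-32 trailer
    "auto": zlib.MAX_WBITS | 32,  # 47: zlib or gzip, detected from the header
}


def read_compressed(path, fmt):
    """Decompress a whole file; fmt must be one of the WBITS keys."""
    do = zlib.decompressobj(WBITS[fmt])
    with open(path, "rb") as fh:
        cdata = fh.read()
    # flush() returns any output the decompressor is still holding back
    return do.decompress(cdata) + do.flush()
```

For the question's file, read_compressed('abc.gz', 'auto') accepts either a zlib or a gzip stream. Raw deflate still needs 'deflate'.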

examples

test data:

>>> import zlib
>>> deflate_compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
>>> zlib_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS)
>>> gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
>>> 
>>> text = b'test'  # zlib works on bytes in Python 3
>>> deflate_data = deflate_compress.compress(text) + deflate_compress.flush()
>>> zlib_data = zlib_compress.compress(text) + zlib_compress.flush()
>>> gzip_data = gzip_compress.compress(text) + gzip_compress.flush()
>>> 

obvious test for zlib:

>>> zlib.decompress(zlib_data)
b'test'

test for deflate:

>>> zlib.decompress(deflate_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(deflate_data, -zlib.MAX_WBITS)
b'test'

test for gzip:

>>> zlib.decompress(gzip_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(gzip_data, zlib.MAX_WBITS | 16)
b'test'

the data is also compatible with the gzip module:

>>> import gzip
>>> import io
>>> fio = io.BytesIO(gzip_data)
>>> f = gzip.GzipFile(fileobj=fio)
>>> f.read()
b'test'
>>> f.close()

automatic header detection (zlib or gzip)

adding 32 to windowBits will trigger header detection:

>>> zlib.decompress(gzip_data, zlib.MAX_WBITS | 32)
b'test'
>>> zlib.decompress(zlib_data, zlib.MAX_WBITS | 32)
b'test'
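Header detection does not cover raw deflate, because raw deflate has no header to detect. You may genuinely not know which of the three formats you will get. In that case, one option is to try detection first and fall back to raw deflate. This is a sketch, not part of the original answer, and decompress_any is a hypothetical name.

```python
import zlib


def decompress_any(data: bytes) -> bytes:
    """Try zlib/gzip header detection first, then fall back to raw deflate."""
    try:
        # MAX_WBITS | 32: accept either a zlib or a gzip header
        return zlib.decompress(data, zlib.MAX_WBITS | 32)
    except zlib.error:
        # -MAX_WBITS: expect no header at all (RFC 1951 raw deflate)
        return zlib.decompress(data, -zlib.MAX_WBITS)
```

Raw deflate carries no header and no checksum, so the fallback validates much less than the first attempt. Prefer an explicit wbits whenever the format is known.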

using gzip instead

Or you can ignore zlib and use the gzip module directly; but remember that under the hood, gzip uses zlib.

import gzip

with gzip.open('abc.gz', 'rb') as fh:
    data = fh.read()
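One advantage of the gzip module is a more explicit error when the input is not gzip at all. This is a minimal sketch that assumes Python 3.8 or later, where gzip.BadGzipFile was added. load_gzip is a hypothetical helper name.

```python
import gzip


def load_gzip(data):
    """Decompress in-memory gzip data, with a clear error for non-gzip input."""
    try:
        return gzip.decompress(data)
    except gzip.BadGzipFile as exc:
        # Raised when the gzip magic bytes 1f 8b are missing, for example
        # for zlib or raw deflate data that was merely saved with a .gz name
        raise ValueError(f"input is not gzip data: {exc}") from exc
```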
悲凉≈ 2024-09-13 22:21:21

Update: dnozay's answer explains the problem and should be the accepted answer.


Try the gzip module; the code below is taken straight from the Python docs:

import gzip
f = gzip.open('/home/joe/file.txt.gz', 'rb')
file_content = f.read()
f.close()
安静被遗忘 2024-09-13 22:21:21

I just solved the "incorrect header check" problem when uncompressing gzipped data.

You need to set -WindowBits => WANT_GZIP in your call to inflateInit2 (use the "2" version of the function).

Yes, this can be very frustrating. A typical shallow reading of the documentation presents zlib as an API for gzip compression. But by default (when not using the gz* functions), it neither creates nor uncompresses the gzip format. You have to pass this not-very-prominently documented flag.

毁我热情 2024-09-13 22:21:21

This does not answer the original question, but it may help someone else that ends up here.

The zlib.error: Error -3 while decompressing: incorrect header check also occurs in the example below:

import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))
encoded_bytes_representation = str(b64_encoded_bytes)  # this is the cause
zlib.decompress(base64.b64decode(encoded_bytes_representation))

The example is a minimal reproduction of something I encountered in some legacy Django code, where Base64 encoded bytes (from an HTTP POST) were being stored in a Django CharField (instead of a BinaryField).

When reading a CharField value from the database, str() is called on the value, without an explicit encoding, as can be seen in the Django source.

The str() documentation says:

If neither encoding nor errors is given, str(object) returns object.__str__(), which is the “informal” or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).

So, in the example, we are inadvertently base64-decoding

"b'eJxLTEpOSQUABcgB8A=='"

instead of

b'eJxLTEpOSQUABcgB8A=='.

The zlib decompression in the example would succeed if an explicit encoding were used, e.g. str(b64_encoded_bytes, 'utf-8').
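The same failure and fix can be reproduced outside Django. This is a minimal sketch with made-up variable names. It uses bytes.decode('ascii'), which is equivalent here to str(payload, 'ascii') because base64 output is pure ASCII. base64.b64decode silently drops the quote characters from the repr string, but it keeps the leading b. That extra character corrupts the zlib header.

```python
import base64
import zlib

payload = base64.b64encode(zlib.compress(b"abcde"))  # bytes, e.g. b'eJx...'

# Buggy: str() without an encoding returns the repr, including b'...'
as_repr = str(payload)

# Fixed: decode explicitly (base64 output is plain ASCII)
as_text = payload.decode("ascii")

try:
    zlib.decompress(base64.b64decode(as_repr))
except zlib.error as exc:
    print("repr string fails:", exc)

print("decoded string works:", zlib.decompress(base64.b64decode(as_text)))
```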

NOTE specific to Django:

What's especially tricky: this issue only arises when retrieving a value from the database. See for example the test below, which passes (in Django 3.0.3):

from django.test import TestCase


class MyModelTests(TestCase):
    def test_bytes(self):
        my_model = MyModel.objects.create(data=b'abcde')
        self.assertIsInstance(my_model.data, bytes)  # issue does not arise
        my_model.refresh_from_db()
        self.assertIsInstance(my_model.data, str)  # issue does arise

where MyModel is:

from django.db import models


class MyModel(models.Model):
    data = models.CharField(max_length=100)
不离久伴 2024-09-13 22:21:21

To decompress incomplete gzipped bytes that are in memory, the answer by dnozay is useful but it misses the zlib.decompressobj call which I found to be necessary:

incomplete_decompressed_content = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(incomplete_gzipped_content)

Note that zlib.MAX_WBITS | 16 is 15 | 16 which is 31. For some background about wbits, see zlib.decompress.


Credit: the answer by Yann Vernier, which notes the zlib.decompressobj call.
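To see why the decompressobj call matters for incomplete data, compare it with one-shot zlib.decompress on a stream that stops early. In this sketch the truncation is simulated by cutting a gzip.compress() result in half. The sample data is made up.

```python
import gzip
import zlib

original = b"".join(b"line %d\n" % i for i in range(5000))
compressed = gzip.compress(original)
truncated = compressed[: len(compressed) // 2]  # simulate an incomplete download

# One-shot decompression needs the complete stream, including the trailer
try:
    zlib.decompress(truncated, zlib.MAX_WBITS | 16)
except zlib.error as exc:
    print("zlib.decompress:", exc)

# A decompressobj returns whatever it could decode so far
d = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
partial = d.decompress(truncated)
print(len(partial), "bytes recovered; end of stream reached:", d.eof)
```

zlib.decompress raises zlib.error because the stream ends before its trailer. The decompressobj instead returns a prefix of the original data, and its eof attribute stays False.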

⒈起吃苦の倖褔 2024-09-13 22:21:21

Funnily enough, I had that error when trying to work with the Stack Overflow API using Python.

I managed to get it working with the GzipFile object from the gzip module, roughly like this:

import gzip

with open('abc.gz', 'rb') as raw, gzip.GzipFile(fileobj=raw) as gzip_file:
    file_contents = gzip_file.read()
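The same GzipFile approach also works when the gzip data arrives in memory, for example as an HTTP response body: wrap the bytes in io.BytesIO. This sketch assumes the body is UTF-8 encoded JSON. decode_gzipped_json is a hypothetical helper.

```python
import gzip
import io
import json


def decode_gzipped_json(body):
    """Parse a gzip-compressed, UTF-8 encoded JSON body held in memory."""
    with gzip.GzipFile(fileobj=io.BytesIO(body)) as gz:
        return json.loads(gz.read().decode("utf-8"))
```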
情释 2024-09-13 22:21:21

My case was decompressing email messages stored in a Bullhorn database. The snippet is the following:

import pyodbc
import zlib

cn = pyodbc.connect('connection string')
cursor = cn.cursor()
cursor.execute('SELECT TOP(1) userMessageID, commentsCompressed FROM BULLHORN1.BH_UserMessage WHERE DATALENGTH(commentsCompressed) > 0')

for msg in cursor.fetchall():
    # The magic is in the second parameter: a negative wbits value selects raw deflate format (no header)
    decompressedMessageBody = zlib.decompress(bytes(msg.commentsCompressed), -zlib.MAX_WBITS)
疑心病 2024-09-13 22:21:21

If you use Node.js, try the fflate package; it worked for me for gzip.

const fflate = require('fflate');

// buffer: the gzip-compressed data (a Buffer or Uint8Array)
async function gunzipToString(buffer) {
  const decompressedData = await new Promise((resolve, reject) => {
    fflate.gunzip(buffer, (error, result) => {
      if (error) {
        reject(error);
      } else {
        resolve(result);
      }
    });
  });
  const xml = Buffer.from(decompressedData).toString('utf-8');
  return xml;
}

多情癖 2024-09-13 22:21:21

Just add the header 'Accept-Encoding': 'identity' to the request:

import requests

requests.get('http://gett.bike/', headers={'Accept-Encoding': 'identity'})

https://github.com/requests/requests/issues/3849
