How can I simulate ZipFile.open in Python 2.5?

Posted on 2024-09-24 12:48:09

I want to extract a file from a zip to a specific path, ignoring the file path in the archive. This is very easy in Python 2.6 (my docstring is longer than the code):

import shutil
import zipfile

def extract_from_zip(name, dest_path, zip_file):
    """Similar to zipfile.ZipFile.extract but extracts the file given by name
    from the zip_file (instance of zipfile.ZipFile) to the given dest_path
    *ignoring* the filename path given in the archive completely
    instead of preserving it as extract does.
    """
    dest_file = open(dest_path, 'wb')
    try:
        archived_file = zip_file.open(name)
        # Copy in chunks so the whole member never sits in memory
        shutil.copyfileobj(archived_file, dest_file)
    finally:
        dest_file.close()


extract_from_zip('path/to/file.dat', 'output.txt', zipfile.ZipFile('test.zip', 'r'))

But in Python 2.5, the ZipFile.open method is not available. I couldn't find a solution on Stack Overflow, but this forum post had a nice solution that uses ZipInfo.file_offset to seek to the right point in the zip and zlib.decompressobj to unpack the bytes from there. Unfortunately, ZipInfo.file_offset was removed in Python 2.5!

So, given that all we have in Python 2.5 is ZipInfo.header_offset, I figured I'd just have to parse and skip over the header structure to get to the file offset myself. Using Wikipedia as a reference (I know), I came up with this much longer and not very elegant solution:

import struct
import zipfile
import zlib

def extract_from_zip(name, dest_path, zip_file):
    """Python 2.5 version :("""
    dest_file = open(dest_path, 'wb')
    info = zip_file.getinfo(name)
    if info.compress_type == zipfile.ZIP_STORED:
        decoder = None
    elif info.compress_type == zipfile.ZIP_DEFLATED:
        decoder = zlib.decompressobj(-zlib.MAX_WBITS)
    else:
        raise zipfile.BadZipfile("Unrecognized compression method")

    # Seek over the fixed size fields to the "file name length" field in
    # the file header (26 bytes). Unpack this and the "extra field length"
    # field ourselves as info.extra doesn't seem to be the correct length.
    zip_file.fp.seek(info.header_offset + 26)
    file_name_len, extra_len = struct.unpack("<HH", zip_file.fp.read(4))
    zip_file.fp.seek(info.header_offset + 30 + file_name_len + extra_len)

    bytes_to_read = info.compress_size

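    try:
        while True:
            # read(0) returns an empty string, which ends the loop
            buff = zip_file.fp.read(min(bytes_to_read, 102400))
            if not buff:
                break
            bytes_to_read -= len(buff)
            if decoder:
                buff = decoder.decompress(buff)
            dest_file.write(buff)

        if decoder:
            # Feed a pad byte, then drain whatever the decoder still holds
            dest_file.write(decoder.decompress('Z'))
            dest_file.write(decoder.flush())
    finally:
        dest_file.close()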

Note how I unpack and read the field that gives the length of the extra field, because calling len on the ZipInfo.extra attribute gives 4 bytes less, thus causing the offset to be calculated incorrectly. Perhaps I'm missing something here?
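One possible explanation for the mismatch: zipfile fills ZipInfo.extra from the central directory record, and the ZIP format keeps a separate extra field in each local file header, so the two lengths need not agree. Reading the lengths from the local header, as the code above does, sidesteps that. A minimal sketch of the same step as a standalone helper follows; the names local_data_offset, LOCAL_HEADER_SIGNATURE and LOCAL_HEADER_FORMAT are made up for illustration and are not part of zipfile.

```python
import struct

# Hypothetical names for this sketch; none of them exist in the zipfile module.
LOCAL_HEADER_SIGNATURE = 0x04034b50   # the bytes "PK\x03\x04", read little-endian
LOCAL_HEADER_FORMAT = "<L5H3L2H"      # the fixed 30-byte part of a local file header


def local_data_offset(fp, header_offset):
    """Return the offset of a member's data, given its local header offset."""
    fp.seek(header_offset)
    fixed = fp.read(struct.calcsize(LOCAL_HEADER_FORMAT))
    fields = struct.unpack(LOCAL_HEADER_FORMAT, fixed)
    if fields[0] != LOCAL_HEADER_SIGNATURE:
        raise ValueError("no local file header at offset %d" % header_offset)
    # The last two fields are the lengths stored in the local header itself,
    # which may differ from what the central directory reports.
    file_name_len, extra_len = fields[-2], fields[-1]
    return header_offset + len(fixed) + file_name_len + extra_len
```

With this, the seek in the function above would become zip_file.fp.seek(local_data_offset(zip_file.fp, info.header_offset)). Checking the signature is an extra safeguard that the original code does not perform.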

Can anyone improve on this solution for Python 2.5?

Edit: I should have said, the obvious solution as suggested by ChrisAdams

dest_file.write(zip_file.read(name))

will fail with MemoryError for any reasonably sized file contained in the zip because it tries to slurp the whole file into memory in one go. I have large files, so I need to stream out the contents to disk.

Also, upgrading Python is the obvious solution, but one that is entirely out of my hands and essentially impossible.
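The streaming requirement described in the edit can be isolated into a small copy loop that never holds more than one chunk of compressed input at a time. This is only a sketch: copy_member_data and its parameters are hypothetical names, it assumes the source file object is already positioned at the start of the member's data, and it leaves out the 'Z' pad byte used in the code above (whether an older zlib still needs that byte is not something this sketch verifies).

```python
import zlib


def copy_member_data(src, dest, compressed_size, deflated, chunk_size=64 * 1024):
    """Copy compressed_size bytes from src to dest, inflating them if deflated.

    Hypothetical helper: src must already be positioned at the member's data.
    """
    decoder = None
    if deflated:
        # Negative wbits selects a raw deflate stream (no zlib header),
        # which is how deflated ZIP members are stored.
        decoder = zlib.decompressobj(-zlib.MAX_WBITS)
    remaining = compressed_size
    while remaining > 0:
        chunk = src.read(min(remaining, chunk_size))
        if not chunk:
            raise IOError("unexpected end of archive data")
        remaining -= len(chunk)
        if decoder is not None:
            chunk = decoder.decompress(chunk)
        dest.write(chunk)
    if decoder is not None:
        # Drain anything the decoder is still buffering.
        dest.write(decoder.flush())
```

The chunk size bounds the compressed input per iteration, not the decompressed output, so a highly compressible member can still produce a large buffer from a single chunk.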

じ违心 2024-10-01 12:48:09

I haven't tested this bit, but I use something extremely similar in Python 2.4:

import zipfile

def extract_from_zip(name, dest_path, zip_file):
    dest_file = open(dest_path, 'wb')
    dest_file.write(zip_file.read(name))
    dest_file.close()

extract_from_zip('path/to/file/in/archive.dat', 
        'output.txt', 
        zipfile.ZipFile('test.zip', 'r'))

咽泪装欢 2024-10-01 12:48:09

I know I am a bit late to the party for this question, but I was having the exact same problem.

The solution I used was to copy the Python 2.6.6 version of zipfile, put it in a folder (I called it python_fix), and import that instead:

python_fix/zipfile.py

Then in code:

import python_fix.zipfile as zipfile

From there I was able to use the 2.6.6 version of zipfile with the Python 2.5.1 interpreter (the 2.7.x versions of zipfile fail on this interpreter because of the "with" statement).

Hope this helps someone else using ancient technology.
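A hedged sketch of how the vendored copy could be wired in so that the same script still uses the standard library module on newer interpreters. It assumes the python_fix folder from this answer sits next to the script and contains an __init__.py (needed for Python to treat it as a package); checking for the method with hasattr is an illustrative choice, not something the answer prescribes.

```python
import shutil
import zipfile

# Assumption: python_fix/zipfile.py is the copied 2.6.6 module described above,
# and python_fix/__init__.py exists. The fallback import only runs when the
# interpreter's own zipfile lacks ZipFile.open (for example on Python 2.5).
if not hasattr(zipfile.ZipFile, "open"):
    import python_fix.zipfile as zipfile


def extract_from_zip(name, dest_path, zip_file):
    """Stream one archive member to dest_path, ignoring its archived path."""
    archived_file = zip_file.open(name)
    dest_file = open(dest_path, "wb")
    try:
        shutil.copyfileobj(archived_file, dest_file)
    finally:
        dest_file.close()
```

The ZipFile instance passed in has to be created from whichever zipfile module ended up bound to the name, otherwise the open method may still be missing.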
