Deleting files from a zipfile with the ZipFile module

Posted on 2024-07-13 14:15:20

The only way I have come up with for deleting a file from a zipfile is to create a temporary zipfile without the file to be deleted and then rename it to the original filename.

In Python 2.4 the ZipInfo class had an attribute file_offset, so it was possible to create a second zip file and copy the data into it without decompressing and recompressing.

This file_offset is missing in Python 2.6, so is there another option besides creating a new zipfile by decompressing every file and then recompressing it?

Is there a direct way to delete a file in a zipfile? I searched and didn't find anything.

Comments (5)

陈独秀 2024-07-20 14:15:20

The following snippet worked for me (deletes all *.exe files from a Zip archive):

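import zipfile

# Copy every member except *.exe files into a new archive
zin = zipfile.ZipFile('archive.zip', 'r')
zout = zipfile.ZipFile('archive_new.zip', 'w')
for item in zin.infolist():
    if not item.filename.endswith('.exe'):
        # Passing the ZipInfo keeps the member's name, timestamp and compression type
        zout.writestr(item, zin.read(item.filename))
zout.close()
zin.close()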

If you read everything into memory, you can eliminate the need for a second file. However, this snippet decompresses and recompresses every member it keeps.
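The in-memory variant mentioned above could look roughly like the sketch below. The helper name remove_members and the demo member names are made up for illustration; only io.BytesIO and the documented zipfile calls are used, and every kept member is still recompressed.

```python
import io
import zipfile


def remove_members(zip_bytes, should_remove):
    """Return new archive bytes without the members for which should_remove(name) is true."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zin, \
            zipfile.ZipFile(out, "w") as zout:
        for item in zin.infolist():
            if should_remove(item.filename):
                continue
            # Reading by ZipInfo and writing with it keeps name, timestamp
            # and compression type; the data itself is recompressed.
            zout.writestr(item, zin.read(item))
    return out.getvalue()


# Build a small demo archive entirely in memory (hypothetical contents).
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("readme.txt", "hello")
    z.writestr("tool.exe", "binary")

cleaned = remove_members(demo.getvalue(), lambda name: name.endswith(".exe"))
with zipfile.ZipFile(io.BytesIO(cleaned)) as z:
    remaining = z.namelist()
    readme = z.read("readme.txt")
```

To update a file on disk you would read its bytes, call the helper, and write the result back to the same path once the input has been fully read.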

On closer inspection, ZipInfo.header_offset is the offset of a member's local header from the start of the file. The name may be misleading: the main Zip header (the central directory) is actually stored at the end of the file. My hex editor confirms this.

So the problem you'll run into is the following: you need to delete the directory entry in the main header as well, or it will point to a file that no longer exists. Leaving the main header intact might work if you also keep the local header of the file you're deleting, but I'm not sure about that. How did you do it with the old module?

Without modifying the main header, I get an error "missing X bytes in zipfile" when I open the archive. This might help you find out how to modify the main header.
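To see this layout without a hex editor, here is a small sketch that builds a throwaway archive in memory and compares each header_offset with the well-known Zip signatures. The member names are invented for the demo; the signature bytes come from the Zip format itself.

```python
import io
import zipfile

# Build a tiny uncompressed archive in memory (hypothetical member names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("a.txt", "first")
    z.writestr("b.txt", "second")

raw = buf.getvalue()
with zipfile.ZipFile(io.BytesIO(raw)) as z:
    offsets = {info.filename: info.header_offset for info in z.infolist()}

LOCAL_HEADER_SIG = b"PK\x03\x04"        # starts each member's local header
CENTRAL_DIR_SIG = b"PK\x01\x02"         # starts each central directory entry
END_OF_CENTRAL_DIR_SIG = b"PK\x05\x06"  # end-of-central-directory record

# Every header_offset should point at a local header near the front of the file.
starts_with_local_header = all(
    raw[off:off + 4] == LOCAL_HEADER_SIG for off in offsets.values()
)
# The central directory (the "main header") comes after all member data.
central_dir_start = raw.find(CENTRAL_DIR_SIG)
eocd_start = raw.rfind(END_OF_CENTRAL_DIR_SIG)
```

The central directory records the offset of every local header, which is why cutting a member's bytes out of the file means rewriting the central directory too.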

(り薆情海 2024-07-20 14:15:20

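Not very elegant, but this is how I did it:

import subprocess
import zipfile

zip_filename = 'archive.zip'  # placeholder: path to the archive to modify

# Collect the names of the members to delete
with zipfile.ZipFile(zip_filename) as z:
    files_to_del = [f for f in z.namelist() if f.endswith('exe')]

# Let the external zip command-line tool delete them in place
if files_to_del:
    cmd = ['zip', '-d', zip_filename] + files_to_del
    subprocess.check_call(cmd)

# reload the modified archive
z = zipfile.ZipFile(zip_filename)

冷︶言冷语的世界 2024-07-20 14:15:20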

Based on Elias Zamaria's comment on the question.

Having read through Python-Issue #51067, I want to give an update on it.

As of today, a solution already exists, though it has not been accepted into Python because the author has not signed the Contributor Agreement.

Nevertheless, you can take the code from https://github.com/python/cpython/blob/659eb048cc9cac73c46349eb29845bc5cd630f09/Lib/zipfile.py and create a separate file from it. After that, just reference it from your project instead of the built-in Python library: import myproject.zipfile as zipfile.

Usage:

# remove() comes from the patched copy above, not from the built-in zipfile module
with zipfile.ZipFile("archive.zip", "a") as z:
    z.remove("firstfile.txt")

I believe it will be included in future Python versions. For me it works like a charm for the given use case.

反差帅 2024-07-20 14:15:20

The routine delete_from_zip_file from ruamel.std.zipfile¹ allows you to delete a file based on its full path within the ZIP, or based on (re) patterns. For example, you can delete all of the .exe files from test.zip using

from ruamel.std.zipfile import delete_from_zip_file

delete_from_zip_file('test.zip', pattern='.*.exe')

(please note the dot before the *).

This works similarly to mdm's solution (including the need for recompression), but recreates the ZIP file in memory (using the class InMemZipFile()) and overwrites the old file after it has been fully read.


¹ Disclaimer: I am the author of that package.

執念 2024-07-20 14:15:20

TL;DR:

import zipfile

with zipfile.ZipFile("bad.zip") as bad:
    # Or use "a" instead of "w" if you're appending
    with zipfile.ZipFile("good", "w") as good:
        for zip_info in bad.infolist():
            # I had hundreds of duplicate entries named 'sample_33.csv'
            not_a_bad_file = zip_info.filename != 'sample_33.csv' or zip_info.file_size > 146622
            if not_a_bad_file:
                good.writestr(zip_info, bad.read(zip_info))

Explanation:

I added multiple files with the same name by mistake, and all of them were nearly 0 bytes in size. The method suggested by @mdm won't work here, because if you pass the filename (str) to the read method, it gives you the last entry with that name (at least, it seems that way). The following note from the library documentation in the CPython source makes this clear:

.. note::

      The :meth:`.open`, :meth:`read` and :meth:`extract` methods can take a filename
      or a :class:`ZipInfo` object.  You will appreciate this when trying to read a
      ZIP file that contains members with duplicate names.

By passing zip_info (a ZipInfo object), you can be sure that you will retrieve that exact file.
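A quick way to check this behaviour yourself is the sketch below, which builds an in-memory archive containing two members with the same (made-up) name. zipfile emits a UserWarning about the duplicate name when writing, so the demo silences it.

```python
import io
import warnings
import zipfile

# Build an archive with two members that share one name (hypothetical data).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    with warnings.catch_warnings():
        # zipfile warns about duplicate names but still writes both entries.
        warnings.simplefilter("ignore", UserWarning)
        z.writestr("sample.csv", "first copy")
        z.writestr("sample.csv", "second copy")

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as z:
    # Lookup by name resolves to a single entry...
    by_name = z.read("sample.csv")
    # ...while each ZipInfo object addresses its own entry.
    by_info = [z.read(info) for info in z.infolist()]
```

Here by_name holds only the second copy, while by_info holds both copies in archive order, which is why the filtering loop above reads with the ZipInfo object rather than the filename.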
