Delete a file from a zipfile with the ZipFile module
The only way I came up with for deleting a file from a zipfile was to create a temporary zipfile without the file to be deleted and then rename it to the original filename.
In Python 2.4 the ZipInfo class had an attribute file_offset, so it was possible to create a second zip file and copy the data to the other file without decompressing/recompressing.
This file_offset is missing in Python 2.6, so is there another option than creating another zipfile by uncompressing every file and then recompressing it again?
Is there maybe a direct way of deleting a file in the zipfile? I searched and didn't find anything.
Comments (5)
The following snippet worked for me (deletes all *.exe files from a Zip archive):
If you read everything into memory, you can eliminate the need for a second file. However, this snippet recompresses everything.
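The original snippet is not reproduced here. The following is only a minimal sketch of the in-memory variant described above, not that poster's code. It assumes that glob matching on the full member name is acceptable. The helper name delete_matching_in_memory is hypothetical. Every kept member is decompressed and recompressed. Passing the original ZipInfo to writestr preserves each member's name, timestamp and compression type.

```python
import fnmatch
import io
import zipfile


def delete_matching_in_memory(path, pattern="*.exe"):
    """Hypothetical helper: drop every member whose name matches a glob pattern.

    The new archive is built in a BytesIO buffer, so no second file is
    needed, but all remaining members are recompressed.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(path, "r") as src, zipfile.ZipFile(buffer, "w") as dst:
        for info in src.infolist():
            # Case-sensitive glob match on the full name, e.g. "sub/tool.exe".
            if fnmatch.fnmatchcase(info.filename, pattern):
                continue  # skipping the member is what "deletes" it
            # read(info) runs before writestr, which mutates info's offsets.
            dst.writestr(info, src.read(info))
    # Overwrite the original only after the new archive is complete.
    with open(path, "wb") as f:
        f.write(buffer.getvalue())
```

Note that the original file is rewritten in place at the end. If that final write is interrupted, the archive is lost. The temporary-file-and-rename approach from the question avoids that risk.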
After closer inspection, ZipInfo.header_offset is the offset from the start of the file. The name is misleading, but the main Zip header (the central directory) is actually stored at the end of the file. My hex editor confirms this. So the problem you'll run into is the following: you need to delete the directory entry in the main header as well, or it will point to a file that doesn't exist anymore. Leaving the main header intact might work if you also keep the local header of the file you're deleting, but I'm not sure about that. How did you do it with the old module?
Without modifying the main header, I get the error "missing X bytes in zipfile" when I open the archive. This might help you find out how to modify the main header.
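The layout described above can be checked without a hex editor. The minimal sketch below uses a hypothetical helper, describe_layout. It lists each member's header_offset and reads the central directory offset from the end-of-central-directory record. That record starts with PK\x05\x06, and bytes 16 to 20 of it hold the central directory offset. Actually deleting a member would mean removing that member's local header and data, removing its central directory entry, and fixing the offsets of every later entry and of the end record. This sketch only inspects the layout and does not attempt any of that. Locating the end record with rfind is a simplification that assumes the archive comment does not contain the signature.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"        # start of each member's local header
CENTRAL_DIR_SIG = b"PK\x01\x02"         # start of each central directory entry
END_OF_CENTRAL_DIR_SIG = b"PK\x05\x06"  # end-of-central-directory record


def describe_layout(data):
    """Hypothetical helper: return [(name, header_offset), ...] and the
    offset of the central directory for an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        entries = [(info.filename, info.header_offset) for info in zf.infolist()]
    # Simplification: assume the signature does not occur in the comment.
    eocd = data.rfind(END_OF_CENTRAL_DIR_SIG)
    # Little-endian uint32 at bytes 16..20 of the record: central directory offset.
    (cd_offset,) = struct.unpack("<I", data[eocd + 16 : eocd + 20])
    return entries, cd_offset
```

With a freshly written archive, each header_offset points at a PK\x03\x04 signature, and the central directory begins after the last member's data.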
Not very elegant but this is how I did it:
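The code for this answer is not shown. The following is a minimal sketch of the temporary-file-and-rename approach described in the question, and it is not necessarily what this poster did. The helper name delete_member is hypothetical. The temporary file is created in the same directory so that os.replace can rename it over the original. The original's permission bits are copied onto it first.

```python
import os
import shutil
import tempfile
import zipfile


def delete_member(path, name):
    """Hypothetical helper: remove one member by rewriting the archive
    into a temporary file and renaming it over the original."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=directory)
    os.close(fd)  # ZipFile reopens the path itself
    try:
        with zipfile.ZipFile(path) as src, zipfile.ZipFile(tmp_path, "w") as dst:
            src.getinfo(name)  # raises KeyError if the member does not exist
            dst.comment = src.comment  # keep the archive comment
            for info in src.infolist():
                if info.filename != name:
                    # Kept members are decompressed and recompressed.
                    dst.writestr(info, src.read(info))
        shutil.copymode(path, tmp_path)  # mkstemp creates the file as 0600
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # never leave a stray temp file behind
        raise
```

Like the other answers, this still recompresses every remaining member. The rename step only ensures that the original is never left half-written.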
Based on Elias Zamaria's comment on the question.
Having read through Python issue #51067, I want to give an update on it.
As of today, a solution already exists, but it has not been accepted into Python because the author has not signed the Contributor Agreement.
Nevertheless, you can take the code from https://github.com/python/cpython/blob/659eb048cc9cac73c46349eb29845bc5cd630f09/Lib/zipfile.py and save it as a separate file. After that, import it in your project instead of the built-in Python library:
import myproject.zipfile as zipfile
Usage:
I believe it will be included in future Python versions. For me it works like a charm for the given use case.
The routine delete_from_zip_file from ruamel.std.zipfile¹ allows you to delete a file based on its full path within the ZIP, or based on (re) patterns. E.g. you can delete all of the .exe files from test.zip using a re pattern (please note the dot before the *).
This works similarly to mdm's solution (including the need for recompression), but recreates the ZIP file in memory (using the class InMemZipFile()), overwriting the old file after it is fully read.
¹ Disclaimer: I am the author of that package.
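The ruamel.std.zipfile call itself is not shown here, and the sketch below does not use or imitate that package's API. It is a standard-library illustration of the idea: delete by regex pattern, read the whole archive into memory, and only then overwrite the file. The helper name delete_by_regex is hypothetical, and so is the choice to match the full member name with re.fullmatch. In a regex, .* means "any characters", so the dot before the * matters. Escape the dot before exe as well. The unescaped pattern .*.exe would also match a name such as notexe.

```python
import re
import zipfile


def delete_by_regex(path, pattern):
    """Hypothetical helper: delete members whose full name matches a regex.

    Returns the list of deleted names.
    """
    regex = re.compile(pattern)
    kept, deleted = [], []
    # 1. Read everything that should survive into memory.
    with zipfile.ZipFile(path) as src:
        comment = src.comment
        for info in src.infolist():
            if regex.fullmatch(info.filename):
                deleted.append(info.filename)
            else:
                kept.append((info, src.read(info)))
    # 2. Only now overwrite the original (all kept members are recompressed).
    with zipfile.ZipFile(path, "w") as dst:
        dst.comment = comment
        for info, data in kept:
            dst.writestr(info, data)
    return deleted
```

For example, delete_by_regex("test.zip", r".*\.exe") removes setup.exe and bin/tool.exe but leaves a member named notexe alone.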
TL;DR:
Explanation:
I added multiple files with the same name by mistake, and all of them were nearly 0 bytes. The method suggested by @mdm won't work here. This is because if you pass the filename (a str) to the read method, it gives you the last item, or at least it seems that way. However, after reading the library docs in the CPython code, this part becomes apparent:
By passing zip_info (a ZipInfo object), you can be sure that you will retrieve that exact file.
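The TL;DR code is missing here. The following minimal sketch shows the point made above: iterate over infolist() and pass each ZipInfo to read(), so that every duplicate entry is reached. The helper name dedupe_keep_largest is hypothetical. So is its policy of keeping the largest entry for each name, chosen because the accidental duplicates were nearly empty. The sketch works on bytes in memory.

```python
import io
import zipfile


def dedupe_keep_largest(data):
    """Hypothetical helper: return new archive bytes that keep only the
    largest entry (by uncompressed size) for each duplicated name."""
    with zipfile.ZipFile(io.BytesIO(data)) as src:
        best = {}
        for info in src.infolist():
            current = best.get(info.filename)
            if current is None or info.file_size > current.file_size:
                best[info.filename] = info
        out = io.BytesIO()
        with zipfile.ZipFile(out, "w") as dst:
            for info in src.infolist():
                if best[info.filename] is info:
                    # read(info) fetches this exact entry; read(name) would
                    # return whichever entry the name maps to (the last one).
                    dst.writestr(info, src.read(info))
    return out.getvalue()
```

Note that ZipFile emits a "Duplicate name" UserWarning when such an archive is created. Reading it back by name returns only the last entry with that name.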