Unzipping directory structure with python

Posted 2024-07-15 12:16:46, 315 characters, 5 views, 0 comments

I have a zip file which contains the following directory structure:

dir1\dir2\dir3a
dir1\dir2\dir3b

I'm trying to unzip it and maintain the directory structure, but I get the error:

IOError: [Errno 2] No such file or directory: 'C:\\projects\\testFolder\\subdir\\unzip.exe'

where testFolder is dir1 above and subdir is dir2.

Is there a quick way of unzipping the file and maintaining the directory structure?

Comments (9)

你另情深 2024-07-22 12:16:46

The extract and extractall methods are great if you're on Python 2.6. I have to use Python 2.5 for now, so I just need to create the directories if they don't exist. You can get a listing of directories with the namelist() method. Directory entries always end with a forward slash (even on Windows), e.g.:

import os, zipfile

z = zipfile.ZipFile('myfile.zip')
for f in z.namelist():
    # Directory entries end with '/'; skip the ones that already exist.
    if f.endswith('/') and not os.path.isdir(f):
        os.makedirs(f)

You probably don't want to do it exactly like that (i.e., you'd probably want to extract the contents of the zip file as you iterate over the namelist), but you get the idea.

乱世争霸 2024-07-22 12:16:46

Don't trust extract() or extractall().

These methods blindly extract files to the paths given in their filenames. But ZIP filenames can be anything at all, including dangerous strings like “x/../../../etc/passwd”. Extract such files and you could have just compromised your entire server.

Maybe this should be considered a reportable security hole in Python's zipfile module, but any number of zip-dearchivers have exhibited the exact same behaviour in the past. To unarchive a ZIP file with folder structure safely you need in-depth checking of each file path.
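
The per-path checking described above could look roughly like the sketch below. It is a minimal illustration, not the zipfile module's own behaviour: the function name safe_extract and the choice to raise ValueError are assumptions made here. Every entry is validated before anything is written, so a bad archive leaves the destination untouched. As far as I know, Python releases from 2.7.4 onward also strip ".." components and leading slashes inside extract() itself, so on a current interpreter this is a second line of defence rather than the only one.

```python
import os
import shutil
import zipfile


def safe_extract(zip_path, dest_dir):
    """Extract zip_path into dest_dir, refusing entries that resolve outside it.

    Hypothetical helper for illustration; not part of the zipfile module.
    """
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # First pass: resolve every target path and validate it.
        targets = []
        for name in archive.namelist():
            target = os.path.realpath(os.path.join(dest_root, name))
            inside = target == dest_root or target.startswith(dest_root + os.sep)
            if not inside:
                raise ValueError("unsafe path in archive: %r" % name)
            targets.append((name, target))

        # Second pass: create directories and write files.
        for name, target in targets:
            if name.endswith('/'):
                if not os.path.isdir(target):
                    os.makedirs(target)
                continue
            parent = os.path.dirname(target)
            if not os.path.isdir(parent):
                os.makedirs(parent)
            with archive.open(name) as src, open(target, 'wb') as dst:
                shutil.copyfileobj(src, dst)
```

An entry named "x/../../evil.txt" resolves to a path above dest_dir and is rejected, as is an absolute name such as "/etc/passwd". The sketch does not handle symlinks that are already present inside the destination beyond what os.path.realpath resolves.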

十年不长 2024-07-22 12:16:46

I tried this out, and can reproduce it. The extractall method, as suggested by other answers, does not solve the problem. This seems like a bug in the zipfile module to me (perhaps Windows-only?), unless I'm misunderstanding how zipfiles are structured.

testa\
testa\testb\
testa\testb\test.log
> test.zip

>>> from zipfile import ZipFile
>>> zipTest = ZipFile("C:\\...\\test.zip")
>>> zipTest.extractall("C:\\...\\")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\zipfile.py", line 940, in extractall
  File "...\zipfile.py", line 928, in extract
  File "...\zipfile.py", line 965, in _extract_member
IOError: [Errno 2] No such file or directory: 'C:\\...\\testa\\testb\\test.log'

If I do a printdir(), I get this (first column):

>>> zipTest.printdir()
File Name
testa/testb/
testa/testb/test.log

If I try to extract just the first entry, like this:

>>> zipTest.extract("testa/testb/")
'C:\\...\\testa\\testb'

On disk, this results in the creation of a folder testa, with a file testb inside. This is apparently the reason why the subsequent attempt to extract test.log fails; testa\testb is a file, not a folder.

Edit #1: If you extract just the file, then it works:

>>> zipTest.extract("testa/testb/test.log")
'C:\\...\\testa\\testb\\test.log'

Edit #2: Jeff's code is the way to go; iterate through namelist; if it's a directory, create the directory. Otherwise, extract the file.

我还不会笑 2024-07-22 12:16:46

I know it may be a little late to say this but Jeff is right.
It's as simple as:

import os
from zipfile import ZipFile

def extractAll(zipName):
    z = ZipFile(zipName)
    for f in z.namelist():
        if f.endswith('/'):
            # Directory entry: create it unless it is already there.
            if not os.path.isdir(f):
                os.makedirs(f)
        else:
            z.extract(f)
    z.close()

if __name__ == '__main__':
    zipList = ['one.zip', 'two.zip', 'three.zip']
    for zipName in zipList:
        extractAll(zipName)

梦太阳 2024-07-22 12:16:46

There's a very easy way if you're using Python 2.6: the extractall method.

However, since the zipfile module is implemented completely in Python without any C extensions, you can probably copy it out of a 2.6 installation and use it with an older version of Python; you may find this easier than having to reimplement the functionality yourself. However, the function itself is quite short:

def extractall(self, path=None, members=None, pwd=None):
    """Extract all members from the archive to the current working
       directory. `path' specifies a different directory to extract to.
       `members' is optional and must be a subset of the list returned
       by namelist().
    """
    if members is None:
        members = self.namelist()

    for zipinfo in members:
        self.extract(zipinfo, path, pwd)

山有枢 2024-07-22 12:16:46

It sounds like you are trying to run unzip to extract the zip.

It would be better to use the python zipfile module, and therefore do the extraction in python.

import zipfile

def extract(zipfilepath, extractiondir):
    # Named 'archive' rather than 'zip' to avoid shadowing the builtin zip().
    archive = zipfile.ZipFile(zipfilepath)
    archive.extractall(path=extractiondir)
    archive.close()

耶耶耶 2024-07-22 12:16:46

Filter namelist to exclude the folders

All you have to do is filter out the namelist() entries ending with / and the problem is resolved:

  z.extractall(dest, filter(lambda f: not f.endswith('/'), z.namelist()))

nJoy!
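
For context, the one-liner above assumes that z is an open ZipFile and dest is the target folder. A self-contained version might look like the sketch below; the function name extract_files_only is an assumption for illustration, and a list comprehension stands in for filter/lambda.

```python
import zipfile


def extract_files_only(zip_path, dest):
    """Extract only the file entries of zip_path into dest.

    Hypothetical helper: directory entries (names ending in '/') are
    skipped, and extractall creates the parent folders of each file.
    """
    with zipfile.ZipFile(zip_path) as z:
        members = [name for name in z.namelist() if not name.endswith('/')]
        z.extractall(dest, members)
    return members
```

One consequence of this approach: a directory that is empty in the archive has no file entry under it, so it will not be created on disk.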

心意如水 2024-07-22 12:16:46

If, like me, you have to extract a complete zip archive with an older Python release (in my case, 2.4), here's what I came up with (based on Jeff's answer):

import zipfile
import os

def unzip(source_file_path, destination_dir):
    z = zipfile.ZipFile(source_file_path, 'r')
    for name in z.namelist():
        outfile_path = os.path.join(destination_dir, name)
        if name.endswith('/'):
            # Directory entry: create it unless it already exists.
            if not os.path.isdir(outfile_path):
                os.makedirs(outfile_path)
        else:
            # Archives without directory entries still need the parent folder.
            parent = os.path.dirname(outfile_path)
            if parent and not os.path.isdir(parent):
                os.makedirs(parent)
            outfile = open(outfile_path, 'wb')
            outfile.write(z.read(name))
            outfile.close()
    z.close()

乖乖兔^ω^ 2024-07-22 12:16:46

Note that zip files can have entries for directories as well as files. When creating archives with the zip command, pass the -D option to disable adding directory entries explicitly to the archive. When Python 2.6's ZipFile.extractall method runs across a directory entry, it seems to create a file in its place. Since archive entries aren't necessarily in order, this causes ZipFile.extractall to fail quite often, as it tries to create a file in a subdirectory of a file. If you've got an archive that you want to use with the Python module, simply extract it and re-zip it with the -D option. Here's a little snippet I've been using for a while to do exactly that:

P=`pwd` && 
Z=`mktemp -d -t zip` && 
pushd $Z && 
unzip $P/<busted>.zip && 
zip -r -D $P/<new>.zip . && 
popd && 
rm -rf $Z

Replace <busted>.zip and <new>.zip with real filenames relative to the current directory. Then just copy the whole thing and paste it into a command shell, and it will create a new archive that's ready to rock with Python 2.6. There is a zip command that will remove these directory entries without unzipping but IIRC it behaved oddly in different shell environments or zip configurations.
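
If a shell with zip and unzip is not available, the same clean-up can be sketched in Python by copying every file entry into a new archive and skipping the directory entries. This is only an illustration under stated assumptions: the function name strip_directory_entries is made up here, the code targets a current Python 3 rather than 2.6, and only the timestamp, compression type and external attributes of each entry are carried over.

```python
import zipfile


def strip_directory_entries(src_path, dst_path):
    """Write a copy of src_path to dst_path without directory entries.

    Hypothetical helper for illustration; entries whose names end in '/'
    are dropped, everything else is copied with its data intact.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, 'w') as dst:
        for info in src.infolist():
            if info.filename.endswith('/'):
                continue  # directory entry: leave it out of the new archive
            # Build a fresh ZipInfo rather than reusing the source object.
            new_info = zipfile.ZipInfo(info.filename, date_time=info.date_time)
            new_info.compress_type = info.compress_type
            new_info.external_attr = info.external_attr
            dst.writestr(new_info, src.read(info.filename))
```

Each entry is read fully into memory, which is fine for small archives but may not suit very large members.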
