Extract files from a zip without keeping the top-level folder using Python zipfile

Posted on 2024-12-23 18:04:05

I'm using the following code to extract the files from a zip file while keeping the directory structure:

import zipfile

zip_file = zipfile.ZipFile('archive.zip', 'r')
zip_file.extractall('/dir/to/extract/files/')
zip_file.close()

Here is a structure for an example zip file:

/dir1/file.jpg
/dir1/file1.jpg
/dir1/file2.jpg

In the end, I want this:

/dir/to/extract/file.jpg
/dir/to/extract/file1.jpg
/dir/to/extract/file2.jpg

But the top-level folder should be dropped only if the zip has a single top-level folder that contains all the files. So when I extract a zip with this structure:

/dir1/file.jpg
/dir1/file1.jpg
/dir1/file2.jpg
/dir2/file.txt
/file.mp3

It should stay like this:

/dir/to/extract/dir1/file.jpg
/dir/to/extract/dir1/file1.jpg
/dir/to/extract/dir1/file2.jpg
/dir/to/extract/dir2/file.txt
/dir/to/extract/file.mp3

Any ideas?


Comments (5)

关于从前 2024-12-30 18:04:06

If I understand your question correctly, you want to strip any common prefix directories from the items in the zip before extracting them.

If so, then the following script should do what you want:

import os
import sys
from zipfile import ZipFile

def get_members(archive):
    parts = []
    # get all the path prefixes
    for name in archive.namelist():
        # only check files (not directories)
        if not name.endswith('/'):
            # keep list of path elements (minus filename)
            parts.append(name.split('/')[:-1])
    # now find the common path prefix (if any); commonprefix compares lists element by element
    prefix = os.path.commonprefix(parts)
    # re-join the path elements ('' when there is no common prefix)
    prefix = '/'.join(prefix) + '/' if prefix else ''
    # get the length of the common prefix
    offset = len(prefix)
    # now re-set the filenames
    for zipinfo in archive.infolist():
        name = zipinfo.filename
        # skip the prefix directory entries themselves (and anything not under the prefix)
        if name.startswith(prefix) and len(name) > offset:
            # remove the common prefix
            zipinfo.filename = name[offset:]
            yield zipinfo

args = sys.argv[1:]

if args:
    path = args[1] if len(args) > 1 else '.'
    with ZipFile(args[0]) as archive:
        archive.extractall(path, get_members(archive))
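A short illustration of why the script above splits each path into a list of folder components before calling os.path.commonprefix: that function compares its arguments item by item, so on raw strings it can stop in the middle of a folder name. The two entry names below are made up for the demonstration.

```python
import os

# Hypothetical entry names: two different top-level folders that share leading characters.
names = ["dir1/file.jpg", "dir10/file.txt"]

# On plain strings the comparison is character by character.
char_prefix = os.path.commonprefix(names)

# On lists of folder components the comparison is segment by segment.
component_prefix = os.path.commonprefix([n.split("/")[:-1] for n in names])

print(repr(char_prefix))    # 'dir1'
print(component_prefix)     # []
```

The string version would wrongly report "dir1" as shared; the component version finds no common folder, so nothing would be stripped.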
感情旳空白 2024-12-30 18:04:06

Read the entries returned by ZipFile.namelist() to see if they're in the same directory, and then open/read each entry and write it to a file opened with open().
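A minimal sketch of that manual approach, assuming the rule from the question (drop the top-level folder only when every entry is inside it). The helper name extract_flat_if_single_dir is hypothetical. Unlike extractall(), writing files yourself bypasses zipfile's own path sanitizing, so the sketch does a basic check for absolute paths and ".." components itself; treat it as suitable for trusted archives only.

```python
import os
import shutil
import zipfile


def extract_flat_if_single_dir(zip_path: str, dest: str) -> None:
    """Hypothetical helper: drop the top-level folder only if every entry is inside it."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        # First path component of every entry, e.g. "dir1" for "dir1/file.jpg".
        tops = {name.split("/", 1)[0] for name in names}
        # Strip only when there is exactly one top component and every entry sits under it.
        strip = len(tops) == 1 and all("/" in name for name in names)
        for info in archive.infolist():
            name = info.filename.split("/", 1)[1] if strip else info.filename
            # Basic guard against absolute paths and ".." (extractall() sanitizes these itself).
            if name.startswith("/") or ".." in name.split("/"):
                raise ValueError(f"unsafe entry name: {info.filename!r}")
            target = os.path.join(dest, name)
            if not name or name.endswith("/"):
                # Directory entry (the stripped root becomes ""): just create the folder.
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # Copy the entry's bytes into a regular file opened with open().
            with archive.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
```

With the first example structure from the question, the files land directly in dest; with the second (dir1/, dir2/ and file.mp3 at the root), the structure is kept unchanged.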

兲鉂ぱ嘚淚 2024-12-30 18:04:06

This might be a problem with the zip archive itself. In a Python prompt, try the following to see whether the files are stored in the correct directories inside the zip file:

import zipfile

zf = zipfile.ZipFile("my_file.zip", 'r')
first_file = zf.filelist[0]
print(first_file.filename)

This should print something like "dir1/".
Repeat the steps above with an index of 1, i.e. first_file = zf.filelist[1]. This time the output should look like "dir1/file1.jpg". If that is not the case, the zip file does not contain directories, and everything will be extracted into a single directory.
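To inspect every entry at once instead of one index at a time, a loop over infolist() is enough; ZipInfo.is_dir() reports whether an entry is a directory entry (its stored name ends with "/"). The function name describe_entries is hypothetical.

```python
import zipfile


def describe_entries(zip_path: str) -> list[str]:
    """Return one line per entry: its stored name and whether it is a directory entry."""
    with zipfile.ZipFile(zip_path) as zf:
        return [f"{info.filename}  (dir={info.is_dir()})" for info in zf.infolist()]
```

For example, printing each line of describe_entries("my_file.zip") lists the whole archive. Note that many zip tools do not store explicit directory entries at all; names like "dir1/file1.jpg" still carry the folder, and extractall() creates the missing parent directories from them.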

独留℉清风醉 2024-12-30 18:04:06

Based on @ekhumoro's answer, I came up with a simpler function that extracts everything onto the same level. It's not exactly what you are asking for, but I think it can help someone.

    import os
    from zipfile import ZipFile

    def _basename_members(zip_file: ZipFile):
        for zipinfo in zip_file.infolist():
            # Skip directory entries: their basename is "" and would make extractall() fail.
            if zipinfo.is_dir():
                continue
            # Keep only the file name; files with the same name in different folders overwrite each other.
            zipinfo.filename = os.path.basename(zipinfo.filename)
            yield zipinfo

    from_zip = "some.zip"
    to_folder = "some_destination/"
    with ZipFile(file=from_zip, mode="r") as zip_file:
        os.makedirs(to_folder, exist_ok=True)
        zip_infos = _basename_members(zip_file)
        zip_file.extractall(path=to_folder, members=zip_infos)
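Because this flattening keeps only os.path.basename() of each entry, two files with the same name in different folders map to the same target, and extractall() silently overwrites the earlier one. A small pre-check such as the hypothetical basename_collisions below can flag that before extracting:

```python
import os
import zipfile
from collections import Counter


def basename_collisions(archive: zipfile.ZipFile) -> list[str]:
    """Return file basenames that occur more than once (directory entries are ignored)."""
    counts = Counter(
        os.path.basename(info.filename)
        for info in archive.infolist()
        if not info.is_dir()
    )
    # Sorted so the result is stable regardless of entry order.
    return sorted(name for name, n in counts.items() if n > 1)
```

An empty list means the flat extraction loses nothing.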
绮烟 2024-12-30 18:04:06

Basically you need to do two things:

  1. Identify the root directory in the zip.
  2. Remove the root directory from the paths of other items in the zip.

The following should retain the overall structure of the zip while removing the root directory:

import typing, zipfile

def _is_root(info: zipfile.ZipInfo) -> bool:
    if info.is_dir():
        parts = info.filename.split("/")
        # A top-level directory entry is stored as "dir1/", which splits into ["dir1", ""].
        if len(parts) == 1 or (len(parts) == 2 and parts[1] == ""):
            return True
    return False

def _members_without_root(archive: zipfile.ZipFile, root_filename: str) -> typing.Iterator[zipfile.ZipInfo]:
    for info in archive.infolist():
        # Strip the root only as a leading prefix; splitting on it would also match inside other names.
        if info.filename.startswith(root_filename) and len(info.filename) > len(root_filename):
            info.filename = info.filename[len(root_filename):]
            yield info

with zipfile.ZipFile("archive.zip", mode="r") as archive:
    # We will use the first directory with no more than one path segment as the root.
    # next() needs a default, otherwise a zip without such a directory raises StopIteration.
    root = next((info for info in archive.infolist() if _is_root(info)), None)
    # Only strip the root if every entry lives under it; otherwise keep the structure as-is.
    if root and all(name.startswith(root.filename) for name in archive.namelist()):
        archive.extractall(path="/dir/to/extract/", members=_members_without_root(archive, root.filename))
    else:
        archive.extractall(path="/dir/to/extract/")
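The root detection above only works when the archive stores an explicit directory entry such as "dir1/", and many zip tools omit those. A hedged alternative for step 1 is to derive the root from the entry names alone. single_top_level_dir is a hypothetical helper; its result could be passed as root_filename to _members_without_root in place of root.filename.

```python
from typing import List, Optional


def single_top_level_dir(names: List[str]) -> Optional[str]:
    """Return e.g. "dir1/" if every name lives under one top-level folder, else None."""
    tops = set()
    for name in names:
        head, sep, _ = name.partition("/")
        if not sep:
            # A file at the archive root: there is no single wrapping folder.
            return None
        tops.add(head)
    # Exactly one distinct first component means one wrapping folder.
    if len(tops) != 1:
        return None
    return tops.pop() + "/"
```

Typical use: root_name = single_top_level_dir(archive.namelist()), then strip it only when the result is not None, which matches both example structures from the question.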