How to create a fully compressed tar file using Python?

Posted on 2024-08-17 10:27:02

How can I create a .tar.gz file with compression in Python?

Comments (11)

梦断已成空 2024-08-24 10:27:02

To build a .tar.gz (aka .tgz) for an entire directory tree:

import tarfile
import os.path

def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))

This will create a gzipped tar archive containing a single top-level folder with the same name and contents as source_dir.
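As a quick check of the function above, here is a minimal sketch that builds a throwaway directory tree, archives it, and lists the stored member names. The temporary directory and the names project, docs, and readme.txt are made up for illustration.

```python
import os
import tarfile
import tempfile


def make_tarfile(output_filename, source_dir):
    # Same function as in the answer: one top-level folder named after source_dir.
    with tarfile.open(output_filename, "w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))


def demo_make_tarfile():
    # Build a small hypothetical tree: project/docs/readme.txt
    with tempfile.TemporaryDirectory() as tmp:
        source_dir = os.path.join(tmp, "project")
        os.makedirs(os.path.join(source_dir, "docs"))
        with open(os.path.join(source_dir, "docs", "readme.txt"), "w") as f:
            f.write("hello\n")

        archive = os.path.join(tmp, "project.tar.gz")
        make_tarfile(archive, source_dir)

        # Read the archive back and return the stored member names.
        with tarfile.open(archive, "r:gz") as tar:
            return sorted(tar.getnames())


if __name__ == "__main__":
    print(demo_make_tarfile())
```

One caveat worth knowing: if source_dir ends with a path separator, os.path.basename returns an empty string, so it may be worth normalizing the path first, for example with os.path.normpath.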

依 靠 2024-08-24 10:27:02

import tarfile

# The with block closes the archive even if add() raises.
with tarfile.open("sample.tar.gz", "w:gz") as tar:
    for name in ["file1", "file2", "file3"]:
        tar.add(name)

If you want to create a tar.bz2 compressed file, just replace the file extension with ".tar.bz2" and "w:gz" with "w:bz2".
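The mode swap described above could look roughly like the following sketch. The helper name make_archive_with_mode and the temporary sample files are assumptions for illustration. Reading the result back with "r:bz2" confirms that the archive really is bzip2-compressed.

```python
import os
import tarfile
import tempfile


def make_archive_with_mode(output_filename, paths, mode):
    # mode is a tarfile write mode such as "w:gz", "w:bz2", or "w:xz".
    with tarfile.open(output_filename, mode) as tar:
        for path in paths:
            # Store each file under its base name only.
            tar.add(path, arcname=os.path.basename(path))


def demo_bz2():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical input files standing in for file1, file2, file3.
        paths = []
        for name in ["file1", "file2", "file3"]:
            path = os.path.join(tmp, name)
            with open(path, "w") as f:
                f.write(name + "\n")
            paths.append(path)

        archive = os.path.join(tmp, "sample.tar.bz2")
        make_archive_with_mode(archive, paths, "w:bz2")

        # "r:bz2" only succeeds if the file really is bzip2-compressed.
        with tarfile.open(archive, "r:bz2") as tar:
            return tar.getnames()


if __name__ == "__main__":
    print(demo_bz2())
```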

不寐倦长更 2024-08-24 10:27:02

You call tarfile.open with mode='w:gz', meaning "Open for gzip compressed writing."

You'll probably want to end the filename (the name argument to open) with .tar.gz, but that doesn't affect compression abilities.

BTW, you usually get better compression with a mode of 'w:bz2', just as the tar utility usually compresses better with bzip2 than with gzip.
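To see both points in practice, the sketch below writes the same data with "w:gz" and "w:bz2" to files with a deliberately misleading .bin extension, then reports the sizes. The sample text and file names are invented for illustration, and the size ratio depends on the data, so treat the numbers as indicative only.

```python
import os
import tarfile
import tempfile


def compare_modes():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical, highly repetitive sample data.
        sample = os.path.join(tmp, "sample.txt")
        with open(sample, "w") as f:
            f.write("the quick brown fox jumps over the lazy dog\n" * 5000)

        sizes = {}
        for mode in ("w:gz", "w:bz2"):
            # Misleading extension on purpose: the mode decides the format.
            archive = os.path.join(tmp, "archive_" + mode[2:] + ".bin")
            with tarfile.open(archive, mode) as tar:
                tar.add(sample, arcname="sample.txt")
            sizes[mode] = os.path.getsize(archive)

        # gzip streams start with the magic bytes 0x1f 0x8b.
        with open(os.path.join(tmp, "archive_gz.bin"), "rb") as f:
            is_gzip = f.read(2) == b"\x1f\x8b"
        return sizes, is_gzip


if __name__ == "__main__":
    sizes, is_gzip = compare_modes()
    print(sizes, "gzip magic found:", is_gzip)
```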

汐鸠 2024-08-24 10:27:02

Previous answers advise using the tarfile Python module to create a .tar.gz file in Python. That is a good, Pythonic solution, but it has a serious drawback in archiving speed. This question mentions that tarfile is approximately two times slower than the tar utility on Linux. In my experience, this estimate is about right.

So for faster archiving, you can run the tar command through the subprocess module:

import subprocess
subprocess.call(['tar', '-czf', output_filename, file_to_archive])
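A slightly fuller sketch of the same idea follows. It assumes a tar executable is available on PATH. The function name tar_gz_with_cli and the sample data directory are hypothetical. The -C option makes tar change into the parent directory first, so the archive stores a relative folder name rather than the full path.

```python
import os
import shutil
import subprocess
import tarfile
import tempfile


def tar_gz_with_cli(output_filename, source_dir):
    # Assumes a `tar` executable is on PATH.
    source_dir = os.path.abspath(source_dir)
    parent, name = os.path.split(source_dir)
    # -C switches to the parent directory first, so the archive stores
    # "name/..." instead of the full path to source_dir.
    subprocess.run(
        ["tar", "-czf", os.path.abspath(output_filename), "-C", parent, name],
        check=True,  # raise CalledProcessError on a non-zero exit status
    )


def demo_cli():
    if shutil.which("tar") is None:
        return None  # tar is not installed; nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        source_dir = os.path.join(tmp, "data")
        os.makedirs(source_dir)
        with open(os.path.join(source_dir, "a.txt"), "w") as f:
            f.write("a\n")
        archive = os.path.join(tmp, "data.tar.gz")
        tar_gz_with_cli(archive, source_dir)
        # Verify the result with the tarfile module.
        with tarfile.open(archive, "r:gz") as tar:
            return sorted(n.rstrip("/") for n in tar.getnames())


if __name__ == "__main__":
    print(demo_cli())
```

Passing the command as a list, without shell=True, means the file names are handed to tar as-is and are never parsed by a shell.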
摘星┃星的人 2024-08-24 10:27:02

shutil.make_archive is very convenient for both files and directories (contents recursively added to the archive):

import shutil

compressed_file = shutil.make_archive(
        base_name='archive',   # archive file name w/o extension
        format='gztar',        # available formats: zip, gztar, bztar, xztar, tar
        root_dir='path/to/dir' # directory to compress
)
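The difference between root_dir alone and root_dir combined with base_dir is easy to miss, so here is a small sketch. The site folder and index.html file are invented for illustration. With root_dir only, the archive holds the contents of the directory. Adding base_dir keeps a top-level folder.

```python
import os
import shutil
import tarfile
import tempfile


def demo_make_archive():
    with tempfile.TemporaryDirectory() as tmp:
        source_dir = os.path.join(tmp, "site")
        os.makedirs(source_dir)
        with open(os.path.join(source_dir, "index.html"), "w") as f:
            f.write("<p>hi</p>\n")

        # root_dir only: members are stored relative to source_dir.
        flat = shutil.make_archive(
            base_name=os.path.join(tmp, "flat"),
            format="gztar",
            root_dir=source_dir,
        )
        # root_dir + base_dir: members are stored under a "site" folder.
        nested = shutil.make_archive(
            base_name=os.path.join(tmp, "nested"),
            format="gztar",
            root_dir=tmp,
            base_dir="site",
        )

        with tarfile.open(flat, "r:gz") as tar:
            flat_names = tar.getnames()
        with tarfile.open(nested, "r:gz") as tar:
            nested_names = tar.getnames()
        # make_archive returns the path of the archive it created.
        return os.path.basename(flat), flat_names, nested_names


if __name__ == "__main__":
    print(demo_make_archive())
```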
情栀口红 2024-08-24 10:27:02

In addition to @Aleksandr Tukallo's answer, you can also obtain the output and the error message (if one occurs). Compressing a folder using tar is explained pretty well in the following answer.

import subprocess
import traceback

try:
    # c = create, z = gzip, f = write to the named file.
    # The original 'czfj' also passed j (bzip2); z and j select different
    # compressors, so only z is kept for a .tar.gz archive.
    cmd = ['tar', 'czf', output_filename, file_to_archive]
    output = subprocess.check_output(cmd).decode("utf-8").strip()
    print(output)
except Exception:
    print(f"E: {traceback.format_exc()}")
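Note that check_output captures only standard output by default. If the error text itself is wanted, one option is subprocess.run with capture_output=True, as in this sketch. It assumes a tar executable on PATH. The helper names run_tar and demo_missing_input are hypothetical.

```python
import os
import shutil
import subprocess
import tempfile


def run_tar(args):
    # Assumes a `tar` executable is on PATH; args is a list of tar arguments.
    result = subprocess.run(
        ["tar", *args],
        capture_output=True,  # collect stdout and stderr instead of inheriting them
        text=True,            # decode both streams to str
    )
    return result.returncode, result.stdout.strip(), result.stderr.strip()


def demo_missing_input():
    if shutil.which("tar") is None:
        return None  # tar is not installed; nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "out.tar.gz")
        missing = os.path.join(tmp, "does_not_exist")
        # Archiving a path that does not exist makes tar exit with an error.
        return run_tar(["-czf", archive, missing])


if __name__ == "__main__":
    print(demo_missing_input())
```

The return code and the stderr text can then be logged or inspected separately, instead of being folded into a traceback.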
哎呦我呸! 2024-08-24 10:27:02

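This compresses files located in the current working directory into a tar.gz archive. The solution uses os.path.basename(file_directory), so each file is added by its bare file name:

import os
import tarfile

with tarfile.open("save.tar.gz", "w:gz") as tar:
    for file in ["a.txt", "b.log", "c.png"]:
        # Add each file by its base name, relative to the current directory.
        tar.add(os.path.basename(file))

Use this to compress files in a directory into a tar.gz archive.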

西瓜 2024-08-24 10:27:02

A minor correction to @THAVASI.T's answer, which omits the import of the 'tarfile' library and does not define the 'tar' object used in the third line.

import os
import tarfile

with tarfile.open("save.tar.gz", "w:gz") as tar:
    for file in ["a.txt", "b.log", "c.png"]:
        tar.add(os.path.basename(file))
清秋悲枫 2024-08-24 10:27:02

I use this to generate a tar.gz file that does not contain the top-level folder.

import tarfile
import os.path

source_location = r'C:\Users\username\Desktop\New folder'
output_name = r'C:\Users\username\Desktop\new.tar.gz'

# ---------------------------------------------------
#  --- output new.tar.gz with 'New folder' inside ---
#  -> new.tar.gz/New folder/aaaa/a.txt 
#  -> new.tar.gz/New folder/bbbb/b.txt
# ---------------------------------------------------
# def make_tarfile(output_filename, source_dir):
#     with tarfile.open(output_filename, "w:gz") as tar:
#         tar.add(source_dir, arcname=os.path.basename(source_dir))


# ---------------------------------------------------
#  --- output new.tar.gz without 'New folder' inside ---
#  -> new.tar.gz/aaaa/a.txt 
#  -> new.tar.gz/bbbb/b.txt
# ---------------------------------------------------
def make_tarfile(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        for root, _, files in os.walk(source_dir):
            for file in files:
                file_path = os.path.join(root, file)
                arcname = os.path.relpath(file_path, source_dir)
                tar.add(file_path, arcname=arcname)

try:
    make_tarfile(output_name, source_location)

except Exception as e:
    print(f"Error: {e}")
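One side effect of walking only the files is that empty subdirectories are not stored. A possible variant, sketched below under that observation, adds each top-level entry under its own name and lets tar.add recurse. The function name make_tarfile_flat and the sample tree (aaaa, bbbb, empty) are assumptions for illustration.

```python
import os
import tarfile
import tempfile


def make_tarfile_flat(output_filename, source_dir):
    with tarfile.open(output_filename, "w:gz") as tar:
        # Add each top-level entry under its own name; tar.add() recurses
        # into directories, so empty subdirectories are kept as well.
        for entry in sorted(os.listdir(source_dir)):
            tar.add(os.path.join(source_dir, entry), arcname=entry)


def demo_flat():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical tree: aaaa/a.txt, bbbb/b.txt, and an empty folder.
        source_dir = os.path.join(tmp, "New folder")
        for sub, name in [("aaaa", "a.txt"), ("bbbb", "b.txt")]:
            os.makedirs(os.path.join(source_dir, sub))
            with open(os.path.join(source_dir, sub, name), "w") as f:
                f.write(name + "\n")
        os.makedirs(os.path.join(source_dir, "empty"))

        # Keep the archive outside source_dir so it does not include itself.
        archive = os.path.join(tmp, "new.tar.gz")
        make_tarfile_flat(archive, source_dir)
        with tarfile.open(archive, "r:gz") as tar:
            return sorted(tar.getnames())


if __name__ == "__main__":
    print(demo_flat())
```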
不美如何 2024-08-24 10:27:02

Just restating @George V. Reilly's excellent answer, but in a clearer form...

import tarfile


fd_path="/some/folder/path/"
fl_name="some_file_name.ext"
targz_fd_path_n_fl_name="/some/folder/path/some_file_name.tar.gz"

with tarfile.open(targz_fd_path_n_fl_name, "w:gz") as tar:
    tar.add(fd_path + fl_name, fl_name)

As @Brōtsyorfuzthrāx pointed out (but in another way), if you leave out the second argument of the "add" method, the tar file will store the entire path structure of fd_path + fl_name.

Of course you can use...

import tarfile
import os

fd_path_n_fl_name="/some/folder/path/some_file_name.ext"
targz_fd_path_n_fl_name="/some/folder/path/some_file_name.tar.gz"

with tarfile.open(targz_fd_path_n_fl_name, "w:gz") as tar:
    tar.add(fd_path_n_fl_name, os.path.basename(fd_path_n_fl_name))

... if you don't want to use or don't have the folder path and file name separated.
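The effect of the second argument (arcname) can be checked directly. This sketch writes a one-member archive twice, once without and once with arcname, and returns the stored names. The helper names and the temporary file are hypothetical stand-ins for the paths used above.

```python
import os
import tarfile
import tempfile


def stored_names(file_path, arcname=None):
    # Write a one-member archive and report the name stored for that member.
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "out.tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            tar.add(file_path, arcname=arcname)
        with tarfile.open(archive, "r:gz") as tar:
            return tar.getnames()


def demo_arcname():
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file standing in for /some/folder/path/some_file_name.ext
        file_path = os.path.join(tmp, "some_file_name.ext")
        with open(file_path, "w") as f:
            f.write("data\n")
        full = stored_names(file_path)
        short = stored_names(file_path, os.path.basename(file_path))
        return full, short


if __name__ == "__main__":
    full, short = demo_arcname()
    print("without arcname:", full)
    print("with arcname:   ", short)
```

Without arcname, the member keeps the directory structure of the path that was passed in (minus the leading slash). With arcname, only the given name is stored.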

Thanks!

愛上了 2024-08-24 10:27:02

Best performance, and without the . and .. entries in the compressed file! See the vulnerability warning below:

NOTICE (thanks MaxTruxa):

This answer is vulnerable to shell injection. Please read the security considerations in the docs. Never pass unescaped strings to subprocess.run, subprocess.call, etc. if shell=True. Use shlex.quote to escape them (Unix shells only).

I'm using it locally, so it's good enough for my needs.

import subprocess
subprocess.call(f'tar -cvzf {output_filename} *', cwd=source_dir, shell=True)

The cwd argument changes the directory before compressing, which solves the issue with the dots.

shell=True allows wildcard usage (*).

It also works recursively for a directory.
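For anyone who wants the same layout without the shell, one possible approach is to list the directory in Python and pass the names as an argument list. The sketch below assumes a tar executable on PATH. The function names are hypothetical. It also shows shlex.quote for the case where a shell command string is still needed.

```python
import os
import shlex
import subprocess


def build_tar_command(output_filename, source_dir):
    # List the directory in Python instead of relying on the shell's "*".
    # Unlike "*", os.listdir() also returns hidden (dot) files.
    entries = sorted(os.listdir(source_dir))
    # An argument list with no shell: names are passed to tar verbatim,
    # so spaces or ";" in a name are not interpreted as shell syntax.
    return ["tar", "-czf", os.path.abspath(output_filename), *entries]


def tar_directory_contents(output_filename, source_dir):
    # Assumes a `tar` executable is on PATH.
    subprocess.run(
        build_tar_command(output_filename, source_dir),
        cwd=source_dir,  # run inside source_dir so entry names stay relative
        check=True,
    )


def quoted_shell_command(output_filename):
    # If shell=True is unavoidable, quote every interpolated value.
    return f"tar -czf {shlex.quote(output_filename)} *"


if __name__ == "__main__":
    print(quoted_shell_command("my file; rm -rf x.tar.gz"))
```

This sketch does not handle entry names that begin with a dash, which tar could read as options, so it is a starting point rather than a hardened implementation.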
