Python unzip: very slow?

Posted on 2024-10-17 04:20:16

Can somebody please explain the following mystery?

I created a binary file of size ~37 MB. Zipping it in Ubuntu (using the terminal) took less than 1 s. I then tried Python: zipping it programmatically (using the zipfile module) also took about 1 s.

I then tried to unzip the zip file I created. In Ubuntu (using the terminal) this took less than 1 s.

In Python, the code to unzip it (using the zipfile module) took close to 37 s to run! Any ideas why?
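The question doesn't show its code, so here is a minimal sketch for reproducing the measurement on Python 3. The file name data.bin and the random content are assumptions made for illustration; random bytes are a worst case for DEFLATE because they barely compress.

```python
import os
import tempfile
import time
import zipfile


def time_zip_roundtrip(size_bytes: int) -> dict:
    """Write a random binary file, zip it, unzip it, and time both steps.

    The file name "data.bin" is a hypothetical placeholder.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data.bin")
        archive = os.path.join(tmp, "data.zip")
        out_dir = os.path.join(tmp, "out")

        # Random bytes barely compress, so DEFLATE does real work on them
        with open(src, "wb") as f:
            f.write(os.urandom(size_bytes))

        t0 = time.perf_counter()
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(src, arcname="data.bin")
        t1 = time.perf_counter()
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)
        t2 = time.perf_counter()

        # Sanity check: the extracted file must match the original byte for byte
        with open(src, "rb") as a, open(os.path.join(out_dir, "data.bin"), "rb") as b:
            assert a.read() == b.read()

        return {"zip_seconds": t1 - t0, "unzip_seconds": t2 - t1}
```

Calling time_zip_roundtrip(37 * 1024 * 1024) roughly mirrors the ~37 MB file from the question; comparing the two timings with the terminal tools shows whether the slowdown is in zipfile itself or in the surrounding code.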


Comments (4)

深空失忆 2024-10-24 04:20:16

I was struggling to unzip/decompress/extract zip files with Python as well, and the low-level approach of "create a ZipFile object, loop through its .namelist(), read the files and write them to the file system" didn't seem very Pythonic. So I started digging into ZipFile objects, which I believe are not very well documented, and listed all of their methods:

>>> from zipfile import ZipFile
>>> filepath = '/srv/pydocfiles/packages/ebook.zip'
>>> zip = ZipFile(filepath)
>>> dir(zip)
['NameToInfo', '_GetContents', '_RealGetContents', '__del__', '__doc__', '__enter__', '__exit__', '__init__', '__module__', '_allowZip64', '_didModify', '_extract_member', '_filePassed', '_writecheck', 'close', 'comment', 'compression', 'debug', 'extract', 'extractall', 'filelist', 'filename', 'fp', 'getinfo', 'infolist', 'mode', 'namelist', 'open', 'printdir', 'pwd', 'read', 'setpassword', 'start_dir', 'testzip', 'write', 'writestr'] 

There we go: the "extractall" method works just like tarfile's extractall! (It is available on Python 2.6 and 2.7, but NOT on 2.5.)
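To make the contrast concrete, here is a sketch on Python 3 of the "low-level" loop described above next to the single extractall() call. The function names are hypothetical. One real difference: extractall() sanitizes absolute paths and ".." components in member names, while this naive loop does not.

```python
import os
import zipfile


def extract_manually(archive: str, dest: str) -> None:
    """Low-level approach: loop over members, read each one, write it out.

    NOTE: unlike extractall(), this does not sanitize absolute or ".."
    member names, so only use it on trusted archives.
    """
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            target = os.path.join(dest, name)
            if name.endswith("/"):  # explicit directory entry
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as out:
                out.write(zf.read(name))  # whole member decompressed in memory


def extract_all(archive: str, dest: str) -> None:
    """The same job with one extractall() call."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(path=dest)
```

Both produce the same files for a well-formed archive; extractall() is shorter and also streams each member to disk instead of reading it fully into memory first.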

Then the performance concerns: the file ebook.zip is 84.6 MB (mostly PDF files) and the uncompressed folder is 103 MB; it was zipped with the default "Archive Utility" under Mac OS X 10.5. So I timed the extraction with Python's timeit module:

>>> from timeit import Timer
>>> t = Timer("filepath = '/srv/pydocfiles/packages/ebook.zip'; \
...         extract_to = '/tmp/pydocnet/build'; \
...         from zipfile import ZipFile; \
...         ZipFile(filepath).extractall(path=extract_to)")
>>> 
>>> t.timeit(1)
1.8670060634613037

which took less than 2 seconds on a heavily loaded machine with 90% of its memory in use by other applications.
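On Python 3, timeit also accepts a callable, which avoids building the statement as a string with backslash continuations. A sketch under assumptions: make_archive is a hypothetical helper that builds a small test archive, since ebook.zip isn't available here.

```python
import os
import tempfile
import timeit
import zipfile


def make_archive(path: str, n_files: int = 20, size: int = 100_000) -> None:
    """Hypothetical helper: build a test archive of compressible text files."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i in range(n_files):
            zf.writestr(f"file{i}.txt", ("line %d\n" % i) * (size // 8))


def time_extractall(archive: str, dest: str, repeat: int = 3) -> float:
    """Best-of-`repeat` wall time, in seconds, for one extractall() call."""
    def run() -> None:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(path=dest)
    # timeit.repeat takes a callable, so no code string is needed
    return min(timeit.repeat(run, number=1, repeat=repeat))


with tempfile.TemporaryDirectory() as tmp:
    arc = os.path.join(tmp, "test.zip")
    make_archive(arc)
    print(f"{time_extractall(arc, os.path.join(tmp, 'out')):.3f} s")
```

Taking the minimum of several runs reduces noise from other load on the machine, which matters when, as above, most of the memory is in use by other applications.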

Hope this helps someone.

女中豪杰 2024-10-24 04:20:16

I don't know what code you use to unzip your file, but the following works for me. After creating a zip archive "test.zip" containing just one file, "file1", the following Python script extracts "file1" from the archive:

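from zipfile import ZipFile

# Read mode: the compression method is detected per member, so no
# compression/allowZip64 arguments are needed (they only affect writing)
with ZipFile("test.zip", mode="r") as zf:
    data = zf.read("file1")  # decompress "file1" fully into memory
print(len(data))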

This takes nearly no time: I tried a 37MB input file which compressed down to a 15MB zip archive. In this example the Python script took 0.346 seconds on my MacBook Pro. Maybe in your case the 37 seconds were taken up by something you did with the data instead?
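If the data doesn't need to sit in memory all at once, a member can be streamed with ZipFile.open() instead of ZipFile.read(). A sketch, assuming a destination file object opened in binary mode; copy_member is a hypothetical name, and reading in large chunks keeps the number of Python-level read calls small.

```python
import io
import zipfile


def copy_member(zf: zipfile.ZipFile, name: str, dst, chunk_size: int = 1 << 20) -> int:
    """Stream one member into a writable binary file object, chunk by chunk.

    Unlike zf.read(name), this never holds the whole decompressed member in
    memory. Returns the number of bytes written.
    """
    written = 0
    with zf.open(name) as src:  # file-like object that decompresses on the fly
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    return written


# Self-contained usage with an in-memory archive
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("file1", b"x" * 5_000_000)

with zipfile.ZipFile(buf) as zf, io.BytesIO() as out:
    print(copy_member(zf, "file1", out))  # 5000000
```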

禾厶谷欠 2024-10-24 04:20:16

Some options:

  • Use subprocess to defer it to some external tool. You can pipe data directly to it.
  • czipfile, but that does not seem to be maintained anymore (last release 2010). A somewhat recent fork is ziyuang/czipfile (last update 2019).
  • PyTorch has the internal native torch._C.PyTorchFileReader, which can read zip files; see the torch.load logic and _open_zipfile_reader. It does not currently support arbitrary zip files, but I think it would only need minor adaptations to support them.
  • libzip.py (2023) is a ctypes wrapper around libzip, but it seems to be little known.
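The first option can be sketched with subprocess.run. This assumes the Info-ZIP unzip tool is installed and on PATH (it is an assumption, not guaranteed on every system); the helper names are hypothetical. The -p flag writes a member to stdout, which is how data can be piped back into Python.

```python
import subprocess


def unzip_to_dir_cmd(archive: str, dest: str) -> list:
    # Info-ZIP unzip: -o overwrite without prompting, -q quiet, -d target dir
    return ["unzip", "-o", "-q", archive, "-d", dest]


def unzip_member_to_stdout_cmd(archive: str, member: str) -> list:
    # -p extracts the member to stdout with no messages, so it can be piped
    return ["unzip", "-p", archive, member]


def extract_with_unzip(archive: str, dest: str) -> None:
    """Delegate extraction of the whole archive to the external tool."""
    subprocess.run(unzip_to_dir_cmd(archive, dest), check=True)


def read_member_with_unzip(archive: str, member: str) -> bytes:
    """Read one member's bytes through a pipe from the external tool."""
    result = subprocess.run(
        unzip_member_to_stdout_cmd(archive, member),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Passing the command as a list (rather than a shell string) avoids quoting problems with file names that contain spaces, and check=True turns a non-zero exit status into a CalledProcessError.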
太傻旳人生 2024-10-24 04:20:16

Instead of using the Python module, we can call an archiver available on Ubuntu (here 7z) from Python. I use this because the Python zip module sometimes fails.

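import subprocess

filename = "test"  # file or directory to archive (must be a string)
# "7z a" adds files to an archive; the .zip extension selects the zip format.
# Passing a list instead of a shell string avoids quoting/injection issues.
subprocess.run(["7z", "a", filename + ".zip", filename], check=True)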