Python unzip: very slow?
Can somebody please explain the following mystery?
I created a binary file of about 37 MB. Zipping it in Ubuntu from the terminal took less than 1 sec. I then tried Python: zipping it programmatically (using the zipfile module) also took about 1 sec.
I then tried to unzip the zip file I created. In Ubuntu, using the terminal, this took less than 1 sec.
In Python, the code to unzip (using the zipfile module) took close to 37 sec to run! Any ideas why?
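A minimal way to reproduce this measurement in Python 3. It assumes the file contents are random bytes (the question does not say what the 37 MB file contains) and uses ZIP_DEFLATED compression explicitly, since ZipFile's default is ZIP_STORED (no compression). The function name and file names are hypothetical.

```python
from __future__ import annotations

import os
import tempfile
import time
import zipfile


def time_roundtrip(size_bytes: int = 37 * 1024 * 1024) -> tuple[float, float, bool]:
    """Zip a file of random bytes, extract it again, and time both steps.

    Returns (zip_seconds, unzip_seconds, contents_match).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(size_bytes))  # assumption: random test data
        archive = os.path.join(tmp, "data.zip")

        start = time.perf_counter()
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(src, arcname="data.bin")
        zip_seconds = time.perf_counter() - start

        out_dir = os.path.join(tmp, "out")
        start = time.perf_counter()
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)
        unzip_seconds = time.perf_counter() - start

        # Verify the round trip byte for byte.
        with open(src, "rb") as a, open(os.path.join(out_dir, "data.bin"), "rb") as b:
            contents_match = a.read() == b.read()
    return zip_seconds, unzip_seconds, contents_match
```

Calling time_roundtrip() with the default size gives a baseline for how long the zipfile module itself needs on a given machine, independent of any code that processes the extracted data.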
Comments (4)
I was struggling to unzip/decompress/extract zip files with Python as well, and the "create a ZipFile object, loop through its .namelist(), read the files and write them to the file system" low-level approach didn't seem very Pythonic. So I started to dig into zipfile objects, which I believe are not very well documented, and covered all the object methods:
There we go: the "extractall" method works just like tarfile's extractall! (on Python 2.6 and 2.7 but NOT 2.5)
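A minimal Python 3 sketch of the extractall approach, which replaces the namelist/read/write loop with a single call. The function name and its parameters are hypothetical.

```python
import zipfile


def extract_all(zip_path, dest_dir):
    """Extract every member of the archive into dest_dir; return the member names."""
    # The with-statement closes the archive even if extraction fails.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)  # creates sub-directories as needed
        return zf.namelist()
```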
Then the performance concerns: the file ebook.zip is 84.6 MB (mostly PDF files) and the uncompressed folder is 103 MB, zipped by the default "Archive Utility" under Mac OS X 10.5. So I did the same with Python's timeit module:
which took less than 2 seconds on a heavily loaded machine with 90% of its memory in use by other applications.
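A minimal sketch of such a timing with the timeit module, assuming Python 3; the function name is hypothetical, and results will of course differ from the figures above on other machines.

```python
import timeit
import zipfile


def time_extractall(zip_path, dest_dir, repeat=3):
    """Return the fastest of `repeat` wall-clock timings (seconds) of extractall."""
    def run():
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(dest_dir)  # later runs overwrite the same files

    # number=1: each timing is a single full extraction.
    return min(timeit.repeat(run, number=1, repeat=repeat))
```

For example, time_extractall("ebook.zip", "ebook"). Taking the minimum of several runs reduces noise from other processes; the first run may also include disk-cache warm-up.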
Hope this helps someone.
I don't know what code you use to unzip your file, but the following works for me: After creating a zip archive "test.zip" containing just one file "file1", the following Python script extracts "file1" from the archive:
This takes nearly no time: I tried a 37MB input file which compressed down to a 15MB zip archive. In this example the Python script took 0.346 seconds on my MacBook Pro. Maybe in your case the 37 seconds were taken up by something you did with the data instead?
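A minimal sketch of extracting a single member, using the names test.zip and file1 from this answer (the function name is hypothetical). It streams the member to disk with shutil.copyfileobj in 1 MiB chunks. If the 37 seconds did come from the surrounding code, one plausible culprit (an assumption, since the question's code isn't shown) is reading in very small pieces or repeatedly concatenating bytes objects, which this avoids.

```python
import shutil
import zipfile


def extract_member(zip_path, member, out_path, chunk_size=1024 * 1024):
    """Copy one archive member to out_path, streaming it in 1 MiB chunks."""
    with zipfile.ZipFile(zip_path) as zf, zf.open(member) as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Usage: extract_member("test.zip", "file1", "file1").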
Some options:
- Use subprocess to defer it to some external tool. You can pipe data directly to it.
- torch._C.PyTorchFileReader, which can read zip files; see the torch.load logic and _open_zipfile_reader. This does not currently support arbitrary zip files, but I think it would only need minor adaptations to support it.
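A minimal sketch of the subprocess option, assuming the Info-ZIP unzip command is installed and on PATH (the function name is hypothetical). unzip -p writes only the member's data to stdout, which Python reads back through a pipe instead of a temporary file.

```python
import subprocess


def read_member_with_unzip(zip_path, member):
    """Return the bytes of one archive member, produced by the external unzip tool.

    Assumes the Info-ZIP `unzip` command is installed and on PATH.
    """
    result = subprocess.run(
        ["unzip", "-p", zip_path, member],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Note that unzip treats the member argument as a wildcard pattern, so names containing characters such as * or ? may need escaping.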
Instead of using the Python module, we can call the zip tools that Ubuntu provides from within Python. I use this because the Python zip module sometimes fails.
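A minimal sketch of calling Ubuntu's unzip from Python, assuming Info-ZIP unzip is installed (it is not always present by default). The fallback to zipfile when unzip is missing is an addition of this sketch, not part of the answer, and the function name is hypothetical. Passing the arguments as a list avoids shell quoting problems with unusual file names.

```python
import os
import shutil
import subprocess
import zipfile


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir, preferring the system unzip tool.

    Falls back to zipfile.ZipFile.extractall when unzip is not on PATH.
    Returns which method was used: "unzip" or "zipfile".
    """
    os.makedirs(dest_dir, exist_ok=True)
    if shutil.which("unzip"):
        # -o: overwrite without prompting, -q: quiet, -d: destination directory
        subprocess.run(["unzip", "-o", "-q", zip_path, "-d", dest_dir], check=True)
        return "unzip"
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)
    return "zipfile"
```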