Python zipfile module doesn't seem to be compressing my files
I made a little helper function:
    import zipfile

    def main(archive_list=[],zfilename='default.zip'):
        print zfilename
        zout = zipfile.ZipFile(zfilename, "w")
        for fname in archive_list:
            print "writing: ", fname
            zout.write(fname)
        zout.close()

    if __name__ == '__main__':
        main()
The problem is that none of my files are being compressed! The files are the same size and, effectively, only the extension is being changed to ".zip" (from ".xls" in this case).
I'm running Python 2.5 on WinXP SP2.
Comments (3)
This is because ZipFile requires you to specify the compression method. If you don't specify it, it assumes the compression method to be zipfile.ZIP_STORED, which only stores the files without compressing them. You need to specify the method to be zipfile.ZIP_DEFLATED. You will need to have the zlib module installed for this (it is usually installed by default).
Update: As per the documentation (Python 3.7), a value for the 'compression' argument should be specified to override the default, which is ZIP_STORED. The available options are ZIP_DEFLATED, ZIP_BZIP2 or ZIP_LZMA, and the corresponding libraries zlib, bz2 or lzma should be available.
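As a minimal sketch of that fix, the helper from the question could be rewritten along these lines. It is written for Python 3 rather than the asker's Python 2.5, and the function name zip_files, the arcname choice and the sample file are illustrative assumptions, not part of the original code.

```python
import os
import tempfile
import zipfile


def zip_files(paths, zip_path="default.zip"):
    """Write each path into a DEFLATE-compressed archive; return the archive size in bytes."""
    # compression=ZIP_DEFLATED overrides the default ZIP_STORED (store only, no compression).
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zout:
        for path in paths:
            # Storing only the base name inside the archive is an illustrative choice.
            zout.write(path, arcname=os.path.basename(path))
    return os.path.getsize(zip_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sample file with very repetitive content.
        sample = os.path.join(tmp, "sample.txt")
        with open(sample, "w") as fh:
            fh.write("highly repetitive text\n" * 10000)
        archive = os.path.join(tmp, "sample.zip")
        print(os.path.getsize(sample), "->", zip_files([sample], archive))
```

With repetitive input like this the archive should come out much smaller than the source file. Formats that are already compressed internally may shrink far less.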
Hope this is going to be useful to someone.
I tested all zip modes and benchmarked them on two data sets: a small one (~30 MB) and a large one (~1.5 GB). Both consisted of various types of files, so they would be as close to a real-life scenario as possible. I ran two test methods on each data set: the "proportional" one and the "complete" one. Both tests were repeated 3 times, one after another, to get an average. The results may differ depending on your machine, but I think they are still a good place to start.
I tested two methods because I'm trying to make my own specialized backup solution.
The proportional method creates more zip files, but it lets me transfer smaller packages of data if necessary, e.g. replacing only the things that changed. It's more complicated than that, but that is not important right now.
The complete method simply compresses the whole folder in one go.
Compression ratio calculation:
Basically the higher that number the better.
Each zip archive was initialized like this:
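A minimal sketch of what such an initialization could look like, assuming Python 3.7+ and the standard zipfile.ZipFile constructor. The helper name open_archive and its defaults are hypothetical and are not the answerer's actual snippet.

```python
import zipfile


def open_archive(zip_path, mode=zipfile.ZIP_DEFLATED, level=None):
    """Open a new archive for writing with an explicit compression mode and level."""
    # compresslevel needs Python 3.7+; it has no effect for ZIP_STORED and ZIP_LZMA.
    # Accepted values are 0-9 for ZIP_DEFLATED and 1-9 for ZIP_BZIP2.
    return zipfile.ZipFile(zip_path, "w", compression=mode, compresslevel=level)
```

A call such as open_archive("backup.zip", zipfile.ZIP_DEFLATED, 9) returns a ZipFile that can be used as a context manager.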
Here are the results:
It seems that, whichever method is used, the best compression mode is ZIP_DEFLATED.
Only ZIP_LZMA produced a smaller archive, but only by a fraction of a percent, and it took about 8x longer on the large data set.
Furthermore, I tried different compression levels with the same data sets and methods, except this time there was only one run per level.
It looks like ZIP_DEFLATED and ZIP_BZIP2 have similar compression capabilities, but the second one is much slower. For large data sets, a compression level of 1 or 2 should suffice; increasing it further has no significant effect on the final file size. If the workload demands a lot of "small" zip files, it is better to use level 9: it gives a high compression ratio but takes about the same amount of time as level 1.
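To try a comparison like this on your own data, something along the following lines could serve as a starting point. It is only a sketch: the ratio is assumed here to be original size divided by archive size (so higher is better), the function name benchmark and the sample file are made up for illustration, and it does not reproduce the proportional or complete methods described above.

```python
import os
import tempfile
import time
import zipfile


def benchmark(src_path, mode, level=None):
    """Compress one file and return (compression ratio, seconds elapsed)."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = os.path.join(tmp, "out.zip")
        start = time.perf_counter()
        with zipfile.ZipFile(zip_path, "w", compression=mode, compresslevel=level) as zf:
            zf.write(src_path, arcname=os.path.basename(src_path))
        elapsed = time.perf_counter() - start
        # Assumed definition: original size / archive size, so higher means a smaller archive.
        ratio = os.path.getsize(src_path) / os.path.getsize(zip_path)
    return ratio, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sample data; real files will behave differently.
        sample = os.path.join(tmp, "sample.txt")
        with open(sample, "w") as fh:
            fh.write("some fairly repetitive sample data\n" * 50000)
        runs = [
            ("ZIP_STORED", zipfile.ZIP_STORED, None),
            ("ZIP_DEFLATED level 1", zipfile.ZIP_DEFLATED, 1),
            ("ZIP_DEFLATED level 9", zipfile.ZIP_DEFLATED, 9),
            ("ZIP_BZIP2 level 9", zipfile.ZIP_BZIP2, 9),
            ("ZIP_LZMA", zipfile.ZIP_LZMA, None),
        ]
        for name, mode, level in runs:
            ratio, seconds = benchmark(sample, mode, level)
            print(f"{name}: ratio {ratio:.2f}, {seconds:.3f} s")
```

A single run on one synthetic file is noisy, so repeating each measurement and averaging, as described above, gives more trustworthy numbers.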
There is a really easy way to create an archive in zip format: use the shutil.make_archive function from the shutil library. For example:
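A minimal sketch of such a call, assuming Python 3. The wrapper name zip_folder, the folder layout and the file names are hypothetical and exist only for this demonstration.

```python
import os
import shutil
import tempfile


def zip_folder(src_dir, base_name):
    """Zip the contents of src_dir into base_name + '.zip' and return the archive path."""
    # base_name excludes the extension; make_archive appends ".zip" itself.
    return shutil.make_archive(base_name, "zip", root_dir=src_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical folder with a single file to archive.
        src_dir = os.path.join(tmp, "data")
        os.makedirs(src_dir)
        with open(os.path.join(src_dir, "report.txt"), "w") as fh:
            fh.write("hello\n" * 1000)
        print(zip_folder(src_dir, os.path.join(tmp, "backup")))
```

When zlib is available, make_archive writes the zip with ZIP_DEFLATED, so the files are compressed without choosing a method explicitly.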
More extensive documentation is available here.