Loading a pickle file from a zip file

Posted 2024-09-04 12:15:37

For some reason I cannot get cPickle.load to work on the file-like object returned by ZipFile.open().
If I call read() on that object, however, I can use cPickle.loads.

Example:

import zipfile
import cPickle

# the data we want to store
some_data = {1: 'one', 2: 'two', 3: 'three'}

#
# create a zipped pickle file
#
zf = zipfile.ZipFile('zipped_pickle.zip', 'w', zipfile.ZIP_DEFLATED)
zf.writestr('data.pkl', cPickle.dumps(some_data))
zf.close()

#
# cPickle.loads works
#
zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
sd1 = cPickle.loads(zf.open('data.pkl').read())
zf.close()

#
# cPickle.load doesn't work
#
zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
sd2 = cPickle.load(zf.open('data.pkl'))
zf.close()

Note: I don't want to zip just the pickle file but many files of other types. This is just an example.

萌酱 2024-09-11 12:15:37

It's due to an imperfection in the pseudofile object implemented by the zipfile module (for the .open method of the ZipFile class introduced in Python 2.6). Consider:

>>> f = zf.open('data.pkl')
>>> f.read(1)
'('
>>> f.readline()
'dp1\n'
>>> f.read(1)
''
>>>

The sequence .read(1) followed by .readline() is what cPickle.load does internally on a protocol-0 pickle (protocol 0 is the default in Python 2, so it is what you are using here). Unfortunately, zipfile's imperfection means this particular sequence does not work: it produces a spurious "end of file" (.read returning an empty string) right after the first read/readline pair.
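To see why the loader alternates between single-byte reads and readline calls, it can help to look at what a protocol-0 pickle actually contains. The following is only an illustrative sketch: the variable names are mine, and the import fallback is an assumption added so that the snippet also runs on Python 3, where cPickle no longer exists.

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3 (assumption: lets the sketch run there too)

some_data = {1: 'one', 2: 'two', 3: 'three'}

# Protocol 0 is a text format: each opcode is a single character and
# most arguments are newline-terminated, so a loader reads one byte for
# the opcode and then reads up to the next newline for the argument.
text_pickle = pickle.dumps(some_data, 0)
text_lines = text_pickle.splitlines()

# Protocol 2 is a binary format that starts with a PROTO opcode (0x80)
# followed by the protocol number.
binary_pickle = pickle.dumps(some_data, 2)

print(repr(text_pickle))
print(repr(binary_pickle))
```

The exact bytes differ between Python versions (memo numbering, for instance), so the printed output is best treated as version-specific.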

Not sure offhand if this bug in Python's standard library is fixed in Python 2.7 -- I'm going to check.

Edit: just checked -- the bug is fixed in Python 2.7 rc1 (the release candidate that's currently the latest 2.7 version). I don't yet know whether it's fixed in the latest bug-fix release of 2.6 as well.

Edit again: the bug is still there in Python 2.6.5, the latest bug-fix release of Python 2.6 -- so if you can't upgrade to 2.7 and need better-behaving pseudofile objects from ZipFile.open, a backport of the 2.7 fix seems the only viable solution.
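If neither upgrading nor backporting is an option, one possible workaround is to read the whole member into memory and give the loader an in-memory file object instead, which is essentially the cPickle.loads approach from the question wrapped in a function. This is a minimal sketch, not a fix for zipfile itself: load_pickle_from_zip is a hypothetical helper name, the temporary path and the notes.txt member are made up for the demonstration, and the import fallback is an assumption so the code also runs on Python 3.

```python
import io
import os
import tempfile
import zipfile

try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3 (assumption: lets the sketch run there too)


def load_pickle_from_zip(zip_path, member):
    """Hypothetical helper: unpickle one member of a zip archive."""
    zf = zipfile.ZipFile(zip_path, 'r')
    try:
        # ZipFile.read returns the whole decompressed member as bytes.
        buffered = io.BytesIO(zf.read(member))
    finally:
        zf.close()
    # BytesIO handles mixed read()/readline() calls correctly, so the
    # pickle protocol used to write the member does not matter.
    return pickle.load(buffered)


# Demonstration: an archive holding a protocol-0 pickle next to another file.
some_data = {1: 'one', 2: 'two', 3: 'three'}
zip_path = os.path.join(tempfile.mkdtemp(), 'zipped_pickle.zip')
zf = zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED)
try:
    zf.writestr('data.pkl', pickle.dumps(some_data, 0))
    zf.writestr('notes.txt', 'any other file type')
finally:
    zf.close()

restored = load_pickle_from_zip(zip_path, 'data.pkl')
```

The trade-off is that the entire member is held in memory at once, which may matter for very large pickles.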

Note that it's not certain you do need better-behaving pseudofile objects; if you control the dump calls and can use the latest-and-greatest protocol, everything will be fine:

>>> zf = zipfile.ZipFile('zipped_pickle.zip', 'w', zipfile.ZIP_DEFLATED)
>>> zf.writestr('data.pkl', cPickle.dumps(some_data, -1))
>>> zf.close()
>>> zf = zipfile.ZipFile('zipped_pickle.zip', 'r')
>>> sd2 = cPickle.load(zf.open('data.pkl'))
>>> zf.close()
>>>

Only the old, crufty, backwards-compatible "protocol 0" (the default) requires proper pseudofile behavior when read and readline calls are mixed during the load. Protocol 0 is also slower and produces larger pickles, so it is not recommended unless backwards compatibility with old Python versions, or the ASCII-only nature of the pickles it produces, is a mandatory constraint in your application.
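The size difference between the protocols is easy to check for yourself. The sketch below is a rough illustration under my own assumptions: the test data (a list of 1000 integers) is arbitrary, the actual byte counts depend on the Python version and on the data, and the import fallback is there only so the snippet also runs on Python 3.

```python
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3 (assumption: lets the sketch run there too)

# Arbitrary sample data, chosen only to make the size gap visible.
numbers = list(range(1000))

text_pickle = pickle.dumps(numbers, 0)     # protocol 0, text based
binary_pickle = pickle.dumps(numbers, -1)  # -1 selects the highest protocol

# For this data the protocol-0 output is plain ASCII; decode() would
# raise an error if it were not.
text_pickle.decode('ascii')

print('protocol 0: %d bytes' % len(text_pickle))
print('highest protocol: %d bytes' % len(binary_pickle))
```

Both pickles load back to the same list, so the choice of protocol affects size and speed rather than the data that can be stored.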
