Python pickle - how does it break?
Everyone knows pickle is not a secure way to store user data. It even says so on the box.
I'm looking for examples of strings or data structures that break pickle parsing in the currently supported versions of CPython (>= 2.4). Are there things that can be pickled but not unpickled? Are there problems with particular unicode characters? Really big data structures? Obviously the old ASCII protocol has some issues, but what about the most current binary form?
I'm particularly curious about ways in which the pickle `loads` operation can fail, especially when given a string produced by pickle itself. Are there any circumstances in which pickle will continue parsing past the `.`?
What sort of edge cases are there?
Edit: Here are some examples of the sort of thing I'm looking for:
- In Python 2.4, you can pickle an array without error, but you can't unpickle it. http://bugs.python.org/issue1281383
- You can't reliably pickle objects that inherit from dict and call `__setitem__` before instance variables are set with `__setstate__`. This can be a gotcha when pickling Cookie objects. See http://bugs.python.org/issue964868 and http://bugs.python.org/issue826897
- Python 2.4 (and 2.5?) will return a pickle value for infinity (or values close to it like 1e100000), but may (depending on platform) fail when loading. See http://bugs.python.org/issue880990 and http://bugs.python.org/issue445484
- This last item is interesting because it reveals a case where the `STOP` marker does not actually stop parsing: when the marker exists as part of a literal, or more generally, when not preceded by a newline.
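One way to probe the last two cases on whatever interpreter is at hand is to round-trip the suspect values under every available protocol. The following is only a sketch and assumes a current Python 3, where the outcome may well differ from the 2.4-era reports linked above. The helper `round_trips` and the sample values are illustrative and do not come from the bug reports.

```python
import pickle


def round_trips(value, protocol):
    """Return True if value survives dumps/loads unchanged under protocol."""
    return pickle.loads(pickle.dumps(value, protocol)) == value


# Infinity, an out-of-range float literal, and literals that contain the
# STOP byte "." somewhere other than at the end of the stream.
samples = [float("inf"), float("-inf"), 1e100000, "a.b", b"."]

# One verdict per protocol, from the ASCII protocol 0 up to the newest one.
results = {
    protocol: all(round_trips(value, protocol) for value in samples)
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
}

# loads() returns at the STOP opcode, so bytes appended after it are ignored.
trailing_ok = pickle.loads(pickle.dumps(42) + b"trailing junk") == 42

for protocol, ok in results.items():
    print("protocol", protocol, "round-trips all samples:", ok)
print("trailing bytes ignored:", trailing_ok)
```

Running the same loop on an older interpreter is a cheap way to see whether a given platform is affected by the issues listed above.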
Comments (3)
This is a greatly simplified example of what pickle didn't like about my data structure.
Pickle is known to be a little funny with circular structures, but if you toss custom hash functions and sets/dicts into the mix then things get quite hairy.
In this particular example it partially unpickles the member and then encounters the pool. So it then partially unpickles the pool and encounters the members set. So it creates the set and tries to add the partially unpickled member to the set. At which point it dies in the custom hash function, because the member is only partially unpickled. I dread to think what might happen if you had an "if hasattr..." in the hash function.
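A minimal sketch of the kind of structure described here, assuming a `Member` that hashes on one of its own attributes and a `Pool` that keeps its members in a set. The class and attribute names are illustrative, and the behaviour shown is what I would expect on a current Python 3.

```python
import pickle


class Member:
    def __init__(self, key):
        self.key = key
        self.pool = None

    def __hash__(self):
        # Relies on instance state that unpickling restores late.
        return hash(self.key)


class Pool:
    def __init__(self):
        self.members = set()

    def add_member(self, member):
        self.members.add(member)
        member.pool = self  # closes the cycle: member -> pool -> set -> member


member = Member(1)
pool = Pool()
pool.add_member(member)

# Pickling never needs the hash, so dumps() succeeds.
data = pickle.dumps(member, pickle.HIGHEST_PROTOCOL)

# Loading rebuilds the set while the member still has no attributes,
# so the custom __hash__ fails.
try:
    pickle.loads(data)
    load_error = None
except AttributeError as exc:
    load_error = exc

# Starting from the pool instead, the member's state is restored before
# the set has to hash it, so the same graph loads.
restored_pool = pickle.loads(pickle.dumps(pool, pickle.HIGHEST_PROTOCOL))

print("loading from the member:", repr(load_error))
print("loading from the pool:", len(restored_pool.members), "member(s)")
```

Which object is handed to `dumps` decides whether the load works, which is what makes this failure easy to miss in testing.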
If you are interested in how things fail with `pickle` (or `cPickle`, as it's just a slightly different import), you can use this growing list of all the different object types in Python to test against fairly easily: https://github.com/uqfoundation/dill/blob/master/dill/_objects.py
The package `dill` includes functions that discover how an object fails to pickle, for example by catching the error it throws and returning it to the user.
`dill.dill` has these functions, which you could also build for `pickle` or `cPickle`, simply with a cut-and-paste and an `import pickle` or `import cPickle as pickle` (or `import dill as pickle`):
and includes these in `dill.detect`:
and this last function, which is what you can use to test the objects in `dill._objects`
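A minimal sketch of the same idea using only the standard library. This is not dill's code: `pickle_error` and `bad_items` are hypothetical helper names, and the candidate objects stand in for the much longer list in `dill._objects`.

```python
import pickle


def pickle_error(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Round-trip obj through pickle; return the exception raised, or None."""
    try:
        pickle.loads(pickle.dumps(obj, protocol))
    except Exception as exc:  # report whatever pickle throws to the caller
        return exc
    return None


def bad_items(objects):
    """Map each name whose object fails to round-trip to the error raised."""
    errors = {}
    for name, obj in objects.items():
        exc = pickle_error(obj)
        if exc is not None:
            errors[name] = exc
    return errors


# A tiny stand-in for a catalogue of object types to test against.
candidates = {
    "list": [1, 2, 3],
    "lambda": lambda x: x,
    "generator": (i for i in range(3)),
}
failures = bad_items(candidates)

for name, exc in sorted(failures.items()):
    print(name, "->", type(exc).__name__)
```

Swapping the import for `import dill as pickle` would run the same check against dill's serializer, assuming dill is installed.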
It is possible to pickle class instances. If I knew what classes your application uses, then I could subvert them. A contrived example:
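A sketch of what such a class might look like, under stated assumptions: the path `/usr/lib/myprog/` and the method `command_line` are hypothetical, and the command is only returned as a string, never executed. The second half shows how a pickled instance sidesteps `__init__`, and with it `sanitize`.

```python
import pickle
import string


class Command:
    """Contrived: builds a shell command line from sanitized user input."""

    def __init__(self, command):
        self._command = self.sanitize(command)

    @staticmethod
    def sanitize(command):
        # Keep ASCII letters only.
        return "".join(c for c in command if c in string.ascii_letters)

    def command_line(self):
        # A real program might pass this to a shell; here it is only returned.
        return "/usr/lib/myprog/%s" % self._command


# Through the constructor, hostile input is reduced to letters.
safe = Command("report; rm -rf ~")

# Someone who controls the stored bytes never goes through __init__:
# unpickling restores the instance dict as-is.
forged = Command.__new__(Command)
forged._command = "x; echo injected"
payload = pickle.dumps(forged)

loaded = pickle.loads(payload)

print(safe.command_line())
print(loaded.command_line())
```

The forged instance needs no knowledge of the class beyond its name and the attribute it trusts.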
Now if your program creates `Command` instances and saves them using pickle, and I could subvert or inject into that storage, then I could run any command I choose by setting `self._command` directly.
In practice my example should never pass for secure code anyway. But note that if the `sanitize` function is secure, then so is the entire class, apart from the possible use of pickle from untrusted data breaking this. Therefore, there exist programs which are secure but can be made insecure by the inappropriate use of pickle.
The danger is that your pickle-using code could be subverted along the same principle but in innocent-looking code where the vulnerability is far less obvious. The best thing to do is to always avoid using pickle to load untrusted data.