Python pickle - 它是如何崩溃的?

发布于 2024-10-01 03:34:28 字数 1262 浏览 6 评论 0原文

每个人都知道 pickle 不是存储用户数据的安全方式。盒子上也这么写。

我正在寻找在当前支持的 cPython >= 2.4 版本中破坏 pickle 解析的字符串或数据结构的示例。有没有可以腌制但不能解腌的东西?特定的 unicode 字符是否存在问题?真的是大数据结构吗?显然旧的 ASCII 协议存在一些问题,但是最新的二进制形式又如何呢?

我特别好奇 pickle loads 操作可能失败的方式,特别是当给定 pickle 本身生成的字符串时。在某些情况下,pickle 会继续解析 . 吗?

存在哪些类型的边缘情况?

编辑:以下是我正在寻找的一些示例:

Everyone knows pickle is not a secure way to store user data. It even says so on the box.

I'm looking for examples of strings or data structures that break pickle parsing in the current supported versions of cPython >= 2.4. Are there things that can be pickled but not unpickled? Are there problems with particular unicode characters? Really big data structures? Obviously the old ASCII protocol has some issues, but what about the most current binary form?

I'm particularly curious about ways in which the pickle loads operation can fail, especially when given a string produced by pickle itself. Are there any circumstances in which pickle will continue parsing past the .?

What sort of edge cases are there?

Edit: Here are some examples of the sort of thing I'm looking for:

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

┾廆蒐ゝ 2024-10-08 03:34:28

这是一个非常简单的例子,说明了 pickle 不喜欢我的数据结构。

import cPickle as pickle

class Member(object):
    def __init__(self, key):
        self.key = key
        self.pool = None
    def __hash__(self):
        return self.key

class Pool(object):
    def __init__(self):
        self.members = set()
    def add_member(self, member):
        self.members.add(member)
        member.pool = self

member = Member(1)
pool = Pool()
pool.add_member(member)

with open("test.pkl", "w") as f:
    pickle.dump(member, f, pickle.HIGHEST_PROTOCOL)

with open("test.pkl", "r") as f:
    x = pickle.load(f)

众所周知,Pickle 的循环结构有点有趣,但如果你将自定义哈希函数和集合/字典混合在一起,那么事情就会变得非常棘手。

在这个特定的示例中,它部分地取消了成员的pickle,然后遇到了池。因此,它会部分地解开池并遇到成员集。因此它创建了集合并尝试将部分未腌制的成员添加到集合中。此时它会在自定义哈希函数中死亡,因为该成员仅部分 unpickled。我不敢想象如果哈希函数中有“if hasattr...”会发生什么。

$ python --version
Python 2.6.5
$ python test.py
Traceback (most recent call last):
  File "test.py", line 25, in <module>
    x = pickle.load(f)
  File "test.py", line 8, in __hash__
    return self.key
AttributeError: ("'Member' object has no attribute 'key'", <type 'set'>, ([<__main__.Member object at 0xb76cdaac>],))

This is a greatly simplified example of what pickle didn't like about my data structure.

import cPickle as pickle

class Member(object):
    def __init__(self, key):
        self.key = key
        self.pool = None
    def __hash__(self):
        return self.key

class Pool(object):
    def __init__(self):
        self.members = set()
    def add_member(self, member):
        self.members.add(member)
        member.pool = self

member = Member(1)
pool = Pool()
pool.add_member(member)

with open("test.pkl", "w") as f:
    pickle.dump(member, f, pickle.HIGHEST_PROTOCOL)

with open("test.pkl", "r") as f:
    x = pickle.load(f)

Pickle is known to be a little funny with circular structures, but if you toss custom hash functions and sets/dicts into the mix then things get quite hairy.

In this particular example it partially unpickles the member and then encounters the pool. So it then partially unpickles the pool and encounters the members set. So it creates the set and tries to add the partially unpickled member to the set. At which point it dies in the custom hash function, because the member is only partially unpickled. I dread to think what might happen if you had an "if hasattr..." in the hash function.

$ python --version
Python 2.6.5
$ python test.py
Traceback (most recent call last):
  File "test.py", line 25, in <module>
    x = pickle.load(f)
  File "test.py", line 8, in __hash__
    return self.key
AttributeError: ("'Member' object has no attribute 'key'", <type 'set'>, ([<__main__.Member object at 0xb76cdaac>],))
暖伴 2024-10-08 03:34:28

如果您对 pickle(或 cPickle,因为它只是略有不同的导入)失败的情况感兴趣,您可以使用这个不断增长的所有不同对象类型的列表python 来测试相当容易。

https://github.com/uqfoundation/dill/blob/master/dill /_objects.py

dill 包含发现对象如何无法 pickle 的函数,例如通过捕获它抛出的错误并将其返回给用户。

dill.dill 具有这些函数,您也可以为 picklecPickle 构建这些函数,只需通过剪切和粘贴和 >import pickleimport cPickle as pickle (或 import dill as pickle):

def copy(obj, *args, **kwds):
    """use pickling to 'copy' an object"""
    return loads(dumps(obj, *args, **kwds))


# quick sanity checking
def pickles(obj,exact=False,safe=False,**kwds):
    """quick check if object pickles with dill"""
    if safe: exceptions = (Exception,) # RuntimeError, ValueError
    else:
        exceptions = (TypeError, AssertionError, PicklingError, UnpicklingError)
    try:
        pik = copy(obj, **kwds)
        try:
            result = bool(pik.all() == obj.all())
        except AttributeError:
            result = pik == obj
        if result: return True
        if not exact:
            return type(pik) == type(obj)
        return False
    except exceptions:
        return False

并将这些包含在 dill.detect 中:

def baditems(obj, exact=False, safe=False): #XXX: obj=globals() ?
    """get items in object that fail to pickle"""
    if not hasattr(obj,'__iter__'): # is not iterable
        return [j for j in (badobjects(obj,0,exact,safe),) if j is not None]
    obj = obj.values() if getattr(obj,'values',None) else obj
    _obj = [] # can't use a set, as items may be unhashable
    [_obj.append(badobjects(i,0,exact,safe)) for i in obj if i not in _obj]
    return [j for j in _obj if j is not None]


def badobjects(obj, depth=0, exact=False, safe=False):
    """get objects that fail to pickle"""
    if not depth:
        if pickles(obj,exact,safe): return None
        return obj
    return dict(((attr, badobjects(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))

def badtypes(obj, depth=0, exact=False, safe=False):
    """get types for objects that fail to pickle"""
    if not depth:
        if pickles(obj,exact,safe): return None
        return type(obj)
    return dict(((attr, badtypes(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))

以及最后一个函数,您可以使用它来测试 dill._objects 中的对象

def errors(obj, depth=0, exact=False, safe=False):
    """get errors for objects that fail to pickle"""
    if not depth:
        try:
            pik = copy(obj)
            if exact:
                assert pik == obj, \
                    "Unpickling produces %s instead of %s" % (pik,obj)
            assert type(pik) == type(obj), \
                "Unpickling produces %s instead of %s" % (type(pik),type(obj))
            return None
        except Exception:
            import sys
            return sys.exc_info()[1]
    return dict(((attr, errors(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))

If you are interested in how things fail with pickle (or cPickle, as it's just a slightly different import), you can use this growing list of all the different object types in python to test against fairly easily.

https://github.com/uqfoundation/dill/blob/master/dill/_objects.py

The package dill includes functions that discover how an object fails to pickle, for example by catching the error it throws and returning it to the user.

dill.dill has these functions, which you could also build for pickle or cPickle, simply with a cut-and-paste and an import pickle or import cPickle as pickle (or import dill as pickle):

def copy(obj, *args, **kwds):
    """use pickling to 'copy' an object"""
    return loads(dumps(obj, *args, **kwds))


# quick sanity checking
def pickles(obj,exact=False,safe=False,**kwds):
    """quick check if object pickles with dill"""
    if safe: exceptions = (Exception,) # RuntimeError, ValueError
    else:
        exceptions = (TypeError, AssertionError, PicklingError, UnpicklingError)
    try:
        pik = copy(obj, **kwds)
        try:
            result = bool(pik.all() == obj.all())
        except AttributeError:
            result = pik == obj
        if result: return True
        if not exact:
            return type(pik) == type(obj)
        return False
    except exceptions:
        return False

and includes these in dill.detect:

def baditems(obj, exact=False, safe=False): #XXX: obj=globals() ?
    """get items in object that fail to pickle"""
    if not hasattr(obj,'__iter__'): # is not iterable
        return [j for j in (badobjects(obj,0,exact,safe),) if j is not None]
    obj = obj.values() if getattr(obj,'values',None) else obj
    _obj = [] # can't use a set, as items may be unhashable
    [_obj.append(badobjects(i,0,exact,safe)) for i in obj if i not in _obj]
    return [j for j in _obj if j is not None]


def badobjects(obj, depth=0, exact=False, safe=False):
    """get objects that fail to pickle"""
    if not depth:
        if pickles(obj,exact,safe): return None
        return obj
    return dict(((attr, badobjects(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))

def badtypes(obj, depth=0, exact=False, safe=False):
    """get types for objects that fail to pickle"""
    if not depth:
        if pickles(obj,exact,safe): return None
        return type(obj)
    return dict(((attr, badtypes(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))

and this last function, which is what you can use to test the objects in dill._objects

def errors(obj, depth=0, exact=False, safe=False):
    """get errors for objects that fail to pickle"""
    if not depth:
        try:
            pik = copy(obj)
            if exact:
                assert pik == obj, \
                    "Unpickling produces %s instead of %s" % (pik,obj)
            assert type(pik) == type(obj), \
                "Unpickling produces %s instead of %s" % (type(pik),type(obj))
            return None
        except Exception:
            import sys
            return sys.exc_info()[1]
    return dict(((attr, errors(getattr(obj,attr),depth-1,exact,safe)) \
           for attr in dir(obj) if not pickles(getattr(obj,attr),exact,safe)))
等待我真够勒 2024-10-08 03:34:28

可以腌制类实例。如果我知道你的应用程序使用什么类,那么我就可以破坏它们。一个人为的示例:

import subprocess

class Command(object):
    def __init__(self, command):
        self._command = self._sanitize(command)

    @staticmethod
    def _sanitize(command):
        return filter(lambda c: c in string.letters, command)

    def run(self):
        subprocess.call('/usr/lib/myprog/%s' % self._command, shell=True)

现在,如果您的程序创建 Command 实例并使用 pickle 保存它们,并且我可以颠覆或注入该存储,那么我可以通过设置 self._command 运行我选择的任何命令直接。

实际上,我的示例无论如何都不应该被视为安全代码。但请注意,如果 sanitize 函数是安全的,那么整个类也是安全的,除了可能使用来自不受信任的数据的 pickle 来破坏这一点。因此,存在一些安全的程序,但由于pickle 的不当使用而变得不安全。

危险在于,使用 pickle 的代码可能会按照相同的原理被破坏,但在看似无辜的代码中,漏洞远不那么明显。最好的办法是始终避免使用 pickle 加载不受信任的数据。

It is possible to pickle class instances. If I knew what classes your application uses, then I could subvert them. A contrived example:

import subprocess

class Command(object):
    def __init__(self, command):
        self._command = self._sanitize(command)

    @staticmethod
    def _sanitize(command):
        return filter(lambda c: c in string.letters, command)

    def run(self):
        subprocess.call('/usr/lib/myprog/%s' % self._command, shell=True)

Now if your program creates Command instances and saves them using pickle, and I could subvert or inject into that storage, then I could run any command I choose by setting self._command directly.

In practice my example should never pass for secure code anyway. But note that if the sanitize function is secure, then so is the entire class, apart from the possible use of pickle from untrusted data breaking this. Therefore, there exist programs which are secure but can be made insecure by the inappropriate use of pickle.

The danger is that your pickle-using code could be subverted along the same principle but in innocent-looking code where the vulnerability is far less obvious. The best thing to do is to always avoid using pickle to load untrusted data.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文