Python: Pickling a dict with some unpicklable items

Posted 2024-09-30 15:41:04

I have an object gui_project which has an attribute .namespace, which is a namespace dict. (i.e. a dict from strings to objects.)

(This is used in an IDE-like program to let users define their own objects in a Python shell.)

I want to pickle this gui_project, along with the namespace. Problem is, some objects in the namespace (i.e. values of the .namespace dict) are not picklable objects. For example, some of them refer to wxPython widgets.

I'd like to filter out the unpicklable objects, that is, exclude them from the pickled version.

How can I do this?

(One thing I tried is to go one by one on the values and try to pickle them, but some infinite recursion happened, and I need to be safe from that.)

(I do implement a GuiProject.__getstate__ method right now, to get rid of other unpicklable stuff besides namespace.)

Comments (5)

我纯我任性 2024-10-07 15:41:04

I would use the pickler's documented support for persistent object references. Persistent object references are objects that are referenced by the pickle but not stored in the pickle.

http://docs.python.org/library/pickle.html#pickling-and-unpickling-external-objects

ZODB has used this API for years, so it's very stable. When unpickling, you can replace the object references with anything you like. In your case, you would want to replace the object references with markers indicating that the objects could not be pickled.

You could start with something like this (untested):

import cPickle
import wx

def persistent_id(obj):
    if isinstance(obj, wx.Object):
        return "filtered:wxObject"
    else:
        return None

class FilteredObject:
    def __init__(self, about):
        self.about = about
    def __repr__(self):
        return 'FilteredObject(%s)' % repr(self.about)

def persistent_load(obj_id):
    if obj_id.startswith('filtered:'):
        return FilteredObject(obj_id[9:])
    else:
        raise cPickle.UnpicklingError('Invalid persistent id')

def dump_filtered(obj, file):
    p = cPickle.Pickler(file)
    p.persistent_id = persistent_id
    p.dump(obj)

def load_filtered(file):
    u = cPickle.Unpickler(file)
    u.persistent_load = persistent_load
    return u.load()

Then just call dump_filtered() and load_filtered() instead of pickle.dump() and pickle.load(). wxPython objects will be pickled as persistent IDs, to be replaced with FilteredObjects at unpickling time.
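
On Python 3 the cPickle module no longer exists, and as far as I know the documented way to hook persistent IDs there is to override persistent_id and persistent_load in subclasses of pickle.Pickler and pickle.Unpickler. Here is a minimal sketch of the same idea under that assumption. Widget is a hypothetical stand-in for a wxPython class, and FilteringPickler and FilteringUnpickler are illustrative names that do not appear in the answer above.

```python
import io
import pickle


class FilteredObject:
    """Placeholder created at load time for an object that was not pickled."""

    def __init__(self, about):
        self.about = about

    def __repr__(self):
        return 'FilteredObject(%r)' % (self.about,)


class Widget:
    """Hypothetical stand-in for an unpicklable type such as a wx widget."""


class FilteringPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # A non-None return value is stored in the pickle instead of obj.
        if isinstance(obj, Widget):
            return 'filtered:Widget'
        return None  # pickle obj as usual


class FilteringUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        # Turn each stored persistent ID back into a marker object.
        if isinstance(pid, str) and pid.startswith('filtered:'):
            return FilteredObject(pid[len('filtered:'):])
        raise pickle.UnpicklingError('Invalid persistent id')


def dump_filtered(obj, file):
    FilteringPickler(file).dump(obj)


def load_filtered(file):
    return FilteringUnpickler(file).load()


buf = io.BytesIO()
dump_filtered({'a': 1, 'w': Widget()}, buf)
buf.seek(0)
restored = load_filtered(buf)
print(restored)
```

The Widget instance never reaches the normal pickling machinery, so it does not matter whether it would have been picklable on its own.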

You could make the solution more generic by filtering out objects that are not of the built-in types and have no __getstate__ method.

Update (15 Nov 2010): Here is a way to achieve the same thing with wrapper classes. Using wrapper classes instead of subclasses, it's possible to stay within the documented API.

from cPickle import Pickler, Unpickler, UnpicklingError


class FilteredObject:
    def __init__(self, about):
        self.about = about
    def __repr__(self):
        return 'FilteredObject(%s)' % repr(self.about)


class MyPickler(object):

    def __init__(self, file, protocol=0):
        pickler = Pickler(file, protocol)
        pickler.persistent_id = self.persistent_id
        self.dump = pickler.dump
        self.clear_memo = pickler.clear_memo

    def persistent_id(self, obj):
        if not hasattr(obj, '__getstate__') and not isinstance(obj,
            (basestring, int, long, float, tuple, list, set, dict)):
            return "filtered:%s" % type(obj)
        else:
            return None


class MyUnpickler(object):

    def __init__(self, file):
        unpickler = Unpickler(file)
        unpickler.persistent_load = self.persistent_load
        self.load = unpickler.load
        self.noload = unpickler.noload

    def persistent_load(self, obj_id):
        if obj_id.startswith('filtered:'):
            return FilteredObject(obj_id[9:])
        else:
            raise UnpicklingError('Invalid persistent id')


if __name__ == '__main__':
    from cStringIO import StringIO

    class UnpickleableThing(object):
        pass

    f = StringIO()
    p = MyPickler(f)
    p.dump({'a': 1, 'b': UnpickleableThing()})

    f.seek(0)
    u = MyUnpickler(f)
    obj = u.load()
    print obj

    assert obj['a'] == 1
    assert isinstance(obj['b'], FilteredObject)
    assert obj['b'].about
吻安 2024-10-07 15:41:04

This is how I would do this (I did something similar before and it worked):

  1. Write a function that determines whether or not an object is pickleable
  2. Make a list of all the pickleable variables, based on the above function
  3. Make a new dictionary (called D) that stores all the non-pickleable variables
  4. For each variable in D (this only works if the variables in D are very similar)
    make a list of strings, where each string is legal python code, such that
    when all these strings are executed in order, you get the desired variable

Now, when you unpickle, you get back all the variables that were originally pickleable. For all variables that were not pickleable, you now have a list of strings (legal python code) that when executed in order, gives you the desired variable.
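
Steps 1 to 3 could look roughly like the sketch below, written for Python 3. The names is_picklable and split_namespace are hypothetical, and catching Exception broadly is an assumption made so that a RecursionError from a deeply nested value is also treated as "not picklable". Step 4 (the code strings that recreate each variable) depends on the objects involved and is not shown.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:  # PicklingError, TypeError, RecursionError, ...
        return False
    return True


def split_namespace(namespace):
    """Split a dict into (picklable items, unpicklable items)."""
    picklable, unpicklable = {}, {}
    for name, value in namespace.items():
        target = picklable if is_picklable(value) else unpicklable
        target[name] = value
    return picklable, unpicklable


# A lock is a convenient example of a value that pickle rejects.
good, bad = split_namespace({'n': 1, 'lock': threading.Lock()})
print(sorted(good), sorted(bad))
```

Note that this tests each top-level value as a whole, so a large container holding one unpicklable item is rejected entirely.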

Hope this helps

养猫人 2024-10-07 15:41:04

I ended up coding my own solution to this, using Shane Hathaway's approach.

Here's the code. (Look for CutePickler and CuteUnpickler.) Here are the tests. It's part of GarlicSim, so you can use it by installing garlicsim and doing from garlicsim.general_misc import pickle_tools.

If you want to use it on Python 3 code, use the Python 3 fork of garlicsim.

执手闯天涯 2024-10-07 15:41:04

One approach would be to inherit from pickle.Pickler, and override the save_dict() method. Copy it from the base class, which reads like this:

def save_dict(self, obj):
    write = self.write

    if self.bin:
        write(EMPTY_DICT)
    else:   # proto 0 -- can't use EMPTY_DICT
        write(MARK + DICT)

    self.memoize(obj)
    self._batch_setitems(obj.iteritems())

However, in the call to _batch_setitems, pass an iterator that filters out all items that you don't want to be dumped, for example:

def save_dict(self, obj):
    write = self.write

    if self.bin:
        write(EMPTY_DICT)
    else:   # proto 0 -- can't use EMPTY_DICT
        write(MARK + DICT)

    self.memoize(obj)
    self._batch_setitems(item for item in obj.iteritems() 
                         if not isinstance(item[1], bad_type))

As save_dict isn't an official API, you need to check for each new Python version whether this override is still correct.
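
If only the top-level namespace dict needs filtering, the same isinstance test can be applied before pickling, without touching Pickler internals. This is a simpler substitute for the override above, not an equivalent: it does not filter dicts nested deeper in the object graph. The helper name without_types and the class BadType are hypothetical.

```python
import pickle


class BadType:
    """Hypothetical stand-in for a type that should not be dumped."""


def without_types(mapping, bad_types):
    """Return a shallow copy of mapping without values of the given types."""
    return {key: value for key, value in mapping.items()
            if not isinstance(value, bad_types)}


namespace = {'a': 1, 'b': BadType(), 'c': 'text'}
filtered = without_types(namespace, BadType)
data = pickle.dumps(filtered)
```

A filter like this could be called from the existing __getstate__ method so that the original namespace is left untouched.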

攀登最高峰 2024-10-07 15:41:04

The filtering part is indeed tricky. Using simple tricks, you can easily get the pickle to work. However, you might end up filtering out too much and losing information that you could have kept if the filter looked a little deeper. But the sheer variety of things that can end up in the .namespace makes building a good filter difficult.

However, we could leverage pieces that are already part of Python, such as deepcopy in the copy module.

I made a copy of the stock copy module, and did the following things:

  1. Create a new type named LostObject to represent an object that will be lost in pickling.
  2. Change _deepcopy_atomic to check that x is picklable. If it is not, return an instance of LostObject.
  3. Objects can define the methods __reduce__ and/or __reduce_ex__ to provide hints about whether and how to pickle them. We call these methods and, if one raises an exception to signal that the object cannot be pickled, substitute a LostObject.
  4. To avoid making unnecessary copies of big objects (as an actual deepcopy would), we recursively check whether an object is picklable and only copy the unpicklable parts. For instance, for a tuple of a picklable list and an unpicklable object, we make a copy of the tuple (just the container) but not of its member list.

The following is the diff:

[~/Development/scratch/] $ diff -uN  /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/copy.py mcopy.py
--- /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/copy.py  2010-01-09 00:18:38.000000000 -0800
+++ mcopy.py    2010-11-10 08:50:26.000000000 -0800
@@ -157,6 +157,13 @@

     cls = type(x)

+    # if x is picklable, there is no need to make a new copy, just ref it
+    try:
+        dumps(x)
+        return x
+    except TypeError:
+        pass
+
     copier = _deepcopy_dispatch.get(cls)
     if copier:
         y = copier(x, memo)
@@ -179,10 +186,18 @@
                     reductor = getattr(x, "__reduce_ex__", None)
                     if reductor:
                         rv = reductor(2)
+                        try:
+                            x.__reduce_ex__()
+                        except TypeError:
+                            rv = LostObject, tuple()
                     else:
                         reductor = getattr(x, "__reduce__", None)
                         if reductor:
                             rv = reductor()
+                            try:
+                                x.__reduce__()
+                            except TypeError:
+                                rv = LostObject, tuple()
                         else:
                             raise Error(
                                 "un(deep)copyable object of type %s" % cls)
@@ -194,7 +209,12 @@

 _deepcopy_dispatch = d = {}

+from pickle import dumps
+class LostObject(object): pass
 def _deepcopy_atomic(x, memo):
+    try:
+        dumps(x)
+    except TypeError: return LostObject()
     return x
 d[type(None)] = _deepcopy_atomic
 d[type(Ellipsis)] = _deepcopy_atomic

Now back to the pickling part. You simply make a deepcopy using this new deepcopy function and then pickle the copy. The unpicklable parts have been removed during the copying process.

x = dict(a=1)
xx = dict(x=x)
x['xx'] = xx
x['f'] = file('/tmp/1', 'w')
class List():
    def __init__(self, *args, **kwargs):
        print 'making a copy of a list'
        self.data = list(*args, **kwargs)
x['large'] = List(range(1000))
# now x contains a loop and an unpicklable file object
# the following call will throw
from pickle import dumps, loads
try:
    dumps(x)
except TypeError:
    print 'yes, it throws'

def check_picklable(x):
    try:
        dumps(x)
    except TypeError:
        return False
    return True

class LostObject(object): pass

from mcopy import deepcopy

# though x has a big List object, this deepcopy will not make a new copy of it
c = deepcopy(x)
dumps(c)
cc = loads(dumps(c))
# check loop reference
if cc['xx']['x'] == cc:
    print 'yes, loop reference is preserved'
# check unpicklable part
if isinstance(cc['f'], LostObject):
    print 'unpicklable part is now an instance of LostObject'
# check large object
if loads(dumps(c))['large'].data[999] == x['large'].data[999]:
    print 'large object is ok'

Here is the output:

making a copy of a list
yes, it throws
yes, loop reference is preserved
unpicklable part is now an instance of LostObject
large object is ok

You can see that 1) mutual pointers (between x and xx) are preserved and we do not run into an infinite loop; 2) the unpicklable file object is converted to a LostObject instance; and 3) no new copy of the large object is created, since it is picklable.
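
For readers who would rather not patch the copy module, the core idea (walk the structure with a memo, keep cycles intact, and replace unpicklable leaves with a marker) can be sketched as a standalone function for Python 3. This is not the patched mcopy module described above: prune is a hypothetical name, it only descends into dict and list containers, it assumes dict keys are picklable, and unlike the patched deepcopy it always rebuilds those containers rather than reusing picklable subtrees.

```python
import pickle
import threading


class LostObject:
    """Marker for a value that could not be pickled."""


def prune(obj, memo=None):
    """Return a copy of obj with unpicklable parts replaced by LostObject()."""
    if memo is None:
        memo = {}
    if id(obj) in memo:             # container already visited: reuse its copy
        return memo[id(obj)]
    if isinstance(obj, dict):
        result = {}
        memo[id(obj)] = result      # register before recursing, for cycles
        for key, value in obj.items():
            result[key] = prune(value, memo)
        return result
    if isinstance(obj, list):
        result = []
        memo[id(obj)] = result
        for item in obj:
            result.append(prune(item, memo))
        return result
    try:
        pickle.dumps(obj)           # anything else is tested as a whole
    except Exception:
        return LostObject()
    return obj


x = {'a': 1}
x['self'] = x                       # cycle back to the top-level dict
x['lock'] = threading.Lock()        # unpicklable value
x['items'] = [1, 2, threading.Lock()]
c = prune(x)
```

The pruned copy c can then be passed to pickle.dumps, provided LostObject lives in an importable module so that its instances can be unpickled later.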
