Python: Checking whether an object is atomically pickleable

Posted on 2024-10-02 19:22:15, 582 words, 6 views, 0 comments

What's an accurate way of checking whether an object can be atomically pickled? When I say "atomically pickled", I mean without considering other objects it may refer to. For example, this list:

l = [threading.Lock()]

is not a pickleable object, because it refers to a Lock, which is not pickleable. But atomically, this list itself is pickleable.

So how do you check whether an object is atomically pickleable? (I'm guessing the check should be done on the class, but I'm not sure.)

I want it to behave like this:

>>> is_atomically_pickleable(3)
True
>>> is_atomically_pickleable(3.1)
True
>>> is_atomically_pickleable([1, 2, 3])
True
>>> is_atomically_pickleable(threading.Lock())
False
>>> is_atomically_pickleable(open('whatever', 'r'))
False

Etc.

Comments (5)

柠栀 2024-10-09 19:22:15

Given that you're willing to break encapsulation, I think this is the best you can do:

from pickle import Pickler
import os

class AtomicPickler(Pickler):
  def __init__(self, protocol):
    # You may want to replace this with a fake file object that just
    # discards writes.
    blackhole = open(os.devnull, 'w')

    Pickler.__init__(self, blackhole, protocol)
    self.depth = 0

  def save(self, o):
    self.depth += 1
    if self.depth == 1:
      return Pickler.save(self, o)
    self.depth -= 1
    return

def is_atomically_pickleable(o, protocol=None):
  pickler = AtomicPickler(protocol)
  try:
    pickler.dump(o)
    return True
  except:
    # Hopefully this exception was actually caused by dump(), and not
    # something like a KeyboardInterrupt
    return False

In Python the only way you can tell if something will work is to try it. That's the nature of a language as dynamic as Python. The difficulty with your question is that you want to distinguish between failures at the "top level" and failures at deeper levels.

Pickler.save is essentially the control-center for Python's pickling logic, so the above creates a modified Pickler that ignores recursive calls to its save method. Any exception raised while in the top-level save is treated as a pickling failure. You may want to add qualifiers to the except statement. Unqualified excepts in Python are generally a bad idea as exceptions are used not just for program errors but also for things like KeyboardInterrupt and SystemExit.

This can give what are arguably false negatives for types with odd custom pickling logic. For example, suppose a custom list-like class pickles its elements on its own somehow instead of causing Pickler.save to be called recursively. If an instance of that class contains an element that its custom logic cannot pickle, is_atomically_pickleable returns False for that instance, even though removing the offending element would leave an object that is pickleable.

Also, note the protocol argument to is_atomically_pickleable. Theoretically an object could behave differently when pickled with different protocols (though that would be pretty weird) so you should make this match the protocol argument you give to dump.
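
The code above dates from Python 2. As a rough sketch of the same idea on Python 3, the version below subclasses `pickle._Pickler`, the pure-Python pickler, because the default `pickle.Pickler` is normally the C implementation, which does not route nested objects through an overridable `save` method. Note that `pickle._Pickler` is a private name and could change between releases, and that the `io.BytesIO` sink and the `_depth` attribute are choices made for this illustration, not part of the original answer.

```python
import io
import pickle
import threading


class AtomicPickler(pickle._Pickler):
    """Pickler that saves only the top-level object and skips nested ones."""

    def __init__(self, protocol=None):
        # The pickled bytes are discarded; only success or failure matters.
        super().__init__(io.BytesIO(), protocol)
        self._depth = 0

    def save(self, obj, save_persistent_id=True):
        if self._depth > 0:
            return  # Nested object: pretend it was saved.
        self._depth += 1
        try:
            super().save(obj, save_persistent_id)
        finally:
            self._depth -= 1


def is_atomically_pickleable(obj, protocol=None):
    try:
        AtomicPickler(protocol).dump(obj)
    except Exception:
        # Exception (not a bare except) lets KeyboardInterrupt and
        # SystemExit propagate.
        return False
    return True


print(is_atomically_pickleable([threading.Lock()]))
print(is_atomically_pickleable(threading.Lock()))
```

Under these assumptions the list is reported as atomically pickleable while the bare lock is not, which matches the behavior requested in the question.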

暮光沉寂 2024-10-09 19:22:15

Given the dynamic nature of Python, I don't think there's really a well-defined way to do what you're asking aside from heuristics or a whitelist.

If I say:

x = object()

is x "atomically pickleable"? What if I then say:

x.foo = threading.Lock()

Is x "atomically pickleable" now?

What if I made a separate class that always had a lock attribute? What if I deleted that attribute from an instance?
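
The point about instance state can be tried out with a small sketch. A bare `object()` instance does not accept new attributes, so this illustration uses `types.SimpleNamespace` as a stand-in; the `pickles` helper is a hypothetical name introduced here, and it tests ordinary (non-atomic) pickling of the whole object.

```python
import pickle
import threading
import types


def pickles(obj):
    """Return True if pickle.dumps(obj) succeeds."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


x = types.SimpleNamespace()
before = pickles(x)        # no attributes yet

x.foo = threading.Lock()
with_lock = pickles(x)     # the instance state now holds a lock

del x.foo
after = pickles(x)         # the lock attribute is gone again

print(before, with_lock, after)
```

The same instance flips between picklable and unpicklable as its attributes change, which is why a check based on the type alone can only be a heuristic.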

私藏温柔 2024-10-09 19:22:15

I think the persistent_id interface is a poor match for what you are attempting to do. It is designed to be used when your object should refer to equivalent objects in the new program rather than copies of the old ones. You are attempting to filter out every object that cannot be pickled, which is a different thing, and it raises the question of why you are attempting this.

I think this is a sign of a problem in your code. The fact that you want to pickle objects which refer to GUI widgets, files, and locks suggests that you are doing something strange. The kind of objects you typically persist shouldn't be related to or hold references to that sort of object.

Having said that, I think your best option is the following:

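from pickle import Pickler, PicklingError

class MyPickler(Pickler):
    def save(self, obj):
        try:
            Pickler.save(self, obj)
        except PicklingError:
            # FilteredObject is a placeholder class you define yourself;
            # it must be picklable.
            Pickler.save(self, FilteredObject(obj))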

This should work for the Python implementation; I make no guarantees as to what will happen in the C implementation. Every object that gets saved is passed to the save method, which raises PicklingError when it cannot pickle the object. At that point, you can step in and call the method again, asking it to pickle your own replacement object, which should pickle just fine.

EDIT

From my understanding, you have essentially a user-created dictionary of objects. Some objects are picklable and some aren't. I'd do this:

import cPickle
from cPickle import PicklingError

class saveable_dict(dict):
    def __getstate__(self):
        data = {}
        for key, value in self.items():
            try:
                encoded = cPickle.dumps(value)
            except PicklingError:
                # Unpicklable is a placeholder class you define yourself.
                encoded = cPickle.dumps(Unpicklable())
            data[key] = encoded
        return data

    def __setstate__(self, state):
        for key, value in state.items():
            self[key] = cPickle.loads(value)

Then use that dictionary when you want to hold that collection of objects. The user should be able to get any picklable objects back, but everything else will come back as an Unpicklable() object. The difference between this and the previous approach lies in objects which are themselves picklable but hold references to unpicklable objects. But those objects are probably going to come back broken regardless.

This approach also has the benefit that it remains entirely within the defined API and thus should work in either cPickle or pickle.
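
As a minimal sketch of the same per-value idea on Python 3, the functions below pickle each value separately and substitute a marker for anything that fails. The names `encode_values` and `decode_values` and the `("<unpicklable>", repr(value))` marker tuple are assumptions made for this illustration, standing in for the answer's `Unpicklable()` placeholder. The except clause also lists `TypeError` and `AttributeError`, since on Python 3 many unpicklable objects raise those rather than `PicklingError`.

```python
import pickle
import threading


def encode_values(mapping):
    """Pickle each value on its own; replace failures with a marker tuple."""
    encoded = {}
    for key, value in mapping.items():
        try:
            encoded[key] = pickle.dumps(value)
        except (pickle.PicklingError, TypeError, AttributeError):
            # Hypothetical marker standing in for the unpicklable value.
            encoded[key] = pickle.dumps(("<unpicklable>", repr(value)))
    return encoded


def decode_values(encoded):
    """Inverse of encode_values for every value that pickled cleanly."""
    return {key: pickle.loads(blob) for key, blob in encoded.items()}


data = {"count": 3, "lock": threading.Lock()}
restored = decode_values(encode_values(data))
print(restored["count"], restored["lock"][0])
```

Values that pickle cleanly round-trip unchanged, and the lock comes back as the marker tuple instead of aborting the whole dump.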

奈何桥上唱咆哮 2024-10-09 19:22:15

I ended up coding my own solution to this.

Here's the code. Here are the tests. It's part of GarlicSim, so you can use it by installing garlicsim and doing from garlicsim.general_misc import pickle_tools.

If you want to use it on Python 3 code, use the Python 3 fork of garlicsim.

Here is an excerpt from the module (may be outdated):

import re
import cPickle as pickle_module
import pickle # Importing just to get dispatch table, not pickling with it.
import copy_reg
import types

from garlicsim.general_misc import address_tools
from garlicsim.general_misc import misc_tools


def is_atomically_pickleable(thing):
    '''
    Return whether `thing` is an atomically pickleable object.

    "Atomically-pickleable" means that it's pickleable without considering any
    other object that it contains or refers to. For example, a `list` is
    atomically pickleable, even if it contains an unpickleable object, like a
    `threading.Lock()`.

    However, the `threading.Lock()` itself is not atomically pickleable.
    '''
    my_type = misc_tools.get_actual_type(thing)
    return _is_type_atomically_pickleable(my_type, thing)


def _is_type_atomically_pickleable(type_, thing=None):
    '''Return whether `type_` is an atomically pickleable type.'''
    try:
        return _is_type_atomically_pickleable.cache[type_]
    except KeyError:
        pass

    if thing is not None:
        assert isinstance(thing, type_)

    # Sub-function in order to do caching without crowding the main algorithm:
    def get_result():

        # We allow a flag for types to painlessly declare whether they're
        # atomically pickleable:
        if hasattr(type_, '_is_atomically_pickleable'):
            return type_._is_atomically_pickleable

        # Weird special case: `threading.Lock` objects don't have `__class__`.
        # We assume that objects that don't have `__class__` can't be pickled.
        # (With the exception of old-style classes themselves.)
        if not hasattr(thing, '__class__') and \
           (not isinstance(thing, types.ClassType)):
            return False

        if not issubclass(type_, object):
            return True

        def assert_legit_pickling_exception(exception):
            '''Assert that `exception` reports a problem in pickling.'''
            message = exception.args[0]
            segments = [
                "can't pickle",
                'should only be shared between processes through inheritance',
                'cannot be passed between processes or pickled'
            ]
            assert any((segment in message) for segment in segments)
            # todo: turn to warning

        if type_ in pickle.Pickler.dispatch:
            return True

        reduce_function = copy_reg.dispatch_table.get(type_)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True

        reduce_function = getattr(type_, '__reduce_ex__', None)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing, 0)
                # (The `0` is the protocol argument.)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True

        reduce_function = getattr(type_, '__reduce__', None)
        if reduce_function:
            try:
                reduce_result = reduce_function(thing)
            except Exception, exception:
                assert_legit_pickling_exception(exception)
                return False
            else:
                return True

        return False

    result = get_result()
    _is_type_atomically_pickleable.cache[type_] = result
    return result

_is_type_atomically_pickleable.cache = {}
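
The excerpt above is Python 2 code (`cPickle`, `copy_reg`, `types.ClassType`, `except Exception, exception`). A much-reduced sketch of the same lookup order on Python 3 might look like the following. It is not the GarlicSim implementation: the function name is hypothetical, caching and the `_is_atomically_pickleable` flag are omitted, and `pickle._Pickler.dispatch` is a private attribute of the pure-Python pickler that could change between releases.

```python
import copyreg
import pickle
import threading


def is_atomically_pickleable_by_type(thing, protocol=pickle.DEFAULT_PROTOCOL):
    """Heuristic check that ignores the objects `thing` refers to."""
    type_ = type(thing)

    # Classes are pickled by reference (module plus qualified name).
    if isinstance(thing, type):
        return True

    # Types for which the pure-Python pickler has a dedicated saver.
    if type_ in pickle._Pickler.dispatch:
        return True

    # Otherwise ask the registered reducer, or the object itself, for its
    # reduce value. No nested object is pickled at this point.
    reducer = copyreg.dispatch_table.get(type_)
    try:
        if reducer is not None:
            reducer(thing)
        else:
            thing.__reduce_ex__(protocol)
    except Exception:
        return False
    return True


print(is_atomically_pickleable_by_type([threading.Lock()]))
print(is_atomically_pickleable_by_type(threading.Lock()))
```

Membership in the dispatch table only says that pickle has a dedicated saver for the type, so this remains a heuristic; for instance, functions are in the table even though a lambda cannot be pickled by reference.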

花开柳相依 2024-10-09 19:22:15

dill has the pickles method for such a check.

>>> import threading
>>> l = [threading.Lock()]
>>> 
>>> import dill
>>> dill.pickles(l)
True
>>> 
>>> dill.pickles(threading.Lock())
True
>>> f = open('whatever', 'w') 
>>> f.close()
>>> dill.pickles(open('whatever', 'r'))
True

Well, dill atomically pickles all of your examples, so let's try something else:

>>> l = [iter([1,2,3]), xrange(5)]
>>> dill.pickles(l)
False

Ok, this fails. Now, let's investigate:

>>> dill.detect.trace(True)
>>> dill.pickles(l)
T4: <type 'listiterator'>
False
>>> map(dill.pickles, l)
T4: <type 'listiterator'>
Si: xrange(5)
F2: <function _eval_repr at 0x106991cf8>
[False, True]

OK, we can see that the iter fails but the xrange does pickle. So, let's replace the iter.

>>> l[0] = xrange(1,4)
>>> dill.pickles(l)
Si: xrange(1, 4)
F2: <function _eval_repr at 0x106991cf8>
Si: xrange(5)
True

Now our object atomically pickles.
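
The element-by-element check used above (`map(dill.pickles, l)`) can be approximated with the standard library alone. This is a sketch under assumptions: it uses plain `pickle` rather than dill, so the set of picklable types differs, and `unpicklable_indexes` is a hypothetical helper name. A generator stands in for the failing element because list iterators pickle fine on Python 3.

```python
import pickle


def pickles(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


def unpicklable_indexes(items):
    """Return the positions of the elements that fail to pickle."""
    return [i for i, item in enumerate(items) if not pickles(item)]


items = [(n for n in range(3)), range(5)]
print(unpicklable_indexes(items))
```

This reports which elements block pickling of the container, which is the complement of the atomic check: the container itself is fine, and the listed elements are the ones to replace.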
