A class for objects that persist through pickling and copying?

Posted on 2024-08-04 11:03:41 · 117 words · 7 views · 0 comments

I'm trying to write a class for a read-only object that is never really copied by the copy module, and that, when pickled for transfer between processes, leaves each process holding at most one copy of it, no matter how many times it is passed around as a "new" object. Does something like this already exist?

Comments (3)

不即不离 2024-08-11 11:03:41

I made an attempt to implement this. @Alex Martelli and anyone else, please give me comments/improvements. I think this will eventually end up on GitHub.

"""
todo: need to lock library to avoid thread trouble?

todo: need to raise an exception if we're getting pickled with
an old protocol?

todo: make it polite to other classes that use __new__. Therefore, should
probably work not only when there is only one item in the *args passed to new.

"""

import uuid
import weakref

library = weakref.WeakValueDictionary()

class UuidToken(object):
    def __init__(self, uuid):
        self.uuid = uuid


class PersistentReadOnlyObject(object):
    def __new__(cls, *args, **kwargs):
        if len(args)==1 and len(kwargs)==0 and isinstance(args[0], UuidToken):
            received_uuid = args[0].uuid
        else:
            received_uuid = None

        if received_uuid:
            # This section is for when we are called at unpickling time
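            # Look up without removing: the entry must stay in the library
            # so that later unpicklings find the same object again
            thing = library.get(received_uuid)
            if thing is not None: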
                thing._PersistentReadOnlyObject__skip_setstate = True
                return thing
            else: # This object does not exist in our library yet; Let's add it
                # object.__new__ accepts no extra arguments, so pass only cls
                thing = super(PersistentReadOnlyObject, cls).__new__(cls)
                thing._PersistentReadOnlyObject__uuid = received_uuid
                library[received_uuid] = thing
                return thing

        else:
            # This section is for when we are called at normal creation time
            # object.__new__ accepts no extra arguments; __init__ still
            # receives *args and **kwargs as usual
            thing = super(PersistentReadOnlyObject, cls).__new__(cls)
            new_uuid = uuid.uuid4()
            thing._PersistentReadOnlyObject__uuid = new_uuid
            library[new_uuid] = thing
            return thing

    def __getstate__(self):
        my_dict = dict(self.__dict__)
        del my_dict["_PersistentReadOnlyObject__uuid"]
        return my_dict

    def __getnewargs__(self):
        return (UuidToken(self._PersistentReadOnlyObject__uuid),)

    def __setstate__(self, state):
        if self.__dict__.pop("_PersistentReadOnlyObject__skip_setstate", None):
            return
        else:
            self.__dict__.update(state)

    def __deepcopy__(self, memo):
        return self

    def __copy__(self):
        return self

# --------------------------------------------------------------
"""
From here on it's just testing stuff; will be moved to another file.
"""


import copy
import multiprocessing
import random


def play_around(queue, thing):
    # Runs in the child process: send back the object and a deep copy of it
    queue.put((thing, copy.deepcopy(thing),))

class Booboo(PersistentReadOnlyObject):
    def __init__(self):
        self.number = random.random()

if __name__ == "__main__":

    def same(a, b):
        return (a is b) and (a == b) and (id(a) == id(b)) and \
               (a.number == b.number)

    a = Booboo()
    b = copy.copy(a)
    c = copy.deepcopy(a)
    assert same(a, b) and same(b, c)

    my_queue = multiprocessing.Queue()
    process = multiprocessing.Process(target = play_around,
                                      args=(my_queue, a,))
    process.start()
    # Read from the queue before join(), so the child can flush its data
    things = my_queue.get()
    process.join()
    for thing in things:
        assert same(thing, a) and same(thing, b) and same(thing, c)
    print("all cool!")

岁吢 2024-08-11 11:03:41

I don't know of any such functionality already implemented. The interesting problem is as follows, and needs precise specs as to what's to happen in this case...:

  • process A makes the obj and sends it to B, which unpickles it; so far so good
  • A makes change X to the obj; meanwhile B makes change Y to ITS copy of the obj
  • now either process sends its obj to the other, which unpickles it: what changes
    to the object need to be visible at this time in each process? Does it matter
    whether A is sending to B or vice versa, i.e. does A "own" the object? Or what?

If you don't care, say because only A OWNS the obj -- only A is ever allowed to make changes and send the obj to others, others can't and won't change -- then the problems boil down to identifying obj uniquely -- a GUID will do. The class can maintain a class attribute dict mapping GUIDs to existing instances (probably as a weak-value dict to avoid keeping instances needlessly alive, but that's a side issue) and ensure the existing instance is returned when appropriate.

But if changes need to be synchronized to any finer granularity, then suddenly it's a REALLY difficult problem of distributed computing and the specs of what happens in what cases really need to be nailed down with the utmost care (and more paranoia than is present in most of us -- distributed programming is VERY tricky unless a few simple and provably correct patterns and idioms are followed fanatically!-).

If you can nail down the specs for us, I can offer a sketch of how I would go about trying to meet them. But I won't presume to guess the specs on your behalf;-).

Edit: the OP has clarified, and it seems all he needs is a better understanding of how to control __new__. That's easy: see __getnewargs__ -- you'll need a new-style class and pickling with protocol 2 or better (but those are advisable anyway for other reasons!-), then __getnewargs__ in an existing object can simply return the object's GUID (which __new__ must receive as an optional parameter). So __new__ can check if the GUID is present in the class's memo [[weakvalue;-)]]dict (and if so return the corresponding object value) -- if not (or if the GUID is not passed, implying it's not an unpickling, so a fresh GUID must be generated), then make a truly-new object (setting its GUID;-) and also record it in the class-level memo.

BTW, to make GUIDs, consider using the uuid module in the standard library.
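
A minimal sketch of the __getnewargs__ approach described above, assuming a single process and pickle protocol 2. The names Interned, _memo and guid are hypothetical and chosen only for illustration; this is one possible reading of the idea, not a definitive implementation.

```python
import pickle
import uuid
import weakref


class Interned(object):
    # Class-level memo: GUID -> live instance. Weak values mean the memo
    # alone never keeps an instance alive.
    _memo = weakref.WeakValueDictionary()

    def __new__(cls, guid=None):
        if guid is not None:
            existing = cls._memo.get(guid)
            if existing is not None:
                # Unpickling an object this process already holds
                return existing
        # Either a normal creation (no GUID passed) or the first time this
        # GUID is seen in this process: make a truly new object
        self = super().__new__(cls)
        self.guid = guid if guid is not None else uuid.uuid4()
        cls._memo[self.guid] = self
        return self

    def __getnewargs__(self):
        # Handed to __new__ at unpickling time (protocol 2 or higher)
        return (self.guid,)


original = Interned()
clone = pickle.loads(pickle.dumps(original, protocol=2))
same_object = clone is original
```

Here the round trip through pickle returns the instance already registered under that GUID rather than building a second one. Synchronizing changes between processes is deliberately left out, as discussed above.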

一向肩并 2024-08-11 11:03:41

You could simply use a dictionary in the receiver whose keys and values are the same objects. And to avoid a memory leak, use a WeakKeyDictionary.
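
One caveat worth checking before relying on this: a WeakKeyDictionary holds its values strongly, so an entry whose value is the key itself keeps that object alive. The sketch below, with a hypothetical Thing class and a made-up "some-id" key, compares that with a WeakValueDictionary keyed by a separate identifier. It assumes CPython's reference counting, and calls gc.collect() only as a precaution.

```python
import gc
import weakref


class Thing(object):
    pass


by_key = weakref.WeakKeyDictionary()
by_value = weakref.WeakValueDictionary()

first = Thing()
second = Thing()

by_key[first] = first          # the value is a strong reference to the key
by_value["some-id"] = second   # only a weak reference to the value is kept

# Drop the only outside references to both objects
del first
del second
gc.collect()

entries_in_key_dict = len(by_key)      # entry survives: the value pins the key
entries_in_value_dict = len(by_value)  # entry is gone once the object dies
```

If the aim is to let unused instances be collected, a weak-value mapping from an identifier to the instance seems the safer fit.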
