稳定的Python序列化(例如没有pickle模块重定位问题)

发布于 2024-08-02 18:56:44 字数 396 浏览 6 评论 0原文

我正在考虑使用 Quantities 来定义数字及其单位。该值很可能必须存储在磁盘上。您可能知道,pickling 有一个主要问题:如果您重新定位模块,unpickling 将无法解析该类,并且您将无法 unpickle 信息。对于这种行为有一些解决方法,但它们确实是解决方法。

我对这个问题的幻想解决方案是创建一个唯一编码给定单元的字符串。从磁盘获取此编码后,将其传递给 Quantities 模块中的工厂方法,该方法将其解码为正确的单位实例。优点是,即使您重新定位模块,只要将魔术字符串标记传递给工厂方法,一切仍然可以工作。

这是一个已知的概念吗?

I am considering the use of Quantities to define a number together with its unit. This value most likely will have to be stored on the disk. As you are probably aware, pickling has one major issue: if you relocate the module around, unpickling will not be able to resolve the class, and you will not be able to unpickle the information. There are workarounds for this behavior, but they are, indeed, workarounds.

A solution I fantasized for this issue would be to create a string encoding uniquely a given unit. Once you obtain this encoding from the disk, you pass it to a factory method in the Quantities module, which decodes it to a proper unit instance. The advantage is that even if you relocate the module around, everything will still work, as long as you pass the magic string token to the factory method.

Is this a known concept?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

想念有你 2024-08-09 18:56:44

看起来像是惠勒第一原理的应用,“计算机科学中的所有问题都可以通过另一个间接级别来解决”(第二原理补充说“但这通常会产生另一个问题”;-)。本质上,您需要做的是间接识别类型——类型内的实体可以使用类似pickling的方法(您可以研究pickle.py 和 copy_reg.py 了解后者的所有细节)。

具体来说,我相信您想要做的是子类 pickle.Pickler 并重写 save_inst 方法。当前版本说:

    if self.bin:
        save(cls)
        for arg in args:
            save(arg)
        write(OBJ)
    else:
        for arg in args:
            save(arg)
        write(INST + cls.__module__ + '\n' + cls.__name__ + '\n')

您想要编写与类的模块和名称不同的东西 - 类的某种唯一标识符(由两个字符串组成),可能保存在您自己的一个或多个注册表中; save_global 方法也类似。

对于 Unpickler 的子类来说,这甚至更容易,因为 _instantiate 部分已经在它自己的方法中分解出来:您只需要重写 find_class,即:

def find_class(self, module, name):
    # Subclasses may override this
    __import__(module)
    mod = sys.modules[module]
    klass = getattr(mod, name)
    return klass

它必须接受两个字符串并返回一个类对象;您可以再次通过注册表来完成此操作。

就像涉及注册表时一样,您需要考虑如何确保注册所有感兴趣的对象(类)等。这里一种流行的策略是不进行酸洗,但确保类的所有移动、模块的重命名等被永久记录在某处;这样,只有子类化的 unpickler 就可以完成所有工作,并且它可以最方便地在重写的 find_class 中完成所有工作 - 绕过所有注册问题。我猜你认为这是一种“解决方法”,但对我来说,这似乎只是“多一层间接”概念的一种极其简单、强大和方便的实现,它避免了“多一个问题”问题;-)。

Looks like an application of Wheeler's First Principle, "all problems in computer science can be solved by another level of indirection" (the Second Principle adds "but that will usually create another problem";-). Essentially what you need to do is an indirection to identify the type -- entity-within-type will be fine with pickling-like approaches (you can study the sources of pickle.py and copy_reg.py for all the fine details of the latter).

Specifically, I believe that what you want to do is subclass pickle.Pickler and override the save_inst method. Where the current version says:

    if self.bin:
        save(cls)
        for arg in args:
            save(arg)
        write(OBJ)
    else:
        for arg in args:
            save(arg)
        write(INST + cls.__module__ + '\n' + cls.__name__ + '\n')

you want to write something different than just the class's module and name -- some kind of unique identifier (made up of two string) for the class, probably held in your own registry or registries; and similarly for the save_global method.

It's even easier for your subclass of Unpickler, because the _instantiate part is already factored out in its own method: you only need to override find_class, which is:

def find_class(self, module, name):
    # Subclasses may override this
    __import__(module)
    mod = sys.modules[module]
    klass = getattr(mod, name)
    return klass

it must take two strings and return a class object; you can do that through your registries, again.

Like always when registries are involved, you need to think about how to ensure you register all objects (classes) of interest, etc, etc. One popular strategy here is to leave pickling alone, but ensure that all moves of classes, renames of modules, etc, are recorded somewhere permanent; this way, just the subclassed unpickler can do all the work, and it can most conveniently do it all in the overridden find_class -- bypassing all issues of registration. I gather you consider this a "workaround" but to me it seems just an extremely simple, powerful and convenient implementation of the "one more level of indirection" concept, which avoids the "one more problem" issue;-).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文