What are the drawbacks of serializing custom objects together with their definitions?

Published 2025-02-13 21:42:46


My question is what future repercussions are conceivable when I "force" Python class/function definitions to be serialized along with the objects, by "re-declaring" them in __main__ just before serialization.

Details

It is a common gotcha that Python libraries such as pickle and dill do not serialize class or function definitions along with the objects, if the definitions are not located in __main__.

As a result, when deserializing an object, its dependencies must be found in the same location as during serialization. This adds some overhead/inflexibility to deployment, as the definitions must be maintained in a separate package which must be versioned and present in the (production) environment.
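This by-reference behavior is easy to see in a payload. As a minimal sketch (using `fractions.Fraction` purely as a convenient stand-in for any importable class), pickling a class stores only its qualified import path, never the class body:

```python
import pickle
from fractions import Fraction

# pickle records the qualified import path of the class...
payload = pickle.dumps(Fraction)
assert b'fractions' in payload and b'Fraction' in payload

# ...but none of its implementation: unpickling simply re-imports
# the module and looks the name up again.
assert pickle.loads(payload) is Fraction
```

If `fractions` were not importable in the deserializing environment, `pickle.loads` would raise, which is exactly the deployment inflexibility described above.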

I sometimes use the workaround of "mainifying" objects before serializing them, as described for instance by Oege Dijk here. It essentially redeclares the object's definition in __main__ so that it will be serialized. The code I use is listed below.

So far this approach has worked well for all my (machine learning) workflows, and has for quite a while. Yet it seems quite hacky, and I wonder whether it might cause problems down the line, and which ones. Of course, it removes the ability to easily modify the serialized definitions later (e.g. to apply a bugfix). But that is something I can live with. Are there other dangers I am unaware of?

import inspect
import textwrap
import types

def mainify(obj):
    """Re-declare obj's class/function definition in __main__ so that
    serializers such as dill pickle the definition by value."""
    if obj.__module__ != '__main__':

        import __main__
        is_func = isinstance(obj, types.FunctionType)

        # Get the source code and compile it (dedent in case the
        # definition is nested inside another class or function)
        source = textwrap.dedent(inspect.getsource(obj if is_func else obj.__class__))
        compiled = compile(source, '<string>', 'exec')

        # "Declare" in __main__ and keep track of which key
        # of the __main__ dict is new
        pre = set(__main__.__dict__)
        exec(compiled, __main__.__dict__)
        new_in_main = (set(__main__.__dict__) - pre).pop()

        # For a function, return the mainified version; otherwise
        # reassign the object's class and return the object
        if is_func:
            obj = __main__.__dict__[new_in_main]
        else:
            obj.__class__ = __main__.__dict__[new_in_main]

    return obj
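For illustration, here is a self-contained round-trip sketch of the workaround. The throwaway module name `fake_mod` and its `Model` class are invented for this demo; the module is written to a real file because `inspect.getsource` needs a file-backed definition:

```python
import importlib.util
import inspect
import os
import pickle
import sys
import tempfile
import textwrap
import types

def mainify(obj):
    # Same workaround as above: re-declare the definition in __main__.
    if obj.__module__ != '__main__':
        import __main__
        is_func = isinstance(obj, types.FunctionType)
        source = textwrap.dedent(inspect.getsource(obj if is_func else obj.__class__))
        pre = set(__main__.__dict__)
        exec(compile(source, '<string>', 'exec'), __main__.__dict__)
        new_in_main = (set(__main__.__dict__) - pre).pop()
        if is_func:
            obj = __main__.__dict__[new_in_main]
        else:
            obj.__class__ = __main__.__dict__[new_in_main]
    return obj

# Write a tiny module to disk so the class genuinely lives outside __main__.
src = textwrap.dedent("""
    class Model:
        def __init__(self, coef):
            self.coef = coef
        def predict(self, x):
            return self.coef * x
""")
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
    f.write(src)
    path = f.name
spec = importlib.util.spec_from_file_location('fake_mod', path)
fake_mod = importlib.util.module_from_spec(spec)
sys.modules['fake_mod'] = fake_mod
spec.loader.exec_module(fake_mod)

m = fake_mod.Model(2.0)
assert m.__class__.__module__ == 'fake_mod'

m = mainify(m)
assert m.__class__.__module__ == '__main__'  # definition now lives in __main__

restored = pickle.loads(pickle.dumps(m))     # round-trips in this session
print(restored.predict(3.0))                 # -> 6.0
os.unlink(path)
```

Note that plain pickle still stores the `__main__.Model` reference by name; it is dill that serializes `__main__` definitions by value, which is what makes mainifying useful before shipping the payload to another environment.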


Answer by Hello爱情风, 2025-02-20 21:42:46


If you are pickling objects from a single module, you can monkey-patch dill by assigning that module to the variable dill._dill._main_module, tricking dill into thinking it is the __main__ module. That was an option in previous versions, but it was removed, so I don't know whether it would still work flawlessly. (Maybe you'll also need to set module.__name__ to "__main__" or None.)

In the next dill release, which should come out in the following days, you'll be able to pickle a whole module (or a subset of its namespace) and load it in a different session, either by updating the state of the same module there, if it is installed, or by loading its namespace as a dictionary.
