What are the drawbacks of serializing custom objects together with their definitions?

Published 2025-02-13 21:42:46


My question is what future repercussions are conceivable when I "force" Python class/function definitions to be serialized along with the objects, by "re-declaring" them in __main__ just before serialization.

Details

It is a common gotcha that Python libraries such as pickle and dill do not serialize class or function definitions along with the objects, if the definitions are not located in __main__.

As a result, when deserializing an object, its dependencies must be found in the same location as during serialization. This adds some overhead/inflexibility to deployment, as the definitions must be maintained in a separate package which must be versioned and present in the (production) environment.
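This by-reference behavior is easy to see in a payload. As a minimal sketch (using `fractions.Fraction` purely as a convenient stand-in for any importable class), pickling a class stores only its qualified import path, never the class body:

```python
import pickle
from fractions import Fraction

# pickle records the qualified import path of the class...
payload = pickle.dumps(Fraction)
assert b'fractions' in payload and b'Fraction' in payload

# ...but none of its implementation: unpickling simply re-imports
# the module and looks the name up again.
assert pickle.loads(payload) is Fraction
```

If `fractions` were not importable in the deserializing environment, `pickle.loads` would raise, which is exactly the deployment inflexibility described above.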

I sometimes use the workaround of "mainifying" objects before serializing them, as described for instance by Oege Dijk here. It essentially redeclares the object's definition in __main__ so that it will be serialized. The code I use is listed below.

So far this approach has worked well for all my (machine learning) workflows, and has for quite a while. Yet it seems quite hacky, and I wonder whether it might cause problems down the line, and which ones. Of course, it removes the ability to easily modify the serialized definitions later (e.g. to apply a bugfix). But that is something I can live with. Are there other dangers I am unaware of?

import inspect
import textwrap
import types

def mainify(obj):
    """Re-declare obj's class/function definition in __main__ so that
    serializers such as dill pickle the definition by value."""
    if obj.__module__ != '__main__':

        import __main__
        is_func = isinstance(obj, types.FunctionType)

        # Get the source code and compile it (dedent in case the
        # definition is nested inside another class or function)
        source = textwrap.dedent(inspect.getsource(obj if is_func else obj.__class__))
        compiled = compile(source, '<string>', 'exec')

        # "Declare" in __main__ and keep track of which key
        # of the __main__ dict is new
        pre = set(__main__.__dict__)
        exec(compiled, __main__.__dict__)
        new_in_main = (set(__main__.__dict__) - pre).pop()

        # For a function, return the mainified version; otherwise
        # reassign the object's class and return the object
        if is_func:
            obj = __main__.__dict__[new_in_main]
        else:
            obj.__class__ = __main__.__dict__[new_in_main]

    return obj
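For illustration, here is a self-contained round-trip sketch of the workaround. The throwaway module name `fake_mod` and its `Model` class are invented for this demo; the module is written to a real file because `inspect.getsource` needs a file-backed definition:

```python
import importlib.util
import inspect
import os
import pickle
import sys
import tempfile
import textwrap
import types

def mainify(obj):
    # Same workaround as above: re-declare the definition in __main__.
    if obj.__module__ != '__main__':
        import __main__
        is_func = isinstance(obj, types.FunctionType)
        source = textwrap.dedent(inspect.getsource(obj if is_func else obj.__class__))
        pre = set(__main__.__dict__)
        exec(compile(source, '<string>', 'exec'), __main__.__dict__)
        new_in_main = (set(__main__.__dict__) - pre).pop()
        if is_func:
            obj = __main__.__dict__[new_in_main]
        else:
            obj.__class__ = __main__.__dict__[new_in_main]
    return obj

# Write a tiny module to disk so the class genuinely lives outside __main__.
src = textwrap.dedent("""
    class Model:
        def __init__(self, coef):
            self.coef = coef
        def predict(self, x):
            return self.coef * x
""")
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
    f.write(src)
    path = f.name
spec = importlib.util.spec_from_file_location('fake_mod', path)
fake_mod = importlib.util.module_from_spec(spec)
sys.modules['fake_mod'] = fake_mod
spec.loader.exec_module(fake_mod)

m = fake_mod.Model(2.0)
assert m.__class__.__module__ == 'fake_mod'

m = mainify(m)
assert m.__class__.__module__ == '__main__'  # definition now lives in __main__

restored = pickle.loads(pickle.dumps(m))     # round-trips in this session
print(restored.predict(3.0))                 # -> 6.0
os.unlink(path)
```

Note that plain pickle still stores the `__main__.Model` reference by name; it is dill that serializes `__main__` definitions by value, which is what makes mainifying useful before shipping the payload to another environment.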


Answer by Hello爱情风, 2025-02-20 21:42:46


If you are pickling objects from a single module, you can monkey-patch dill by assigning that module to the variable dill._dill._main_module, tricking dill into thinking it is the __main__ module. That was an option in previous versions, but it was removed, so I don't know whether it would still work flawlessly. (Maybe you'll also need to set module.__name__ to "__main__" or None.)

In the next dill release, which should come out in the following days, you'll be able to pickle a whole module (or a subset of its namespace) and load it in a different session, either by updating the state of the same module there, if it is installed, or by loading its namespace as a dictionary.
