What are the drawbacks of serializing custom objects along with their definitions?
My question is what future repercussions are conceivable when I "force" Python class/function definitions to be serialized along with the objects, by "re-declaring" them in __main__ just before serialization.
Details
It is a common gotcha that Python libraries such as pickle and dill do not serialize class or function definitions along with the objects if the definitions are not located in __main__.
As a result, when deserializing an object, its dependencies must be found in the same location as during serialization. This adds some overhead/inflexibility to deployment, as the definitions must be maintained in a separate package which must be versioned and present in the (production) environment.
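The by-reference behaviour described above can be reproduced with the standard library alone. The following is a minimal sketch, not taken from the original post: the module name `hypothetical_models_pkg` and the class `Model` are made up for illustration, and removing the module from `sys.modules` merely simulates an environment where the package is not installed.

```python
import pickle
import sys
import types

# Build a throwaway module (hypothetical name) that holds a class definition.
MODULE_NAME = "hypothetical_models_pkg"
mod = types.ModuleType(MODULE_NAME)
exec(
    "class Model:\n"
    "    def __init__(self, w):\n"
    "        self.w = w\n",
    mod.__dict__,
)
sys.modules[MODULE_NAME] = mod

payload = pickle.dumps(mod.Model(3))

# The payload names the module and the class, but does not embed the source.
references_by_name = MODULE_NAME.encode() in payload and b"Model" in payload
contains_source = b"def __init__" in payload

# Simulate an environment where the package is missing.
del sys.modules[MODULE_NAME]
try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = exc

print(references_by_name, contains_source, type(load_error).__name__)
```

Loading fails because the unpickler tries to import the module named in the payload, which is why the definitions normally have to be shipped and versioned separately.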
I sometimes use the workaround of "mainifying" objects before serializing them, as described for instance by Oege Dijk here. It essentially redeclares the object's definition in __main__ so that it will be serialized. The code I use is listed below.
So far this approach has worked well for all my (machine learning) workflows, for quite a while. Yet, it seems quite hacky, and I wonder whether it might cause problems down the line, and which. Of course, the ability to easily modify the serialized definitions is removed (e.g. bugfix). But that is something I can live with. Are there other dangers I am unaware of?
import inspect
import types

def mainify(obj):
    if obj.__module__ != '__main__':
        import __main__
        is_func = isinstance(obj, types.FunctionType)

        # Get source code and compile
        source = inspect.getsource(obj if is_func else obj.__class__)
        compiled = compile(source, '<string>', 'exec')

        # "Declare" in __main__ and keep track which key
        # of __main__ dict is new
        pre = list(__main__.__dict__.keys())
        exec(compiled, __main__.__dict__)
        post = list(__main__.__dict__.keys())
        new_in_main = list(set(post) - set(pre))[0]

        # for function return mainified version, else assign new
        # class to obj and return object
        if is_func:
            obj = __main__.__dict__[new_in_main]
        else:
            obj.__class__ = __main__.__dict__[new_in_main]
    return obj
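For readers who want to see the core mechanism in isolation, here is a minimal sketch of the redeclaration step. It rests on some assumptions: the `source` string stands in for what `inspect.getsource` would return, and `DemoModel` and `some_package.models` are hypothetical names that do not come from the question.

```python
import __main__

# Hypothetical class source, standing in for the result of inspect.getsource().
source = (
    "class DemoModel:\n"
    "    def predict(self, x):\n"
    "        return 2 * x\n"
)

# The "original" class, created in a namespace that mimics an imported module.
original_ns = {"__name__": "some_package.models"}
exec(source, original_ns)
obj = original_ns["DemoModel"]()
original_module = type(obj).__module__

# Executing the same source with __main__'s namespace as globals creates a
# second, independent class object whose __module__ is "__main__".
exec(compile(source, "<string>", "exec"), __main__.__dict__)
redeclared = __main__.__dict__["DemoModel"]

# Swap the instance over to the redeclared class, as mainify does.
obj.__class__ = redeclared

print(original_module, "->", type(obj).__module__)
```

Note that only the text of the class itself is re-executed here. Whether module-level imports or helpers that the class relies on are available in __main__ is a separate matter that this sketch does not address.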
Comments (1)
If you are pickling objects from a single module, you can monkey patch dill by assigning that module to the variable dill._dill._main_module to trick dill into thinking this module is the __main__ module. That was an option in previous versions, but was removed, so I don't know if it would work flawlessly. (Maybe you'll need to set module.__name__ to "__main__" or None too.)

In the next dill release, which should come out in the following days, you'll be able to pickle a whole module (or a subset of its namespace) and load it in a different session, by updating the state of the same module there, if it is installed, or loading its namespace as a dictionary.