How does automatic full qualification of class names work in Python? [related to object pickling]
(It is possible to directly jump to the question, further down, and to skip the introduction.)
There is a common difficulty with pickling Python objects from user-defined classes:
# This is program dumper.py
import pickle

class C(object):
    pass

with open('obj.pickle', 'wb') as f:
    pickle.dump(C(), f)
Trying to get the object back from another program, loader.py, with
# This is program loader.py
import pickle

with open('obj.pickle', 'rb') as f:
    obj = pickle.load(f)
results in
AttributeError: 'module' object has no attribute 'C'
The class is pickled by name ("C"), and the loader.py program does not know anything about C. A common solution is to import the class first:
import pickle
from dumper import C  # Objects of class C can now be unpickled

with open('obj.pickle', 'rb') as f:
    obj = pickle.load(f)
However, this solution has a few drawbacks: every class referenced by the pickled objects has to be imported (there can be many), and the local namespace becomes polluted by names from the dumper.py program.
Now, a solution to this consists of fully qualifying objects prior to pickling:
# New dumper.py program:
import pickle
import dumper  # This is this very program!

class C(object):
    pass

with open('obj.pickle', 'wb') as f:
    pickle.dump(dumper.C(), f)  # Fully qualified class
Unpickling with the original loader.py program above now works directly (no need to do from dumper import C).
Question: Now, other classes from dumper.py seem to be automatically fully qualified upon pickling, and I would love to know how this works, and whether this is a reliable, documented behavior:
import pickle
import dumper  # This is this very program!

class D(object):  # New class!
    pass

class C(object):
    def __init__(self):
        self.d = D()  # *NOT* fully qualified

with open('obj.pickle', 'wb') as f:
    pickle.dump(dumper.C(), f)  # Fully qualified pickle class
Now, unpickling with the original loader.py program also works (no need to do from dumper import C); print obj.d gives a fully qualified class, which I find surprising:
<dumper.D object at 0x122e130>
This behavior is very convenient, since only the top-level pickled object has to be fully qualified with the module name (dumper.C()). But is this behavior reliable and documented? How come classes are pickled by name ("D"), yet unpickling decides that the pickled self.d attribute is of class dumper.D (and not some local D class)?
PS: The question, refined: I just noticed a few interesting details that might point to an answer to this question:
In the pickling program dumper.py, print self.d prints <__main__.D object at 0x2af450> with the first dumper.py program (the one without import dumper). On the other hand, doing import dumper and creating the object with dumper.C() in dumper.py makes print self.d print <dumper.D object at 0x2af450>: the self.d attribute is automatically qualified by Python! So, it appears that the pickle module has no role in the nice unpickling behavior described above.
The question is thus really: why does Python convert D() into the fully qualified dumper.D in the second case? Is this documented somewhere?
Answers (2)
When your classes are defined in your main module, that's where pickle expects to find them when they are unpickled. In your first case, the classes were defined in the main module, so when loader runs, loader is the main module and pickle can't find the classes. If you look at the content of obj.pickle, you'll see the name __main__ exported as the namespace of your C and D classes.
In your second case, dumper.py imports itself. Now you actually have two separate sets of C and D classes defined: one set in the __main__ namespace and one set in the dumper namespace. You serialize the one in the dumper namespace (look in obj.pickle to verify).
pickle will attempt to dynamically import a namespace if it is not found, so when loader.py runs, pickle itself imports dumper.py and the dumper.C and dumper.D classes.
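The claim about the namespace recorded in the pickle can be checked with the standard library alone. The following is a minimal sketch, assuming Python 3 (the question itself uses Python 2 print statements); the helper name global_opcode is hypothetical. It relies on the fact that pickle stores a class by reference, as a module name plus a class name, and that protocol 0 writes this reference as readable text.

```python
import pickle
import pickletools


class D(object):
    pass


class C(object):
    def __init__(self):
        self.d = D()  # plain D(), no module prefix


def global_opcode(cls):
    """Bytes that protocol 0 writes to reference cls: c<module>\\n<name>\\n."""
    return b"c" + cls.__module__.encode() + b"\n" + cls.__qualname__.encode() + b"\n"


data = pickle.dumps(C(), protocol=0)  # protocol 0 is text-based and easy to read

# Both classes are recorded under the module in which they were defined.
for cls in (C, D):
    print(cls.__module__, cls.__qualname__, global_opcode(cls) in data)

# Human-readable listing of the pickle, including the GLOBAL entries.
pickletools.dis(data)
```

When this is run as a script, the module shown for both classes is __main__, which is exactly the name a separate loader program cannot resolve to these classes.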
Since you have two separate scripts, dumper.py and loader.py, it only makes sense to define the classes they share in a common import module:
common.py
loader.py
dumper.py
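The contents of the three files are not reproduced above. The following is a minimal sketch of what such a layout might look like, assuming Python 3; the file contents and the run_demo helper are hypothetical, reconstructed from the description rather than taken from the answer. The script writes the three files into a temporary directory, runs dumper.py, then runs loader.py and reports the module of the unpickled objects.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical file contents: the shared classes live only in common.py.
FILES = {
    "common.py": """
        class D(object):
            pass

        class C(object):
            def __init__(self):
                self.d = D()
    """,
    "dumper.py": """
        import pickle
        from common import C

        with open('obj.pickle', 'wb') as f:
            pickle.dump(C(), f)
    """,
    "loader.py": """
        import pickle

        with open('obj.pickle', 'rb') as f:
            obj = pickle.load(f)

        print(type(obj).__module__, type(obj).__name__)
        print(type(obj.d).__module__, type(obj.d).__name__)
    """,
}


def run_demo():
    """Write the three files to a temp dir, run dumper.py, then loader.py."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in FILES.items():
            Path(tmp, name).write_text(textwrap.dedent(source))
        subprocess.run([sys.executable, "dumper.py"], cwd=tmp, check=True)
        result = subprocess.run(
            [sys.executable, "loader.py"],
            cwd=tmp, check=True, capture_output=True, text=True,
        )
        return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_demo():
        print(line)  # common C, then common D
```

Note that loader.py never imports common itself: pickle does so while loading, because the pickle records the classes as belonging to the common module.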
Note that even though dumper.py dumps C(), in this case pickle knows that it is a common.C object (see obj.pickle). When loader.py runs, it will dynamically import common.py and succeed in loading the object.
Here is what happens: when importing dumper (or doing from dumper import C) from within dumper.py, the whole program is parsed and run again (this can be seen by inserting a print in the module). This behavior is expected, because dumper is not a module that was already loaded (__main__ is considered loaded, however): it is not in sys.modules.
As illustrated in Mark's answer, importing a module naturally qualifies all the names defined in the module, so that self.d = D() is interpreted as being of class dumper.D when re-evaluating file dumper.py (this is equivalent to parsing common.py, in Mark's answer).
Thus, the import dumper (or from dumper import C) trick is explained, and pickling fully qualifies not only class C but also class D. This makes unpickling by an external program easier!
This also shows that import dumper done in dumper.py forces the Python interpreter to parse the program twice, which is neither efficient nor elegant. Pickling classes in a program and unpickling them in another one is therefore probably best done through the approach outlined in Mark's answer: pickled classes should be in a separate module.
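The double execution and the two separate sets of classes can be observed directly. The following is a minimal sketch, assuming Python 3; the script body is a hypothetical stand-in for the self-importing dumper.py, with the pickling replaced by prints, and run_self_import is a hypothetical helper that runs it in a temporary directory.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the self-importing dumper.py from the question.
DUMPER_SOURCE = '''\
import sys
print("executing as", __name__)

class D(object):
    pass

class C(object):
    def __init__(self):
        self.d = D()

import dumper  # This is this very program!

if __name__ == "__main__":
    print("dumper" in sys.modules, "__main__" in sys.modules)
    print(dumper.C is C)
    print(type(C().d).__module__)
    print(type(dumper.C().d).__module__)
'''


def run_self_import():
    """Run the self-importing script and return its output lines."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "dumper.py").write_text(DUMPER_SOURCE)
        result = subprocess.run(
            [sys.executable, "dumper.py"],
            cwd=tmp, check=True, capture_output=True, text=True,
        )
        return result.stdout.splitlines()


if __name__ == "__main__":
    for line in run_self_import():
        print(line)
```

The output has two "executing as" lines, one for __main__ and one for dumper, then shows that dumper.C and C are different class objects and that the D instance created through dumper.C() belongs to module dumper.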