Python serialization introspection

Getting the class path or namespace of a class in Python, even if it is nested

Posted on 2024-10-08 09:42:53

I'm currently writing a serialization module in Python that can serialize user-defined classes. In order to do this, I need to get the full namespace of the object and write it to a file. I can then use that string to recreate the object.

For example, assume that we have the following class structure in a file named A.py:

class B:
    class C:
        pass

Now, with the assumption that my_klass_string is the string "A::B::C":

klasses = my_klass_string.split("::")
# The first component names the module, which must already be imported here.
if klasses[0] in globals():
    klass = globals()[klasses[0]]
else:
    raise TypeError("No class defined: %s" % klasses[0])
# Walk the remaining components down through the nested classes.
for klass_string in klasses[1:]:
    if klass_string in klass.__dict__:
        klass = klass.__dict__[klass_string]
    else:
        raise TypeError("No class defined: %s" % klass_string)
# Create the instance without calling __init__.
klass_obj = klass.__new__(klass)

I can create an instance of the class C even though it lies under class B in the module A.
The code above is equivalent to evaluating klass_obj = A.B.C.__new__(A.B.C).

Note:
I'm using __new__() here because I'm reconstituting a serialized object and I don't want to initialize it, as I don't know what parameters the class's __init__ method takes. I want to create the object without calling __init__ and then assign attributes to it later.
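The string-to-object direction described above could be sketched roughly as follows in Python 3. This is only an illustration, not the asker's code: the helper names resolve_class and rebuild are hypothetical, the module is looked up with importlib rather than globals(), and the saved state is assumed to be a plain dict of attributes.

```python
import importlib


def resolve_class(path, sep="::"):
    """Resolve a string such as 'A::B::C' to the class object it names."""
    module_name, *class_names = path.split(sep)
    # Assumption: the first component is an importable module name.
    obj = importlib.import_module(module_name)
    for name in class_names:
        try:
            obj = getattr(obj, name)
        except AttributeError:
            raise TypeError("No class defined: %s" % name)
    return obj


def rebuild(path, state):
    """Create an instance without calling __init__, then restore attributes."""
    cls = resolve_class(path)
    instance = cls.__new__(cls)      # __init__ is deliberately skipped
    instance.__dict__.update(state)  # assumes instances have a __dict__
    return instance


if __name__ == "__main__":
    # argparse.Namespace is used only because it is a convenient stdlib class.
    ns = rebuild("argparse::Namespace", {"x": 1})
    print(ns.x)
```

Classes that use __slots__ have no instance __dict__, so a real implementation would need a different way to restore their state.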

Anyway, I can create an object of class A.B.C from a string, but how do I go the other way? How do I get a string that describes the full path to the class from an instance of that class, even if the class is nested?

坏尐絯℡ 2024-10-15 09:42:53

You cannot get the "full path to the class given an instance of the
class", for the reason that there is no such thing in Python. For
instance, building on your example:

>>> class B(object):
...     class C(object):
...             pass
... 
>>> D = B.C
>>> x = D()
>>> isinstance(x, B.C)
True

What should the "class path" of x be? D or B.C? Both are
equally valid, and thus Python does not give you any means of telling one
from the other.

Indeed, even Python's pickle module has trouble pickling the object x:

>>> import pickle
>>> t = open('/tmp/x.pickle', 'w+b')
>>> pickle.dump(x, t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/pickle.py", line 1362, in dump
    Pickler(file, protocol).dump(obj)
  ...
  File "/usr/lib/python2.6/pickle.py", line 748, in save_global
   (obj, module, name))
  pickle.PicklingError: Can't pickle <class '__main__.C'>: it's not found as __main__.C

So, in general, I see no other option than adding an attribute
to all your classes (say, _class_path), which your serialization code would look up
to record the class name in the serialized format:

class A(object):
  _class_path = 'mymodule.A'
  class B(object):
    _class_path = 'mymodule.A.B'
    ...

You can even do this automatically with some metaclass magic (but also read the other comments in the same SO post for caveats that may apply if you do the D=B.C above).
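As a rough illustration of the metaclass idea, here is a minimal sketch. It assumes Python 3.3 or later, where classes carry a __qualname__ attribute; that is not available in the Python 2.6 discussed in this answer. The name ClassPathMeta is hypothetical.

```python
class ClassPathMeta(type):
    """Hypothetical metaclass that records where each class was defined."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # __qualname__ (Python 3.3+) includes the enclosing classes, e.g. "A.B".
        cls._class_path = "%s.%s" % (cls.__module__, cls.__qualname__)


class A(metaclass=ClassPathMeta):
    class B(metaclass=ClassPathMeta):
        pass


D = A.B  # an alias does not change the recorded path

if __name__ == "__main__":
    print(A._class_path)
    print(D._class_path)
```

The recorded path describes where the class was defined, not every name it is bound to, so the D = B.C caveat still applies: the alias D reports the path of A.B.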

That said, if you can limit your serialization code to (1) instances
of new-style classes, and (2) these classes are defined at the
top-level of a module, then you can just copy what pickle does
(function save_global at lines 730--768 in pickle.py from Python
2.6).

The idea is that every new-style class defines the attributes __name__
and __module__, which are strings holding the class name (as
found in the sources) and the module name (as found in
sys.modules); by saving these you can later import the module and
get the class object back:

import sys

__import__(module_name)
class_obj = getattr(sys.modules[module_name], class_name)
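The save-and-restore idea above can be sketched as a small round trip. The sketch assumes Python 3.3 or later and uses __qualname__ instead of __name__ so that nested classes are covered; this goes beyond what pickle in Python 2.6 does. The helper names class_path and load_class are hypothetical.

```python
import importlib


def class_path(obj):
    """Return (module name, qualified class name) for an instance."""
    cls = type(obj)
    return cls.__module__, cls.__qualname__


def load_class(module_name, qualname):
    """Import the module and walk the dotted name down to the class."""
    target = importlib.import_module(module_name)
    for part in qualname.split("."):
        target = getattr(target, part)
    return target


class B:
    class C:
        pass


D = B.C  # the alias from the example above

if __name__ == "__main__":
    module_name, qualname = class_path(D())
    print(module_name, qualname)
    # The class is found again by walking from the module through B to C.
    print(load_class(module_name, qualname) is B.C)
```

For a class defined inside a function, __qualname__ contains "<locals>", and such a name cannot be resolved this way.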

泪意 2024-10-15 09:42:53

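You can't, in any reasonable, non-crazy way. I guess you could find the class name and the module, then verify for each class name that it exists in the module, and if it does not, go through all the classes that do exist in the module hierarchically until you find it.

But as there is no reason to have class hierarchies like that, it's a non-problem. :-)

Also, I know you don't want to hear this at this point in your work, but:

Cross-platform serialization is a fun subject, but doing it with objects like this is unlikely to be very useful, as the target system must have exactly the same object hierarchy installed. You must therefore have two systems, written in two different languages, that are exactly equivalent. That is almost impossible and probably not worth the trouble.

For example, you would not be able to use any object from Python's standard library, as those don't exist in Ruby. The end result is that you must create your own object hierarchy, which in the end uses only basic types such as strings and numbers. In that case your objects have just become containers for basic primitives, and you might as well serialize everything with JSON or XML.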

自由范儿 2024-10-15 09:42:53

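"I'm currently writing a serialization module in Python that can serialize user defined classes."

Don't. The standard library already includes one. Depending on how you count, it actually includes at least two (pickle and shelve).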

童话里做英雄 2024-10-15 09:42:53

There are two ways of doing this.

Solution 1

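The first one goes via the garbage collector, which can follow the chain of references from the nested class back to the class that contains it:

B -> __dict__ -> C

This is the code:

>>> import gc
>>> class B(object):
...     class C(object):
...         pass
...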

>>> gc.get_referrers(B.C) # last element in the list
[<attribute '__dict__' of 'C' objects>, <attribute '__weakref__' of 'C' objects>, (<class '__main__.C'>, <type 'object'>), {'__dict__': <attribute '__dict__' of 'B' objects>, '__module__': '__main__', '__weakref__': <attribute '__weakref__' of 'B' objects>, 'C': <class '__main__.C'>, '__doc__': None}] 

>>> gc.get_referrers(gc.get_referrers(B.C)[-1]) # first element in this list
[<class '__main__.B'>, [<attribute '__dict__' of 'C' objects>, <attribute '__weakref__' of 'C' objects>, (<class '__main__.C'>, <type 'object'>), {'__dict__': <attribute '__dict__' of 'B' objects>, '__module__': '__main__', '__weakref__': <attribute '__weakref__' of 'B' objects>, 'C': <class '__main__.C'>, '__doc__': None}]]

>>> gc.get_referrers(gc.get_referrers(B.C)[-1])[0]
<class '__main__.B'>

Algorithm: