How to provide additional initialization for a subclass of namedtuple?

Posted on 2024-09-17 08:27:13

Suppose I have a namedtuple like this:

from collections import namedtuple

EdgeBase = namedtuple("EdgeBase", "left, right")

I want to implement a custom hash function for this, so I create the following subclass:

class Edge(EdgeBase):
    def __hash__(self):
        return hash(self.left) * hash(self.right)

Since the object is immutable, I want the hash value to be calculated only once, so I do this:

class Edge(EdgeBase):
    def __init__(self, left, right):
        self._hash = hash(self.left) * hash(self.right)

    def __hash__(self):
        return self._hash

This appears to work, but I am really not sure about subclassing and initialization in Python, especially with tuples. Are there any pitfalls to this solution? Is there a recommended way to do this? Is it fine? Thanks in advance.

心的位置 2024-09-24 08:27:15

In Python 3.7+, you can now use dataclasses to build hashable classes with ease.

Code

Assuming left and right are ints, we get the default hashing via the unsafe_hash⁺ keyword:

import dataclasses as dc


@dc.dataclass(unsafe_hash=True)
class Edge:
    left: int
    right: int


hash(Edge(1, 2))
# 3713081631934410656 (the exact value depends on the Python version)

Now we can use these (mutable) hashable objects as elements in a set or as keys in a dict.

{Edge(1, 2), Edge(1, 2), Edge(2, 1), Edge(2, 3)}
# {Edge(left=1, right=2), Edge(left=2, right=1), Edge(left=2, right=3)}

Details

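Alternatively, we can define __hash__ ourselves; because it is written explicitly in the class body, the decorator leaves it in place. Note that the hash is cached in __post_init__, so it goes stale if left or right is reassigned later: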

@dc.dataclass
class Edge:
    left: int
    right: int

    def __post_init__(self):
        # Add custom hashing function here
        self._hash = hash((self.left, self.right))  # emulates the default

    def __hash__(self):
        return self._hash


hash(Edge(1, 2))
# 3713081631934410656 (the exact value depends on the Python version)

Expanding on @ShadowRanger's comment, the OP's custom hash function is not reliable. In particular, it is symmetric in its two attributes, so swapping their values gives the same hash, e.g. hash(Edge(1, 2)) == hash(Edge(2, 1)), which is likely unintended.
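To make the collision concrete, here is a minimal sketch. The helper names product_hash and tuple_hash are made up for illustration, and the sample values are arbitrary.

```python
def product_hash(left, right):
    # The question's scheme: multiplication is commutative, so order is lost.
    return hash(left) * hash(right)


def tuple_hash(left, right):
    # Hashing the pair as a tuple takes element order into account.
    return hash((left, right))


print(product_hash(1, 2) == product_hash(2, 1))   # True: guaranteed collision
print(tuple_hash(1, 2) == tuple_hash(2, 1))       # False on CPython for these values
print(product_hash(0, 5) == product_hash(0, 99))  # True: hash(0) == 0 wipes out the other field
```

The last line shows a second weakness of the product: any field whose hash is 0 makes the whole result 0.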

⁺ Note: the name "unsafe" signals that a hash is generated from the fields even though the object is mutable. This may be undesired, particularly for dict keys, which are expected not to change. Immutable hashing can be turned on with the appropriate keywords (for example, frozen=True). See also more on hashing logic in dataclasses and a related issue.
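The risk described in the note can be sketched as follows. The class names MutableEdge and FrozenEdge are hypothetical and only serve to contrast the two keyword choices; this is an illustration, not the only way to set things up.

```python
import dataclasses as dc


@dc.dataclass(unsafe_hash=True)
class MutableEdge:
    left: int
    right: int


@dc.dataclass(frozen=True)  # with the default eq=True, __hash__ is generated
class FrozenEdge:
    left: int
    right: int


edge = MutableEdge(1, 2)
edges = {edge}
edge.left = 10          # the hash changes while the object sits in the set
print(edge in edges)    # False: the set can no longer find it

frozen = FrozenEdge(1, 2)
try:
    frozen.left = 10    # frozen instances reject assignment
except dc.FrozenInstanceError:
    print("FrozenEdge cannot be mutated")
print(FrozenEdge(1, 2) in {frozen})  # True
```

With frozen=True the fields cannot be reassigned, so the generated hash stays valid for the lifetime of the object.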

北恋 2024-09-24 08:27:15

The code in the question could benefit from a super call in __init__ in case it ever gets subclassed in a multiple-inheritance situation, but is otherwise correct.

class Edge(EdgeBase):
    def __init__(self, left, right):
        # Python 3: the inherited object.__init__ accepts no extra arguments
        super().__init__()
        self._hash = hash(self.left) * hash(self.right)

    def __hash__(self):
        return self._hash

Tuples are read-only, but in a subclass only the tuple part is read-only. Other attributes can be written as usual, which is what allows the assignment to _hash, regardless of whether it is done in __init__ or __new__. You can make the subclass fully read-only by setting its __slots__ to (), which has the added benefit of saving memory, but then you wouldn't be able to assign to _hash.
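The difference can be seen in a small sketch. The class names LooseEdge and StrictEdge are hypothetical and exist only to contrast the two cases.

```python
from collections import namedtuple

EdgeBase = namedtuple("EdgeBase", "left, right")


class LooseEdge(EdgeBase):
    # No __slots__: instances get a __dict__, so extra attributes are allowed.
    pass


class StrictEdge(EdgeBase):
    __slots__ = ()  # no __dict__: only the tuple fields exist


loose = LooseEdge(1, 2)
loose._hash = 42          # works, stored in loose.__dict__
print(loose.__dict__)     # {'_hash': 42}

strict = StrictEdge(1, 2)
try:
    strict._hash = 42
except AttributeError:
    print("StrictEdge rejects new attributes")

try:
    loose.left = 10       # the tuple fields stay read-only in both classes
except AttributeError:
    print("fields are read-only")
```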

猫卆 2024-09-24 08:27:14

Edit for 2017: it turns out namedtuple isn't a great idea. attrs is the modern alternative.

class Edge(EdgeBase):
    def __new__(cls, left, right):
        self = super(Edge, cls).__new__(cls, left, right)
        self._hash = hash(self.left) * hash(self.right)
        return self

    def __hash__(self):
        return self._hash

__new__ is what you want to override here because tuples are immutable. Immutable objects are created in __new__ and then returned to the user, instead of being populated with data in __init__.

cls has to be passed twice in the super call on __new__ (once to super and once to __new__ itself) because __new__ is, for historical/odd reasons, implicitly a staticmethod.
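On Python 3 the same idea can be written with zero-argument super(). The sketch below is one possible variant, not the answer's exact code: it assumes a tuple-based hash in place of the product, so that the order of the fields matters and the hash agrees with that of the equal plain tuple.

```python
from collections import namedtuple

EdgeBase = namedtuple("EdgeBase", "left, right")


class Edge(EdgeBase):
    def __new__(cls, left, right):
        # Zero-argument super() (Python 3); cls is still passed explicitly
        # because __new__ is a static method.
        self = super().__new__(cls, left, right)
        # Assumption: tuple-based hash instead of the question's product.
        self._hash = hash((left, right))
        return self

    def __hash__(self):
        return self._hash


edge = Edge(1, 2)
print(edge == (1, 2))              # True: equality is inherited from tuple
print(hash(edge) == hash((1, 2)))  # True: equal objects share a hash
```

Since Edge(1, 2) still compares equal to the plain tuple (1, 2), a hash that differs from hash((1, 2)) would break the rule that equal objects must hash equally. That is one more reason to be careful with the product-based version.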
