How to provide additional initialization for a subclass of namedtuple?
Suppose I have a namedtuple like this:

    EdgeBase = namedtuple("EdgeBase", "left, right")

I want to implement a custom hash function for this, so I create the following subclass:

    class Edge(EdgeBase):
        def __hash__(self):
            return hash(self.left) * hash(self.right)

Since the object is immutable, I want the hash value to be calculated only once, so I do this:

    class Edge(EdgeBase):
        def __init__(self, left, right):
            self._hash = hash(self.left) * hash(self.right)

        def __hash__(self):
            return self._hash

This appears to be working, but I am really not sure about subclassing and initialization in Python, especially with tuples. Are there any pitfalls to this solution? Is there a recommended way to do this? Is it fine? Thanks in advance.
Comments (3)
In Python 3.7+, you can now use dataclasses to build hashable classes with ease.
Code
Assuming int types for left and right, we use the default hashing via the unsafe_hash keyword (+). We can then use these (mutable) hashable objects as elements in a set or as keys in a dict.
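A minimal sketch of the approach described above, assuming a dataclass named Edge with two int fields; the sample values and the weights dict are illustrative only and not taken from the original answer:

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class Edge:
    # unsafe_hash=True asks dataclass to generate __hash__ from the fields,
    # even though instances remain mutable.
    left: int
    right: int


# Equal edges collapse to one set element; (1, 2) and (2, 1) stay distinct
# because the generated __eq__ compares the fields in order.
edges = {Edge(1, 2), Edge(1, 2), Edge(2, 1)}

# Instances can also serve as dict keys.
weights = {Edge(1, 2): 0.5}
```

Because the instances are still mutable, changing a field after an Edge has been put into a set or dict would leave it stored under a stale hash, which is the risk the word "unsafe" points at.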
Details
We can alternatively override the __hash__ function ourselves.
Expanding on @ShadowRanger's comment, the OP's custom hash function is not reliable. In particular, the attribute values can be interchanged, e.g. hash(Edge(1, 2)) == hash(Edge(2, 1)), which is likely unintended.
(+) Note: the name "unsafe" suggests that the default hash will be used even though the object is mutable. This may be undesirable, particularly in a dict, which expects immutable keys. Immutable hashing can be turned on with the appropriate keywords, such as frozen=True. See also more on hashing logic in dataclasses and a related issue.
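The points above can be sketched as follows. The helper product_hash and the class name FrozenEdge are hypothetical names introduced for illustration, and hashing the field tuple is one common choice rather than necessarily the one the original answer used:

```python
from dataclasses import dataclass


def product_hash(left, right):
    # The hash from the question: multiplication is commutative,
    # so swapping the arguments yields the same value.
    return hash(left) * hash(right)


@dataclass
class Edge:
    left: int
    right: int

    def __hash__(self):
        # An explicitly defined __hash__ is kept by dataclass.
        # Hashing the field tuple makes the field order matter.
        return hash((self.left, self.right))


@dataclass(frozen=True)
class FrozenEdge:
    # frozen=True (with the default eq=True) yields immutable instances
    # and a generated __hash__, so no "unsafe" hashing is involved.
    left: int
    right: int


symmetric = product_hash(1, 2) == product_hash(2, 1)
```

Here symmetric is True, which is exactly the collision between Edge(1, 2) and Edge(2, 1) that the product-based hash produces.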
The code in the question could benefit from a super call in __init__ in case it ever gets subclassed in a multiple-inheritance situation, but otherwise it is correct.
While tuples are read-only, only the tuple parts of their subclasses are read-only; other attributes may be written as usual, which is what allows the assignment to _hash regardless of whether it is done in __init__ or __new__. You can make the subclass fully read-only by setting its __slots__ to (), which has the added benefit of saving memory, but then you wouldn't be able to assign to _hash.
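A short sketch of both remarks, assuming Python 3. SlottedEdge is a hypothetical name used only to show the effect of an empty __slots__:

```python
from collections import namedtuple

EdgeBase = namedtuple("EdgeBase", "left, right")


class Edge(EdgeBase):
    def __init__(self, left, right):
        # Cooperative call so that other classes in a multiple-inheritance
        # chain get initialized too. No arguments are forwarded, because
        # object.__init__ accepts none here.
        super().__init__()
        # Allowed because Edge (unlike EdgeBase) has an instance __dict__.
        self._hash = hash(self.left) * hash(self.right)

    def __hash__(self):
        return self._hash


class SlottedEdge(EdgeBase):
    # No instance __dict__: saves memory, but no extra attributes either.
    __slots__ = ()


edge = Edge(1, 2)
slotted = SlottedEdge(1, 2)

try:
    slotted._hash = 0
    slotted_is_writable = True
except AttributeError:
    slotted_is_writable = False
```

With these definitions, edge._hash is set once at construction, while slotted_is_writable ends up False because the slotted subclass rejects the new attribute.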
Edit for 2017: it turns out namedtuple isn't a great idea. attrs is the modern alternative.
__new__ is what you want to call here because tuples are immutable. Immutable objects are created in __new__ and then returned to the user, instead of being populated with data in __init__.
cls has to be passed twice to the super call on __new__ because __new__ is, for historical/odd reasons, implicitly a staticmethod.
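A minimal sketch of the __new__-based variant described above, reusing the EdgeBase and Edge names from the question; the exact listing from the original answer is not shown here, so treat this as one plausible form:

```python
from collections import namedtuple

EdgeBase = namedtuple("EdgeBase", "left, right")


class Edge(EdgeBase):
    def __new__(cls, left, right):
        # cls appears twice: once to tell super() where to start the lookup,
        # and once as the explicit first argument, since __new__ is an
        # implicit staticmethod and does not receive it automatically.
        self = super(Edge, cls).__new__(cls, left, right)
        # Computed once, at construction time.
        self._hash = hash(self.left) * hash(self.right)
        return self

    def __hash__(self):
        return self._hash


edge = Edge(3, 4)
```

In Python 3 the first call can be shortened to super().__new__(cls, left, right); cls still has to be passed explicitly as the argument.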