Same hash value but different objects after overriding __hash__
I'm having a problem correctly hashing my objects. Consider the following code:
class Foo:
    def __init__(self, bar):
        self.keys = list(bar.keys())
        self.values = list(bar.values())

    def __str__(self):
        return ', '.join('%s: %s' % z for z in zip(self.keys, self.values))

    def __hash__(self):
        return hash(str(self))

if __name__ == '__main__':
    result = set()
    d = { 1: 2, 3: 4, 5: 6, 7: 8 }
    for i in range(10):
        result.add(Foo(d))
    for r in result:
        print r, hash(r)
I expect the result set to contain a single element, since all the added Foo
objects have the same contents, and therefore the same hash.
However, this is the result:
misha@misha-K42Jr:~/Desktop/stackoverflow$ python hashproblem.py
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
What is the problem here? The hashes do look the same, so shouldn't they be treated as duplicates by the built-in set object? Why does the set contain duplicates?
I've noticed that if I use str(Foo(d)) instead of Foo(d) when adding elements to the set, things work as expected. Why does it matter?
Python version is:
misha@misha-K42Jr:~/Desktop/stackoverflow$ python --version
Python 2.6.6
Comments (2)
Since the __hash__ method is only used for the internal hash table, you need to redefine __eq__ as well. Overriding only __eq__ is not correct either. If two objects are equal, i.e., a.__eq__(b) == True, then hash(a) and hash(b) must be equal as well. The default __hash__ method is:

See: http://docs.python.org/glossary.html#term-hashable - you'll want to implement __eq__ as well.
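
To make the advice in both comments concrete, here is a minimal sketch of Foo with __eq__ added. It assumes that two Foo objects should count as equal when their keys and values lists match; the _key helper is a name invented for this illustration and is not part of the original code. A set uses the hash only to find candidates and then compares them with ==. Without __eq__, that comparison falls back to object identity, so ten separate instances stay separate even though their hashes match. The same reasoning likely explains why str(Foo(d)) behaved as expected: str compares by value.

```python
class Foo(object):
    """The question's Foo, with value-based equality added (a sketch)."""

    def __init__(self, bar):
        self.keys = list(bar.keys())
        self.values = list(bar.values())

    def __str__(self):
        return ', '.join('%s: %s' % z for z in zip(self.keys, self.values))

    def _key(self):
        # Hypothetical helper: an immutable snapshot of the contents,
        # shared by __eq__ and __hash__ so the two always agree.
        return (tuple(self.keys), tuple(self.values))

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        # Python 2 does not derive != from ==, so define it explicitly.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Equal objects produce equal keys, hence equal hashes.
        return hash(self._key())


d = {1: 2, 3: 4, 5: 6, 7: 8}
result = set()
for i in range(10):
    result.add(Foo(d))

unique_count = len(result)  # 1: all ten objects now compare equal
```

Hashing a tuple of the contents rather than str(self) is a design choice for this sketch, not something the comments prescribe; it keeps __hash__ and __eq__ tied to the same data. The class is written to run on both Python 2 and Python 3.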