Same hash but different objects after overriding __hash__

Posted 2024-10-17 04:10:03

I'm having a problem correctly hashing my objects. Consider the following code:

class Foo:
    def __init__(self, bar):
        self.keys = list(bar.keys())
        self.values = list(bar.values())    
    def __str__(self):
        return ', '.join( '%s: %s' % z for z in zip(self.keys, self.values))    
    def __hash__(self):
        return hash(str(self))

if __name__ == '__main__':
    result = set()
    d = { 1: 2, 3: 4, 5: 6, 7: 8 }
    for i in range(10):
        result.add(Foo(d))
    for r in result:
        print r, hash(r)

I expect the result set to contain a single element, since all the added Foo objects have the same contents, and therefore the same hash.

However, this is the result:

misha@misha-K42Jr:~/Desktop/stackoverflow$ python hashproblem.py 
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338
1: 2, 3: 4, 5: 6, 7: 8 2131119371379196338

What is the problem here? The hashes do look the same, so shouldn't they be treated as duplicates by the built-in set object? Why does the set contain duplicates?

I've noticed that if I use str(Foo(d)) instead of Foo(d) when adding elements to the set, things work as expected. Why does it matter?

Python version is:

misha@misha-K42Jr:~/Desktop/stackoverflow$ python --version
Python 2.6.6

Comments (2)

千仐 2024-10-24 04:10:03

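Since the __hash__ method is only used to place objects in the internal hash table, you need to redefine __eq__ as well. When two hashes match, the set still compares the objects with ==, and the default == compares object identity, so every Foo instance counts as a distinct element.

Overriding only __eq__ is not correct either. If two objects are equal, i.e. a.__eq__(b) == True, then hash(a) and hash(b) must be equal as well.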
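
As a minimal sketch of that rule, the Foo class from the question could gain an __eq__ that agrees with its __hash__. The equality rule used here (same type, same keys and values) is an assumption about what "same contents" should mean, and the __ne__ method is only there because Python 2 does not derive != from ==. The print call is written so that it runs on both Python 2 and Python 3.

```python
class Foo(object):
    def __init__(self, bar):
        self.keys = list(bar.keys())
        self.values = list(bar.values())

    def __str__(self):
        return ', '.join('%s: %s' % z for z in zip(self.keys, self.values))

    def __hash__(self):
        # Unchanged from the question: hash the string form.
        return hash(str(self))

    def __eq__(self, other):
        # Assumed equality rule: same type and same keys/values.
        if not isinstance(other, Foo):
            return NotImplemented
        return self.keys == other.keys and self.values == other.values

    def __ne__(self, other):
        # Needed on Python 2, which does not derive != from ==.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


d = {1: 2, 3: 4, 5: 6, 7: 8}
result = set()
for i in range(10):
    result.add(Foo(d))

print(len(result))  # prints 1
```

With both methods defined, equal objects hash equally and compare equal, so the set keeps a single element.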

The default __hash__ method is based on the object's identity, roughly:

def __hash__(self):
    return id(self)
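
To illustrate why matching hashes were not enough in the question, here is a small sketch with a hypothetical HashOnly class that, like Foo, overrides __hash__ but keeps the default identity-based ==. It also shows why adding str(Foo(d)) behaved as expected: strings compare by value. The exact relation between the default hash and id() is an implementation detail and may differ between Python versions.

```python
class HashOnly(object):
    """Hypothetical class: overrides __hash__ only, like Foo in the question."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)


a, b = HashOnly(1), HashOnly(1)

print(hash(a) == hash(b))  # True: the hashes match
print(a == b)              # False: the default == compares identity
print(len(set([a, b])))    # 2: the set keeps both objects

# Strings define == by value, so equal strings collapse into one element.
print(len(set([str(1), str(1)])))  # 1
```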