When is a Python object's hash computed, and why is the hash of -1 different?

Posted on 2024-12-08 05:03:41

Following on from this question, I'd like to know when a Python object's hash is computed:

  1. At an instance's __init__ time,
  2. The first time __hash__() is called,
  3. Every time __hash__() is called, or
  4. Any other opportunity I might be missing?

Does this vary depending on the type of the object?

Why does hash(-1) == -2, whilst other integers are equal to their hash?

Comments (3)

一枫情书 2024-12-15 05:03:41

The hash is generally computed each time it's used, as you can easily check for yourself (see below).
Of course, any particular object is free to cache its hash. For example, CPython strings do this, while tuples did not at the time of writing (see e.g. this rejected bug report for the reasons).

In CPython, the hash value -1 signals an error. C has no exceptions, so the C-level hash functions report failure through their return value. When a Python object's __hash__ returns -1, CPython silently changes it to -2.

See for yourself:

class HashTest(object):
    def __hash__(self):
        print('Yes! __hash__ was called!')
        return -1

hash_test = HashTest()

# All of these will print out 'Yes! __hash__ was called!':

print('__hash__ call #1')
hash_test.__hash__()

print('__hash__ call #2')
hash_test.__hash__()

print('hash call #1')
hash(hash_test)

print('hash call #2')
hash(hash_test)

print('Dict creation')
dct = {hash_test: 0}

print('Dict get')
dct[hash_test]

print('Dict set')
dct[hash_test] = 0

print('__hash__ return value:')
print(hash_test.__hash__())  # prints -1
print('Actual hash value:')
print(hash(hash_test))  # prints -2
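
Returning to the point that any object is free to cache its hash: the sketch below shows one way a user-defined class might do so. The class name CachedPoint and the computations counter are hypothetical, introduced only for illustration. It assumes the fields that feed the hash are never changed after construction.

```python
class CachedPoint:
    """Hypothetical point type that caches its hash on first use."""

    def __init__(self, x, y):
        self._x = x
        self._y = y
        self._hash = None      # not computed yet
        self.computations = 0  # counts real hash computations (illustration only)

    def __eq__(self, other):
        if not isinstance(other, CachedPoint):
            return NotImplemented
        return (self._x, self._y) == (other._x, other._y)

    def __hash__(self):
        # __hash__ still runs on every call; only the underlying
        # computation is skipped once a value has been stored.
        if self._hash is None:
            self.computations += 1
            self._hash = hash((self._x, self._y))
        return self._hash


point = CachedPoint(1, 2)
hash(point)
hash(point)
lookup = {point: "found"}
value = lookup[point]
print(point.computations)  # prints 1
```

The method is still invoked for every hash() call and every dict operation, so this only pays off when the underlying computation is expensive.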

鹤仙姿 2024-12-15 05:03:41

From here:

The hash value -1 is reserved (it’s used to flag errors in the C implementation).
If the hash algorithm generates this value, we simply use -2 instead.

Since a small integer's hash is the integer itself, -1 would hash to the reserved value, so it is changed to -2 right away.
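
The integer behaviour can be checked directly. This is a small sketch for CPython 3; the use of sys.hash_info.modulus to show that very large integers do not equal their hash is an addition that is not part of the answer above.

```python
import sys

# Small integers hash to themselves...
small_values = [0, 1, 12345, -2, -12345]
small_hashes = [hash(v) for v in small_values]

# ...except -1, whose natural hash is reserved as the C-level error marker.
minus_one_hash = hash(-1)

# -1 and -2 therefore share a hash. This is harmless, because dicts and
# sets still compare keys with == once the hashes match.
d = {-1: "minus one", -2: "minus two"}

# Large integers are reduced modulo a fixed prime, so they do not
# equal their own hash either.
modulus = sys.hash_info.modulus
wrapped_hash = hash(modulus + 1)

print(small_hashes)
print(minus_one_hash)
print(wrapped_hash)
```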

红衣飘飘貌似仙 2024-12-15 05:03:41

It is easy to see that option #3 holds for user-defined objects. This allows the hash to vary if you mutate the object, but if you ever use the object as a dictionary key, you must make sure the hash never changes afterwards.

>>> class C:
    def __hash__(self):
        print("__hash__ called")
        return id(self)


>>> inst = C()
>>> hash(inst)
__hash__ called
43795408
>>> hash(inst)
__hash__ called
43795408
>>> d = { inst: 42 }
__hash__ called
>>> d[inst]
__hash__ called
42
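
To illustrate why the hash must not change while an object is a dictionary key, here is a minimal sketch. The class MutableKey is hypothetical, and the exact lookup result assumes CPython's dict implementation.

```python
class MutableKey:
    """Hypothetical key type whose hash follows a mutable attribute."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        if not isinstance(other, MutableKey):
            return NotImplemented
        return self.n == other.n

    def __hash__(self):
        return hash(self.n)


key = MutableKey(1)
d = {key: "value"}
found_before = key in d  # looked up under hash 1, where it was stored

key.n = 2                # the hash is now 2, but the entry stays filed under 1
found_after = key in d   # the lookup probes the wrong slot

print(found_before, found_after, len(d))  # prints: True False 1
```

The entry is still in the dictionary, as len(d) shows, but it can no longer be reached through normal lookups.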

Strings use option #2: they calculate the hash value once and cache the result. This is safe because strings are immutable, so the hash can never change. If you subclass str and override __hash__, however, the result might not be immutable, so the __hash__ method is called again every time. Tuples are usually thought of as immutable, so you might think the hash could be cached, but a tuple's hash depends on the hashes of its contents, and those might include mutable values.
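
The dependence of a tuple's hash on its contents can be seen with a small sketch: a tuple is hashable only if every element is. The variable names are made up for this example.

```python
# Every element is hashable, so the tuple is hashable too.
hashable_tuple = (1, "a", (2, 3))
tuple_hash = hash(hashable_tuple)

# A tuple may hold a mutable list, but then hashing it fails,
# because the list element itself cannot be hashed.
unhashable_tuple = (1, [2, 3])
try:
    hash(unhashable_tuple)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```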

For @max who doesn't believe that subclasses of str can modify the hash:

>>> class C(str):
    def __init__(self, s):
        self._n = 1
    def __hash__(self):
        return str.__hash__(self) + self._n


>>> x = C('hello')
>>> hash(x)
-717693723
>>> x._n = 2
>>> hash(x)
-717693722