When is a Python object's hash computed, and why is the hash of -1 different?
Following on from this question, I'm interested to know when a Python object's hash is computed:
- at an instance's __init__ time,
- the first time __hash__() is called,
- every time __hash__() is called, or
- at some other point I might be missing?
Can this vary depending on the type of the object?
Why does hash(-1) == -2, whilst other integers are equal to their hash?
Comments (3)
The hash is generally computed each time it is used, as you can quite easily check yourself.
Of course, any particular object is free to cache its hash. For example, CPython strings do this, but tuples don't, at least at the time of writing (see e.g. this rejected bug report for reasons).
The hash value -1 signals an error in CPython. This is because C doesn't have exceptions, so the error has to be reported through the return value. When a Python object's __hash__ returns -1, CPython will actually silently change it to -2. See for yourself:
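A minimal sketch of that check, assuming CPython; the class name `Foo` and its call counter are made up for illustration. It counts how often `__hash__` runs and shows what `hash()` reports when the method returns -1:

```python
class Foo:
    """Hypothetical class: counts __hash__ calls and always returns -1."""

    calls = 0

    def __hash__(self):
        Foo.calls += 1
        return -1


f = Foo()
first = hash(f)
second = hash(f)

# On CPython this prints: -2 -2 2
# (two calls to hash() ran __hash__ twice, and -1 was reported as -2)
print(first, second, Foo.calls)
```

Nothing is cached here: each call to `hash()` runs the method again.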
From here:
Since an integer's hash is normally the integer itself, the value -1 is simply changed to -2 right away.
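As a quick illustration, assuming CPython, the small integers around -1 can be hashed directly. The dictionary name `hashes` is only for this sketch:

```python
# Map a few small integers to their hash values.
hashes = {n: hash(n) for n in (2, 1, 0, -1, -2, -3)}

for n, h in hashes.items():
    print(n, h)

# On CPython, -1 and -2 both hash to -2; the other values hash to themselves.
```

This also means -1 and -2 collide, which is harmless: equal hashes do not imply equal objects.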
It is easy to see that option #3 holds for user-defined objects. This allows the hash to vary if you mutate the object, but if you ever use the object as a dictionary key you must make sure its hash never changes afterwards.
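To make the dictionary-key warning concrete, here is a small sketch assuming CPython. The `Box` class is hypothetical and derives its hash from a mutable attribute, which is exactly what you should avoid for keys:

```python
class Box:
    """Hypothetical mutable class whose hash follows its current value."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Box) and self.value == other.value

    def __hash__(self):
        # Recomputed on every call, so it tracks mutations of self.value.
        return hash(self.value)


b = Box(1)
table = {b: "stored"}          # the dict records the hash at insertion time

hash_before = hash(b)
b.value = 2                    # mutate the key while it sits in the dict
hash_after = hash(b)

found_after_mutation = b in table   # lookup uses the new hash
b.value = 1
found_after_restore = b in table    # original hash again

print(hash_before, hash_after, found_after_mutation, found_after_restore)
```

The entry is never removed from the dictionary; it just cannot be reached while the key's hash differs from the one recorded at insertion.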
Strings use option #2: they calculate the hash value once and cache the result. This is safe because strings are immutable, so the hash can never change. A subclass of str, however, is not necessarily immutable, so a __hash__ method defined on the subclass is called every time. Tuples are usually thought of as immutable, so you might think the hash could be cached, but a tuple's hash depends on the hashes of its contents, and those might include mutable values.
For @max, who doesn't believe that subclasses of str can modify the hash:
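A minimal sketch of such a subclass, assuming CPython; the name `FickleStr` and its counter are invented for this example. Its `__hash__` deliberately returns a different value on each call:

```python
class FickleStr(str):
    """Hypothetical str subclass with a hash that changes on every call."""

    calls = 0

    def __hash__(self):
        FickleStr.calls += 1
        return FickleStr.calls


s = FickleStr("abc")
h1 = hash(s)
h2 = hash(s)

plain = "abc"
plain_stable = hash(plain) == hash(plain)   # built-in str is consistent

print(h1, h2, FickleStr.calls, plain_stable)
```

The overriding method runs on every `hash()` call, so the cached hash of the underlying string is never consulted for the subclass instance.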