Why do -1 and -2 both hash to -2 in CPython?
Possible Duplicate:
When is a python object's hash computed and why is the hash of -1 different?
Why do -1 and -2 both hash to the same number in Python?
Since they do, how does Python tell these two numbers apart?
>>> -1 is -2
False
>>> hash(-1) is hash(-2)
True
>>> hash(-1)
-2
>>> hash(-2)
-2
-1 is a reserved value at the C level of CPython (a hash function returning -1 there signals an error), which prevents hash functions from being able to produce a hash value of -1. As noted by DSM, the same is not true in IronPython and PyPy, where hash(-1) != hash(-2).
See this Quora answer, which really just repackages the information from effbot.
You can also see this in the source. For example, for Python 3's int object, this is handled at the end of the hash implementation.
Since all hash functions map a large input space to a smaller output space, collisions are always expected, no matter how good the hash function is. Think of hashing strings, for example. If hash codes are 32-bit integers, you have 2^32 (a little more than 4 billion) hash codes. If you consider all ASCII strings of length 6, you have (2^7)^6 (just under 4.4 trillion) different items in your input space. With only this set, you are guaranteed to have many, many collisions no matter how good the hash function is. Add Unicode characters and strings of unlimited length to that!
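The -1/-2 pair is one such collision, and in CPython it is a deliberate one. As a rough sketch of how to observe it (the class name AlwaysMinusOne and the helper observed_hashes are hypothetical, made up for this illustration), the following asks for the hash of -1, of -2, and of an object whose __hash__ tries to return the reserved value. On CPython all three should come back as -2. Other implementations need not behave this way.

```python
import platform


class AlwaysMinusOne:
    """Hypothetical class whose __hash__ tries to return the reserved value."""

    def __hash__(self):
        return -1


def observed_hashes():
    """Return the hashes of -1, -2, and an AlwaysMinusOne instance."""
    return hash(-1), hash(-2), hash(AlwaysMinusOne())


# The result depends on the implementation; CPython reports -2 three times.
print(platform.python_implementation(), observed_hashes())
```

The collision is harmless for correctness, because -1 == -2 is still False.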
Therefore, the hash code only hints at the location of an object; an equality test follows to check candidate keys. To implement a membership test in a hash-table set, the hash code gives you the number of the "bucket" in which to search for the value. However, all set items with the same hash code land in the same bucket, so you also need an equality test to distinguish between the candidates in that bucket. This is how Python tells -1 and -2 apart: they share a hash, but -1 == -2 is False.
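The bucket-then-equality lookup can be sketched with a toy set. This is only an illustration: TinyHashSet is a hypothetical class that uses a list per bucket for simplicity, which is not how CPython lays out its own sets and dicts.

```python
class TinyHashSet:
    """Toy chained hash set, for illustration only."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, item):
        # The hash code only selects which bucket to look in.
        return self.buckets[hash(item) % len(self.buckets)]

    def add(self, item):
        bucket = self._bucket(item)
        # Equality decides whether the item is already stored.
        if not any(existing == item for existing in bucket):
            bucket.append(item)

    def __contains__(self, item):
        # Same two steps: pick the bucket, then compare candidates.
        return any(existing == item for existing in self._bucket(item))


s = TinyHashSet()
s.add(-1)
s.add(-2)
print(-1 in s, -2 in s, -3 in s)  # True True False

# The built-in dict keeps the two keys apart in the same way.
d = {-1: "minus one", -2: "minus two"}
print(len(d), d[-1], d[-2])  # 2 minus one minus two
```

Even when -1 and -2 end up in the same bucket, the equality check keeps them as two separate entries.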
This hash code and equality duality is hinted at in the CPython documentation on hashable objects. In other languages/frameworks, there is a guideline/rule that if you provide a custom hash code function, you must also provide a custom equality test (performed on the same fields as the hash code function).
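In Python the same guideline looks roughly like this. Point is a hypothetical class used only for illustration; the point is that __eq__ and __hash__ are computed from the same fields.

```python
class Point:
    """Hypothetical value type; equality and hash use the same fields."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self.x, self.y))


a, b = Point(1, 2), Point(1, 2)
print(a == b, hash(a) == hash(b), len({a, b}))  # True True 1
```

In Python 3, a class that defines __eq__ without defining __hash__ has its __hash__ set to None, so its instances cannot be used as dict keys or set members until both are provided.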
Indeed, the Python release today addresses exactly this, with a security patch for the efficiency problem that arises when this (identical hash values, but on a massive scale) is used as a denial-of-service attack: http://mail.python.org/pipermail/python-list/2012-April/1290792.html
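To get a feel for why mass collisions are expensive, here is a small sketch that counts equality comparisons instead of measuring time. Key and comparisons_for are hypothetical names made up for this example, and the sketch does not reproduce the patch itself. It only shows that keys sharing one hash value force the dict to compare each new key against the earlier ones.

```python
class Key:
    """Hypothetical key type that counts equality comparisons."""

    eq_calls = 0

    def __init__(self, value, fixed_hash=None):
        self.value = value
        self.fixed_hash = fixed_hash

    def __hash__(self):
        # Either a distinct hash per key, or one hash shared by every key.
        return self.value if self.fixed_hash is None else self.fixed_hash

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.value == other.value


def comparisons_for(n, fixed_hash=None):
    """Insert n distinct keys into a dict; return the number of __eq__ calls."""
    Key.eq_calls = 0
    table = {}
    for i in range(n):
        table[Key(i, fixed_hash)] = None
    return Key.eq_calls


spread = comparisons_for(500)
collided = comparisons_for(500, fixed_hash=42)
print(spread, collided)
```

With distinct hashes the comparison count stays small, while the all-colliding case grows much faster than the number of keys, which is what an attacker exploits.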