Why do -1 and -2 both hash to -2 in CPython?

Posted on 2025-01-20 16:53:32

Possible Duplicate:
When is a python object's hash computed and why is the hash of -1 different?

Why do -1 and -2 both hash to the same number in Python?

Since they do, how does Python tell these two numbers apart?

>>> -1 is -2
False
>>> hash(-1) is hash(-2)
True
>>> hash(-1)
-2
>>> hash(-2)
-2

Comments (1)

烟酒忠诚 2025-01-27 16:53:32

-1 is a reserved value at the C level of CPython which prevents hash functions from being able to produce a hash value of -1. As noted by DSM, the same is not true in IronPython and PyPy where hash(-1) != hash(-2).

See this Quora answer:

If you write a type in a C extension module and provide a tp_hash
method, you have to avoid -1 — if you return -1, Python will assume
you meant to throw an error.

If you write a class in pure Python and provide a __hash__ method,
there's no such requirement, thankfully. But that's because the C code
that invokes your __hash__ method does that for you — if your
__hash__ returns -1, then hash() applied to your object will actually return -2.

This really just repackages the information from effbot:

The hash value -1 is reserved (it’s used to flag errors in the C
implementation). If the hash algorithm generates this value, we simply
use -2 instead.

You can also see this in the source. For example, for Python 3's int object, this is at the end of the hash implementation:

if (x == (Py_uhash_t)-1)
    x = (Py_uhash_t)-2;
return (Py_hash_t)x;
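
The effect of that substitution can also be observed from Python. The following is a minimal sketch that assumes CPython (as noted above, IronPython and PyPy may print different values); the class name `AlwaysMinusOne` is made up for illustration.

```python
# Assumes CPython; other implementations may not remap -1.

class AlwaysMinusOne:
    """Hypothetical class whose __hash__ returns the reserved value -1."""

    def __hash__(self):
        return -1


# Small integers hash to themselves, except -1.
small_int_hashes = {n: hash(n) for n in (-3, -2, -1, 0, 1)}
print(small_int_hashes)        # on CPython: {-3: -3, -2: -2, -1: -2, 0: 0, 1: 1}

# hash() does not hand back -1 even when __hash__ asks for it.
print(hash(AlwaysMinusOne()))  # on CPython: -2
```

Comparing hashes with `==` rather than `is` is the safer check here, since `is` tests object identity and only appears to work for small integers.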

Since they do, how does Python tell these two numbers apart?

Since all hash functions map a large input space to a smaller output space, collisions are always expected, no matter how good the hash function is. Think of hashing strings, for example. If hash codes are 32-bit integers, you have 2^32 (a little more than 4 billion) hash codes. If you consider all ASCII strings of length 6, you have (2^7)^6 (just under 4.4 trillion) different items in your input space. With only this set, you are guaranteed to have many, many collisions no matter how good the hash function is. Add Unicode characters and strings of unlimited length to that!
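
The arithmetic behind that counting argument can be checked directly. This is only a sketch of the pigeonhole bound for the numbers quoted above; the variable names are illustrative.

```python
hash_codes = 2 ** 32            # distinct 32-bit hash codes
ascii_strings = (2 ** 7) ** 6   # ASCII strings of length exactly 6

print(hash_codes)       # 4294967296
print(ascii_strings)    # 4398046511104

# Pigeonhole principle: at least one hash code must be shared by this
# many strings, however evenly the hash function spreads them out.
min_worst_bucket = -(-ascii_strings // hash_codes)  # ceiling division
print(min_worst_bucket)  # 1024
```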

Therefore, the hash code only hints at the location of an object; an equality test follows to check the candidate keys. To implement a membership test in a hash-table set, the hash code gives you the number of the "bucket" in which to search for the value. However, all set items with the same hash code land in that same bucket, so you also need an equality test to distinguish between the candidates in it. That is how Python tells -1 and -2 apart: they share a hash, but -1 == -2 is False.
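
To make the bucket-plus-equality idea concrete, here is a toy hash set with per-bucket lists. It is a sketch of the principle only: `ToyHashSet` is a hypothetical class, and CPython's real set and dict use open addressing rather than this layout.

```python
class ToyHashSet:
    """Illustrative chained hash set; not how CPython's set is laid out."""

    def __init__(self, bucket_count=8):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, item):
        # The hash code only selects which bucket to look in.
        return self._buckets[hash(item) % len(self._buckets)]

    def add(self, item):
        bucket = self._bucket_for(item)
        # Equality, not the hash, decides whether the item is already stored.
        if not any(existing == item for existing in bucket):
            bucket.append(item)

    def __contains__(self, item):
        return any(existing == item for existing in self._bucket_for(item))


toy = ToyHashSet()
toy.add(-1)
toy.add(-2)

# On CPython both values hash to -2, so they share one bucket.
print(toy._bucket_for(-1) is toy._bucket_for(-2))
print(-1 in toy, -2 in toy, -3 in toy)   # True True False
print(len({-1, -2}))                     # 2: the built-in set keeps them apart too
```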

This hash code and equality duality is hinted at in the CPython documentation on hashable objects. In other languages/frameworks, there is a guideline/rule that if you provide a custom hash code function, you must also provide a custom equality test (performed on the same fields as the hash code function).
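
In Python, that guideline might look like the sketch below, where equality and hashing are built from the same fields. `Point` is a hypothetical class used only for illustration.

```python
class Point:
    """Hypothetical value type; equality and hash use the same fields."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Equal points must produce equal hashes, so hash the same
        # tuple that __eq__ compares.
        return hash((self.x, self.y))


a = Point(1, 2)
b = Point(1, 2)
print(a == b, hash(a) == hash(b))   # True True
print(len({a, b}))                  # 1
```

Python 3 nudges you in the same direction: a class that defines `__eq__` without also defining `__hash__` becomes unhashable.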


Indeed, today's Python release addresses exactly this, with a security patch for the efficiency problem that arises when this (identical hash values, but on a massive scale) is used as a denial-of-service attack: http://mail.python.org/pipermail/python-list/2012-April/1290792.html
