Smallest value of the hash() function?

Posted on 2024-09-28 13:30:26

in python (3), what is the smallest value that hash(x) can return?

i want to use hashes to give a quick 'fingerprint' to database values (basically making it easy to see whether two longish, similar texts are actually equal or not), and want to get rid of negative numbers (for simplicity), so i thought i'd just subtract the smallest possible value to obtain values of zero and up. the manual very helpfully states "Hash values are integers.", which is about as much as i knew before.

i was a bit surprised today when i found that my hand-compiled python on a 64-bit ubuntu apparently uses 64 bits or so for its hashing function; i had always thought it would be 32 bits. does machine architecture have an impact on the hash() function?

also, when i compiled python, i did not set any option to compile for a 64-bit architecture (hoping it would "just work"). does python adjust that by itself, or do i now have a 32-bit python on a 64-bit machine? not a silly question, i believe, as you are often offered separate packages depending on the processor.

edit: i strongly suspect the answer will be closely related to sys.maxint which has been sadly removed from python 3. my suspicion is that i should def xhash( x ): return hash( x ) - ( -maxint - 1 ) if maxint was available. i know this value 'lost its value' due to the unification of ints and longs, but here might be one area where it could still prove useful. anybody have an idea how to implement an analogue?

Comments (4)

无妨 2024-10-05 13:30:26

hash() can return any integer that fits in the interpreter's native signed word (a Py_ssize_t in CPython), and as you have seen, the size of that integer can vary with the architecture. This is one of the reasons set ordering (and, before Python 3.7, dictionary ordering) is arbitrary: the same set of operations on two different platforms can give different results because the hashes used along the way can differ.

If all you are doing is showing a hash for a quick fingerprint, then simply keep a subset of the bits. It's still valid as a hash. The only requirement of a hash function is that equal values must have equal hashes. After that, differences among hashes simply affect the efficiency of the algorithms using the hash, because the chances of collision go up or down.

So for example, you could decide you want an 8-digit hash, and get it by using:

hash(x) % 100000000

Or you could get an eight-character hexadecimal hash to display with:

from hashlib import md5
md5(str(hash(x)).encode()).hexdigest()[:8]
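
One caveat for the database use case: since Python 3.3, hash() of str and bytes objects is salted per process by default (see PYTHONHASHSEED), so the two expressions above give different results from one run to the next. A minimal sketch of a fingerprint that stays stable across runs, assuming the values are text and that SHA-256 from hashlib is acceptable; the name fingerprint and the default length of 8 are made up for illustration:

```python
import hashlib

def fingerprint(text, length=8):
    """Return the first `length` hex characters of the SHA-256 digest of `text`."""
    # Hash the UTF-8 bytes directly instead of going through the built-in hash(),
    # so the result does not depend on the interpreter's per-process hash salt.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return digest[:length]

print(fingerprint("abc"))  # ba7816bf
print(fingerprint("some longish text") == fingerprint("some longish text, edited"))
```

As with any truncated hash, equal fingerprints only suggest equal texts; a final comparison of the full values settles it.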

聆听风音 2024-10-05 13:30:26

Hash functions usually use the full range of the return value. The reason is that they are usually constructed with bit operations (shifting, XORing, etc.), so every bit of the return value is touched during the algorithm.

Why are positive values easier or harder than negative ones?
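
To illustrate the point about bit operations, here is a minimal sketch of 64-bit FNV-1a, a simple non-cryptographic hash built from XOR and multiplication. This is not the algorithm CPython uses for hash(); it is only an example of a hash whose output spreads over the whole 64-bit range. The function names fnv1a_64 and as_signed_64 are chosen for illustration.

```python
FNV_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3              # standard 64-bit FNV prime
MASK_64 = (1 << 64) - 1

def fnv1a_64(data: bytes) -> int:
    """Return the 64-bit FNV-1a hash of `data` as an unsigned integer."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte                      # mix the next byte into the low bits
        h = (h * FNV_PRIME) & MASK_64  # multiply and wrap around at 64 bits
    return h

def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    return value - (1 << 64) if value >= (1 << 63) else value

h = fnv1a_64(b"hello")
print(hex(h), as_signed_64(h))
```

Read as a signed 64-bit number, about half of all possible outputs are negative, which is why negative hash values turn up so often.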

剧终人散尽 2024-10-05 13:30:26

The answer to your question should be:

assert hash(100) == 100 and hash(-100) == -100
smallest_hash_value = -2**min(range(256), key=lambda i: hash(-2**i))

This depends on the fact that Python uses the integer itself as a hash (with the exception of -1) iff the integer is a valid hash() result. The algorithm normally should remain the same whatever the architecture.
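
On CPython 3.2 and later the limits can also be read from sys.hash_info rather than probed. Integers there are hashed modulo sys.hash_info.modulus (2**61 - 1 on typical 64-bit builds), so the probe above may report the range of integer hashes rather than the full width that string hashes can use. A minimal sketch, assuming CPython and assuming hash values are signed integers of sys.hash_info.width bits:

```python
import sys

info = sys.hash_info

# Bounds implied by the hash width (assumption: signed integer of `width` bits).
lowest_possible = -(1 << (info.width - 1))
highest_possible = (1 << (info.width - 1)) - 1

print(info.width, info.modulus)
print(lowest_possible, highest_possible)
print(hash(-1))                                    # -2, because -1 is reserved in CPython
print(hash(info.modulus - 1) == info.modulus - 1)  # largest integer that hashes to itself
print(hash(info.modulus))                          # 0, the integer hash wraps at the modulus
```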

攒一口袋星星 2024-10-05 13:30:26

so today i was luckier at the google casino, and this is what i found:

(1) system architecture: whether a given python is a 64-bit or a 32-bit build can be found with

from platform import architecture
print( architecture() )

from the documentation: "Queries the given executable (defaults to the Python interpreter binary) for various architecture information. Returns a tuple (bits, linkage) which contain information about the bit architecture and the linkage format used for the executable. Both values are returned as strings." on my machine, that's ('64bit', 'ELF'). bingo.
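
The platform documentation itself suggests a more direct check for whether the running interpreter is a 64-bit build: compare sys.maxsize against 2**32. A small sketch; the pointer-size check via struct is an extra cross-check added here, not something from the post above:

```python
import platform
import struct
import sys

is_64bits = sys.maxsize > 2**32          # check suggested in the platform docs
pointer_bits = struct.calcsize("P") * 8  # size of a C pointer in this interpreter, in bits

print(platform.architecture())
print(is_64bits, pointer_bits)
```

Both values describe the interpreter binary, not the machine: a 32-bit python on a 64-bit machine reports 32 bits here.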

(2) smallest integer: there is no sys.maxint in python 3 any more, but there is sys.maxsize. the docs say "An integer giving the maximum value a variable of type Py_ssize_t can take. It’s usually 2**31 - 1 on a 32-bit platform and 2**63 - 1 on a 64-bit platform." therefore,

from sys import maxsize
assert maxsize == 2**63 - 1

works on my machine.

(3) to directly answer the original question: the smallest value of the hash() function should be -sys.maxsize - 1, i.e. one below minus whatever sys.maxsize reports. for this reason, it can be expected that

import sys
def xhash( x ): return hash( x ) + sys.maxsize + 1

will only ever report values ≥ 0.
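
A quick sanity check of that expectation, as a sketch: it relies on CPython's behaviour that the built-in hash() returns a value that fits in a Py_ssize_t, and xhash_unsigned is a hypothetical alternative that reinterprets the hash as an unsigned number of sys.hash_info.width bits instead of shifting it.

```python
import sys

def xhash(x):
    """Shift hash(x) so the result is never negative."""
    return hash(x) + sys.maxsize + 1

def xhash_unsigned(x):
    """Alternative: keep the bit pattern of hash(x) but read it as unsigned."""
    return hash(x) & ((1 << sys.hash_info.width) - 1)

samples = ["some text", b"bytes", -1, -sys.maxsize - 1, 2**100, 3.5, (1, "a"), None]
for value in samples:
    assert 0 <= xhash(value) <= 2 * sys.maxsize + 1
    assert 0 <= xhash_unsigned(value) < (1 << sys.hash_info.width)
print("all sample hashes mapped to non-negative values")
```

The two variants give different numbers for the same input, so whichever is chosen has to be used consistently.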
