Is there a circumstance where a hash algorithm can be guaranteed unique?

Posted 2024-08-21 20:47:05

If I'm hashing size-constrained, similar data (Social Security numbers, for example) using a hash algorithm whose output is larger than the data (SHA-256, for example), will the hash guarantee the same level of uniqueness as the original data?

Comments (5)

阳光下的泡沫是彩色的 2024-08-28 20:47:05

The probability of a hash collision has nothing to do with the size of the input strings, except to the extent that it determines how many distinct inputs you need to keep unique. Even hashing just 0 and 1 with an ideal hash function (one whose outputs are uniformly random) can produce a collision, although the probability is only 1/2^n for an n-bit output. For SHA-256 that is 1/2^256, which is effectively zero.

Hash collisions are a birthday paradox problem. For a 256-bit hash, the probability of at least one collision among the inputs depends purely on the number of inputs k and is:

  • 1 - (2^256)! / ((2^256)^k * (2^256 - k)!), or, as others have said, basically zero for any reasonable number of inputs.
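As a rough illustration of the formula above, the following Python sketch computes the collision probability two ways: exactly, which is only practical for small cases, and with the standard approximation 1 - exp(-k(k-1)/(2N)), which works for a space as large as N = 2^256. The function names are made up for this example, and both versions assume hash outputs are uniformly distributed; the approximation also assumes k is much smaller than N.

```python
import math
from fractions import Fraction


def collision_probability_exact(space_size: int, input_count: int) -> float:
    """Exact birthday-problem probability of at least one collision.

    Multiplies out (N/N) * ((N-1)/N) * ... for the no-collision case,
    so it is only practical when input_count is small.
    """
    if input_count > space_size:
        return 1.0  # pigeonhole: more inputs than possible outputs
    no_collision = Fraction(1)
    for i in range(input_count):
        no_collision *= Fraction(space_size - i, space_size)
    return float(1 - no_collision)


def collision_probability_approx(space_size: int, input_count: int) -> float:
    """Approximation 1 - exp(-k(k-1)/(2N)), for k much smaller than N."""
    exponent = input_count * (input_count - 1) / (2 * space_size)
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # Classic birthday problem: 23 people, 365 possible birthdays.
    print(collision_probability_exact(365, 23))
    # About a billion inputs hashed into a 256-bit space.
    print(collision_probability_approx(2**256, 10**9))
```

For about a billion inputs and a 256-bit output, the approximate value comes out on the order of 10^-60.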

绝世的画 2024-08-28 20:47:05

You can always create a customized hash that guarantees uniqueness. For data in a known domain (like SSNs), the exercise is relatively simple.

If your target hash value has more bits available than the data you're hashing, the hash simply maps each input value to its own output value. This can be a simple linear mapping from the input, read as a multi-byte integer, to the output, written as a multi-byte integer.

When your target hash value has fewer bits than the data being hashed can occupy, uniqueness cannot be guaranteed for all possible inputs (the pigeonhole principle).
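One possible reading of such a mapping, as a minimal sketch: treat the nine SSN digits as an integer and write that integer into a fixed-width output. The function name `ssn_to_fixed_width_id` and the 32-byte default width are assumptions made for this illustration, not something the answer specifies.

```python
def ssn_to_fixed_width_id(ssn: str, width_bytes: int = 32) -> bytes:
    """Map an SSN to a fixed-width value with no possible collisions.

    Hypothetical helper: the digits are read as an integer and zero-padded
    to width_bytes. Distinct SSNs give distinct integers, so the mapping
    is injective by construction.
    """
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected a 9-digit SSN such as 123-45-6789")
    return int(digits).to_bytes(width_bytes, "big")


if __name__ == "__main__":
    print(ssn_to_fixed_width_id("123-45-6789").hex())
```

A mapping like this guarantees uniqueness but is trivially reversible, so it only addresses the uniqueness part of the question.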

倾其所爱 2024-08-28 20:47:05

Others have pointed out that collisions should not be a concern; that is the whole point of cryptographically secure hash functions. I would just like to add the following:

  • If your input set is small enough (e.g. the data is SSNs, of which there are fewer than a billion), then the absence of collisions can be verified: just test it exhaustively.
  • If the input set is too big to be scanned exhaustively, then the absence of collisions is not expected to be provable. Good hash functions are expected to act as random oracles, and on a random oracle you cannot prove such a property without trying every input. Being able to prove the absence of collisions would look suspiciously like a weakness of the function.
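The exhaustive test mentioned in the first point could look roughly like the sketch below. It is scaled down to a small range of 9-digit strings so it runs quickly; the helper names and the `digest_bytes` truncation parameter are assumptions added for illustration (truncation only serves to show that the detector does find collisions when the output space is too small).

```python
import hashlib


def ssn_like_strings(limit: int):
    """Yield the first `limit` zero-padded 9-digit strings."""
    for n in range(limit):
        yield f"{n:09d}"


def find_collision(inputs, digest_bytes: int = 32):
    """Return a pair of distinct inputs with equal digests, or None.

    Digests are SHA-256 values truncated to digest_bytes; the default
    of 32 keeps the full 256-bit digest.
    """
    seen = {}
    for value in inputs:
        digest = hashlib.sha256(value.encode("ascii")).digest()[:digest_bytes]
        if digest in seen and seen[digest] != value:
            return seen[digest], value
        seen[digest] = value
    return None


if __name__ == "__main__":
    # Full digests over a small sample of the SSN space.
    print(find_collision(ssn_like_strings(100_000)))
    # One-byte digests: 300 inputs into 256 slots must collide.
    print(find_collision(ssn_like_strings(300), digest_bytes=1))
```

Running this over the full billion-value space would need far more memory than an in-memory dictionary comfortably allows, so a real check would more likely sort the digests on disk.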
蝶…霜飞 2024-08-28 20:47:05

If you're using a cryptographic hash like SHA-256, then the short answer is yes, for all practical purposes.

離殇 2024-08-28 20:47:05

One key feature of a cryptographically secure hash function is that you are safe from collisions beyond reasonable doubt, regardless of the input. This also holds for inputs shorter than the output size, which is the same situation as a longer message with little entropy. So you can use SHA-2 without worrying about collisions.
