Are there circumstances where a hash algorithm can be guaranteed to be unique?
If I hash size-constrained, similar data (Social Security numbers, for example) with a hash algorithm whose output is larger than the data (SHA-256, for example), will the hashes be guaranteed to have the same level of uniqueness as the original data?
Comments (5)
The probability of a hash collision has nothing to do with the size of the input string (except to the extent that it determines how many distinct inputs you need to keep unique). Even with an ideal hash algorithm, hashing 0 and 1 could produce a collision, although the probability is 1/2^(bit length), which for SHA-256 is effectively zero.
Hash collisions are a birthday paradox problem. For a 256-bit hash, the probability of a collision anywhere in a set of inputs depends purely on the number of inputs.
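As a rough sketch of that dependence, the standard birthday approximation for n inputs and a b-bit hash is p ≈ 1 - exp(-n(n-1) / 2^(b+1)). This is the textbook estimate, not necessarily the exact expression this answer had in mind, and it assumes the hash behaves like a uniformly random function. The function name `collision_probability` is made up for illustration.

```python
import math


def collision_probability(n_inputs: int, hash_bits: int = 256) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes the hash output is uniformly distributed (an idealization).
    """
    # Number of input pairs divided by the size of the output space.
    exponent = n_inputs * (n_inputs - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # Every possible 9-digit SSN (10^9 values) under a 256-bit hash.
    print(collision_probability(10**9))
    # The same inputs under a 32-bit hash, for contrast.
    print(collision_probability(10**9, hash_bits=32))
```

With all 10^9 possible nine-digit inputs, the 256-bit estimate is vanishingly small, while a 32-bit output makes a collision practically certain.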
You can create a customized hash that guarantees uniqueness, provided the output is large enough. For data in a known domain (like SSNs), the exercise is relatively simple.
If your target hash value has more bits available than the data you're hashing, the hash can simply map each input value to its own output value. This can be a simple linear mapping from the input, read as a multi-byte integer, to the output, also a multi-byte integer.
When your target hash value has fewer bits than what's being hashed, uniqueness can never be guaranteed: there are more possible inputs than outputs, so some inputs must share a value.
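A minimal sketch of such a mapping, assuming SSNs are written as nine ASCII digits with optional dashes. The helper names `ssn_to_int` and `ssn_to_key` are hypothetical, and the 4-byte width is chosen only because 10^9 - 1 fits below 2^32.

```python
def ssn_to_int(ssn: str) -> int:
    """Read a 9-digit SSN (dashes allowed) as a plain integer."""
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly nine decimal digits")
    return int(digits)


def ssn_to_key(ssn: str) -> bytes:
    """Map the SSN to a fixed-width 4-byte key.

    The mapping is injective: distinct SSNs give distinct integers,
    and every integer below 10**9 has its own 4-byte encoding.
    """
    return ssn_to_int(ssn).to_bytes(4, "big")


if __name__ == "__main__":
    # "123-45-6789" is a made-up example value.
    key = ssn_to_key("123-45-6789")
    print(key.hex())
    # The mapping is reversible, which is what makes it collision-free.
    print(int.from_bytes(key, "big"))
```

Because the mapping can be inverted, it guarantees uniqueness but conceals nothing about the input, unlike a cryptographic hash.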
Others have pointed out that collisions should not be a concern; that is the whole point of cryptographically secure hash functions. I would just like to add the following:
If you're using a cryptographic hash like SHA, then the short answer is yes.
One key feature of a cryptographically secure hash function is that you are safe from collisions beyond reasonable doubt, regardless of the input. This also holds for inputs shorter than the output size, which is the same situation as a longer message with little entropy. So you can use SHA-2 without worrying about collisions.
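For illustration, a small sketch using Python's standard `hashlib` to hash short, SSN-like inputs with SHA-256 and confirm that a sample of distinct inputs yields distinct digests. The helper name `sha256_hex` and the sample of 100,000 nine-digit strings are assumptions for the demo. A check like this shows the expected behavior on a sample; it is not a proof of uniqueness.

```python
import hashlib


def sha256_hex(ssn: str) -> str:
    """Return the SHA-256 digest of the ASCII-encoded input as hex."""
    return hashlib.sha256(ssn.encode("ascii")).hexdigest()


# 100,000 distinct nine-digit strings, zero-padded (sample data only).
sample = [f"{n:09d}" for n in range(100_000)]

# A set drops duplicates, so equal sizes mean no collision in the sample.
digests = {sha256_hex(s) for s in sample}

if __name__ == "__main__":
    print(len(sample), len(digests))
```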