Hash of a unique value = unique hash?
Theoretically, does hashing a unique value yield a unique value?
Let's say I have a DB table with 2 columns: id and code. id is an auto-incrementing int and code is a varchar. If I do ...
$code = sha1($id);
... and then store $code into the same row as $id. Will my code column be unique as well?
What if I append the current time? For example:
$code = sha1($id . time());
Thanks.
Comments (4)
In general, the answer is no. This is trivial to show: SHA-1 produces 160-bit digests, so it has only 2^160 different outputs, but there are many more possible inputs than that (e.g., there are 2^320 different 40-byte strings, and they can't all map to unique outputs).
Given a sufficiently small subset of values, the answer is maybe. It depends on the exact algorithm and the size of the subset: if the number of possible inputs is smaller than the number of possible outputs, then uniqueness is possible (but NOT guaranteed). When thinking about this, it may be helpful to keep the birthday paradox in mind: the probability of a collision does not increase linearly with the number of inputs. While the number of inputs is small relative to the output space, it grows roughly with the square of that number.
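To put a rough number on the birthday paradox for the table in the question, here is a minimal sketch. It assumes the hash behaves like a uniformly random function, which is an idealisation rather than a proven property of SHA-1. The function name `collisionProbability` and the row counts are illustrative, not part of the original question.

```php
<?php
// Approximate probability that at least two of $n distinct inputs share a
// hash value, for a hash with $bits output bits.
// ASSUMPTION: the hash behaves like a uniformly random function.
function collisionProbability(float $n, int $bits): float
{
    $outputs = 2.0 ** $bits;                        // e.g. 2^160 for SHA-1
    $exponent = $n * ($n - 1.0) / (2.0 * $outputs); // birthday-bound exponent
    // 1 - exp(-x), computed via expm1() so tiny values are not rounded to 0.
    return -expm1(-$exponent);
}

// Every positive value of a signed 32-bit auto-increment column.
printf("160-bit hash: %.3e\n", collisionProbability(2147483647.0, 160));

// The same number of rows with a hypothetical 64-bit hash, for comparison.
printf(" 64-bit hash: %.3e\n", collisionProbability(2147483647.0, 64));
```

Under that assumption, the 160-bit case comes out at roughly 1.6e-30, while the 64-bit case is already about 0.12. So the answer to "is it unique?" is "not guaranteed, but a collision among auto-increment ids is extraordinarily unlikely".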
There is a small possibility that two different values give the same hash. Although the probability is very small, it is not impossible.
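As an empirical illustration only (it proves nothing about uniqueness in general), the following sketch hashes a range of ids the way the question does and counts the distinct codes. The helper name `countDistinctCodes` and the sample size of 100,000 are arbitrary choices for this example.

```php
<?php
// Hash the ids 1..$count with sha1() and return how many distinct codes result.
function countDistinctCodes(int $count): int
{
    $seen = [];
    for ($id = 1; $id <= $count; $id++) {
        // The id is hashed as its decimal string, as in the question.
        $seen[sha1((string) $id)] = true;
    }
    return count($seen);
}

$rows = 100000;
$distinct = countDistinctCodes($rows);
echo $distinct === $rows ? "no duplicates\n" : "collision found\n";
```

A run like this should report no duplicates, but that only shows that collisions are rare in practice, not that they cannot happen.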
It depends on the hashing algorithm. But theoretically, unless the hash is exactly the same as the original string, there is a potential for the hash not to be unique.
A hash of a value is a condensed representation of the original value. By removing pieces of information to create the hash, you lose part of what makes the value unique in its domain, and therefore increase the probability that the result will not be unique. The only way to guarantee uniqueness is to use the original value itself, which defeats the purpose of hashing.
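The effect of discarding information is easy to see by shrinking the output on purpose. This sketch keeps only the first few hex characters of each SHA-1 code. Truncation is used here purely to make the pigeonhole effect visible at a small scale, and the helper name `firstTruncatedCollision` is made up for this example.

```php
<?php
// Return [firstId, secondId, code] for the first pair of ids in 1..$maxId
// whose truncated SHA-1 codes coincide, or null if there is no such pair.
function firstTruncatedCollision(int $hexChars, int $maxId): ?array
{
    $seen = [];
    for ($id = 1; $id <= $maxId; $id++) {
        $code = substr(sha1((string) $id), 0, $hexChars);
        if (isset($seen[$code])) {
            return [$seen[$code], $id, $code];
        }
        $seen[$code] = $id;
    }
    return null;
}

// Two hex characters allow only 16^2 = 256 codes, so 257 distinct ids
// are guaranteed to contain at least one collision.
$hit = firstTruncatedCollision(2, 257);
printf("ids %d and %d share the code %s\n", $hit[0], $hit[1], $hit[2]);
```

With the full 40-character code the same argument still applies. It just needs more than 2^160 distinct inputs before a collision is forced.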
One has to ask the question: why would you want to do this? If your database is already providing you with a unique identifier, why do you need to generate another one?
You may also wish to consider that, outside of PHP, many database engines will generate UUID-style primary keys for you if that is what you require.
The point here is that hashing algorithms such as sha1() are not intended for this type of work; they are for verifying that two (potentially very long) string inputs are the same. The chance of a collision with a similar but not identical string is very remote, but the chance of a collision with very different strings becomes far higher.
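If the database cannot generate the identifier and it has to be built in PHP, one common approach is a random (version 4) UUID. The sketch below assumes PHP 7 or later for `random_bytes()`, and the function name `uuidV4` is illustrative. A random UUID is also not strictly guaranteed to be unique, so a UNIQUE constraint on the code column remains the actual guarantee.

```php
<?php
// Build a random (version 4) UUID string in the RFC 4122 layout.
function uuidV4(): string
{
    $bytes = random_bytes(16);                        // 128 random bits
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);  // set version to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);  // set RFC 4122 variant
    // 32 hex characters, split into eight groups of four for formatting.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuidV4(), "\n";
```

Unlike `sha1($id)`, a code generated this way cannot be recomputed from the row id, which may or may not matter for the use case.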