Hash of a unique value = unique hash?

Posted on 2024-08-30 23:51:09

Theoretically, does hashing a unique value yield a unique value?

Let's say I have a DB table with 2 columns: id and code. id is an auto-incrementing int and code is a varchar. If I do ...

$code = sha1($id);

... and then store $code into the same row as $id. Will my code column be unique as well?

What if I append the current time? For example:

$code = sha1($id . time());

Thanks.

Comments (4)

墨小沫ゞ 2024-09-06 23:51:09

In general, the answer is no. This is trivial to show: SHA-1 has only 2^160 different outputs (160 bits), but there are many more possible inputs than that (e.g., there are 2^320 different 40-byte strings, and they can't all map to unique outputs).

Given a restricted subset of input values, the answer is maybe. It depends on the exact algorithm and the size of the subset: if the number of possible inputs is smaller than the number of possible outputs, then uniqueness is possible (but NOT guaranteed). When thinking about this, it may be helpful to keep the birthday paradox in mind: the probability of a collision does not increase linearly with the number of inputs; while it is small, it grows roughly with the square of that number.
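The birthday effect is easier to see with a deliberately tiny hash. The following is a minimal sketch, not anything from the question: it truncates sha1() to 16 bits so that collisions among sequential ids become observable. The helper name first_collision() and the choice of 16 bits and 70000 ids are assumptions made for illustration only.

```php
<?php
// Illustration only: keep just the first few bits of SHA-1 so that
// collisions show up quickly. first_collision() is a hypothetical helper.
function first_collision(int $bits, int $maxId): ?array
{
    $seen = [];
    $hexLen = intdiv($bits, 4); // each hex character carries 4 bits
    for ($id = 1; $id <= $maxId; $id++) {
        $code = substr(sha1((string) $id), 0, $hexLen);
        if (isset($seen[$code])) {
            // Two distinct ids produced the same truncated hash.
            return [$seen[$code], $id, $code];
        }
        $seen[$code] = $id;
    }
    return null; // no collision among the ids tried
}

// 16 bits means 65536 possible outputs; 70000 ids cannot all be distinct.
$hit = first_collision(16, 70000);
if ($hit !== null) {
    [$a, $b, $code] = $hit;
    echo "ids $a and $b share the truncated hash $code\n";
}
```

With 70000 ids and only 65536 possible outputs a collision is certain by the counting argument above, and in practice the first one tends to appear long before the output space is exhausted, which is the birthday effect. The full 160-bit hash behaves the same way in principle, only at a scale far beyond any realistic table.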

小帐篷 2024-09-06 23:51:09

There is a small possibility that two different values give the same hash. Although very small, it's not impossible.
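To put a rough number on "very small", here is the standard birthday approximation. It assumes SHA-1 behaves like a uniformly random function over N = 2^160 outputs, which is an idealization for non-adversarial inputs and not something stated in the thread. The row count n = 2^32 is likewise an assumed upper bound (a table that exhausts a 32-bit unsigned auto-increment id).

```latex
p(n) \approx 1 - e^{-\frac{n(n-1)}{2N}} \approx \frac{n^2}{2N}
\quad \text{for } n \ll \sqrt{N}, \qquad N = 2^{160}

n = 2^{32}: \quad p \approx \frac{2^{64}}{2^{161}} = 2^{-97} \approx 6.3 \times 10^{-30}
```

So under these assumptions a collision among sequential ids is possible in theory but negligible in practice; it only becomes likely once n approaches the order of 2^80.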

若水微香 2024-09-06 23:51:09

It depends on the hashing algorithm. But theoretically, unless the hash is exactly the same as the original string, there is a potential for the hash not to be unique.

A hash of a value is a condensed representation of the original value. By removing pieces of information to create the hash, you lose part of what makes the value unique in its domain, and therefore increase the probability that the hash will not be unique. The only way to guarantee uniqueness is to use the original value itself, which defeats the purpose of hashing.

彩扇题诗 2024-09-06 23:51:09

One has to ask the question: why would you want to do this? If your database is already providing you with a unique identifier, why do you need to generate another unique identifier?

You may also wish to consider that, outside of PHP, many database engines will generate UUID-style primary keys for you if that is what you require.
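For comparison, an application-side analogue of such a UUID-style key can be sketched in PHP. This is an assumption-laden illustration, not the database feature the answer refers to: uuid_v4() is a hypothetical helper that builds a random (version 4) UUID from random_bytes(). Like a hash, a random UUID is only unique with overwhelming probability, so a UNIQUE constraint on the code column would still be what actually enforces uniqueness.

```php
<?php
// Hypothetical helper: builds a random (version 4) UUID string.
// Not part of PHP itself; shown only as a sketch.
function uuid_v4(): string
{
    $bytes = random_bytes(16); // 128 bits from the system CSPRNG
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version field = 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // variant bits = 10
    // 32 hex characters split into 8 groups of 4, laid out as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

$code = uuid_v4();
echo $code, "\n";
```

The design difference from sha1($id) is that the value is not derived from the id at all, so it cannot be recomputed or guessed from the row number.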

The point here is that hashing algorithms such as sha1() are not intended for this type of work; they are meant for verifying that two (potentially very long) string inputs are the same. The chance of a collision with a similar but not identical string is very remote, but the chance of a collision with very different strings becomes far higher.
