Why is MD5'ing a UUID not a good idea?
PHP has a uniqid() function which generates a UUID of sorts.
In the usage examples, it shows the following:
$token = md5(uniqid());
But in the comments, someone says this:
Generating an MD5 from a unique ID is
naive and reduces much of the value of
unique IDs, as well as providing
significant (attackable) stricture on
the MD5 domain. That's a deeply
broken thing to do. The correct
approach is to use the unique ID on
its own; it's already geared for
non-collision.
Is this true, and if so, why? If an MD5 hash is (almost) unique for a unique ID, then what is wrong with md5'ing a uniqid?
A UUID is 128 bits wide and has uniqueness inherent to the way it is generated. An MD5 hash is 128 bits wide and doesn't guarantee uniqueness, only a low probability of collision. The MD5 hash is no smaller than the UUID, so it doesn't help with storage.
If you know the hash is from a UUID, it is much easier to attack, because the domain of valid UUIDs is actually fairly predictable if you know anything about the machine generating them.
If you needed to provide a secure token, then you would need to use a cryptographically secure random number generator. (1) UUIDs are not designed to be cryptographically secure, only guaranteed unique. A monotonically increasing sequence bounded by unique machine identifiers (typically a MAC) and time is still a perfectly valid UUID, but it is highly predictable if you can reverse engineer a single UUID from the sequence of tokens.
If you get into number theory, you can find ways to guess the internal state of some PRNGs from a sequence of generated values. Mersenne Twister is an example of such a generator. It has hidden state that it uses to get its long period, but it is not cryptographically secure - you can take a fairly small sequence of numbers and use that to infer the internal state. Once you've done this, you can use it to attack a cryptographic mechanism that depends on keeping that sequence a secret.
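As a rough illustration of the "cryptographically secure random number generator" advice, here is a minimal sketch in PHP. The answer does not name a specific function; using random_bytes() (available since PHP 7) is an assumption, and makeSecureToken is a hypothetical helper name.

```php
<?php
// Sketch only: draw the token straight from the operating system's CSPRNG
// instead of hashing a time-based ID. makeSecureToken() is a hypothetical name.
function makeSecureToken(int $numBytes = 16): string
{
    // random_bytes() throws an exception if no secure source is available.
    return bin2hex(random_bytes($numBytes));
}

// 16 random bytes give 32 hex characters, the same width as an MD5 digest.
$token = makeSecureToken();
echo $token, PHP_EOL;
```

With 16 bytes the token has the same length as the md5(uniqid()) value from the question, but all 128 bits are unpredictable rather than derived from the clock.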
Note that uniqid() does not return a UUID, but a "unique" string based on the current time. If you call it multiple times, you will get very similar output strings, and everyone who is familiar with uniqid() will recognize the source algorithm. That makes it pretty easy to predict the next IDs that will be generated.
The advantage of md5()-ing the output, along with an application-specific salt string or random number, is a string that is much harder to guess.
Unlike plain uniqid(), this produces very different outputs every microsecond. Furthermore, it does not reveal your "prefix salt" string, nor that you are using uniqid() under the hood. Without knowing the salt, it is very hard (consider it impossible) to guess the next ID.
In summary, I would disagree with the commenter's opinion and would always prefer the md5()-ed output over plain uniqid().
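The answer's original code samples did not survive on this page, so the following is only a sketch of the idea it describes. It assumes the usual uniqid() layout (8 hex digits of Unix seconds followed by 5 hex digits of microseconds when called without arguments), and the salt value is a made-up placeholder.

```php
<?php
// Two IDs generated back to back differ only in their trailing digits.
$first  = uniqid();
$second = uniqid();
echo $first, PHP_EOL, $second, PHP_EOL;

// The leading 8 hex digits are the Unix timestamp, readable by anyone.
$seconds = (int) hexdec(substr($first, 0, 8));
echo gmdate('c', $seconds), PHP_EOL;

// Salted variant as described in the answer; the salt is a hypothetical
// application secret that must never be published.
$salt  = 'example-application-salt';
$token = md5($salt . uniqid());
echo $token, PHP_EOL;
```

The salted token hides the timestamp structure, but its unpredictability rests entirely on the salt staying secret.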
MD5ing a UUID is pointless because UUIDs are already unique and fixed length (short), properties that are some of the reasons that people often use MD5 to begin with. So I suppose it depends on what you plan on doing with the UUID, but in general a UUID has the same properties as some data that has been MD5'd, so why do both?
UUIDs are already unique, so there is no point in MD5'ing them anyway.
Regarding the security question, in general you can be attacked if the attacker can predict the next unique ID you are about to generate. If it is known that you generate your unique IDs from UUIDs, the set of potential next unique IDs is much smaller, giving a better chance for a brute force attack.
This is especially true if the attacker can get a whole bunch of unique IDs from you, and that way guess your scheme of generating UUIDs.
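To make the "much smaller set" concrete for the unsalted md5(uniqid()) from the question, here is a sketch of how small the search space is once the second of generation is known: at most one million candidates. It assumes the common uniqid() layout of 8 hex digits of seconds plus 5 hex digits of microseconds; uniqidAt and recoverUniqid are hypothetical helper names, and the timestamp is an arbitrary fixed value so the example is repeatable.

```php
<?php
// Hypothetical helper: rebuild the string uniqid() would return at a given time,
// assuming the layout of 8 hex digits of seconds + 5 hex digits of microseconds.
function uniqidAt(int $sec, int $usec): string
{
    return sprintf('%08x%05x', $sec, $usec);
}

// Try every microsecond within the given second and return the ID whose
// MD5 matches the token, or null if none matches.
function recoverUniqid(string $token, int $sec): ?string
{
    for ($usec = 0; $usec < 1000000; $usec++) {
        $candidate = uniqidAt($sec, $usec);
        if (md5($candidate) === $token) {
            return $candidate;
        }
    }
    return null;
}

// Simulate a token created at a known second and an unknown microsecond.
$sec       = 1700000000;
$secretId  = uniqidAt($sec, 123456);
$token     = md5($secretId);
$recovered = recoverUniqid($token, $sec);
var_dump($recovered === $secretId);
```

Each second of uncertainty about the generation time adds only another million candidates, which is tiny compared with the 2^128 possible MD5 outputs.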
Version 3 UUIDs are already derived from an MD5 hash, so there's no point in doing it again. However, I'm not sure what UUID version PHP uses.
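For reference, a version 3 UUID is built by taking the MD5 of a namespace UUID's bytes followed by a name, then overwriting the version and variant bits. The sketch below follows that scheme as described in RFC 4122; uuidV3 is a hypothetical helper written for illustration, not a PHP built-in.

```php
<?php
// Sketch of name-based (version 3) UUID generation. uuidV3() is a
// hypothetical helper, not part of PHP's standard library.
function uuidV3(string $namespace, string $name): string
{
    // Convert the namespace UUID to its 16 raw bytes.
    $nsBytes = hex2bin(str_replace('-', '', $namespace));

    // MD5 over namespace bytes + name, as raw binary output.
    $hash = md5($nsBytes . $name, true);

    // Set the version field to 3 and the variant field to the RFC 4122 value.
    $hash[6] = chr((ord($hash[6]) & 0x0f) | 0x30);
    $hash[8] = chr((ord($hash[8]) & 0x3f) | 0x80);

    $hex = bin2hex($hash);
    return sprintf(
        '%s-%s-%s-%s-%s',
        substr($hex, 0, 8),
        substr($hex, 8, 4),
        substr($hex, 12, 4),
        substr($hex, 16, 4),
        substr($hex, 20, 12)
    );
}

// The DNS namespace UUID defined by RFC 4122.
$dnsNamespace = '6ba7b810-9dad-11d1-80b4-00c04fd430c8';
echo uuidV3($dnsNamespace, 'python.org'), PHP_EOL;
```

Note that this is unrelated to uniqid(), which builds its value from the clock rather than from a hash.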
As an aside, MD5 is actually obsolete and is not to be used in anything worth protecting - PHI, PII or PCI - from 2010 onwards. The US Feds have enforced this, and any non-compliant entity would be paying lots of $$$ in penalties.