Java的UUID.randomUUID有多好?

发布于 2024-08-27 05:33:19 字数 299 浏览 6 评论 0原文

我知道随机 UUID 理论上发生碰撞的概率非常非常低,但我我想知道,在实践中,Java 的 有多好randomUUID() 是指没有碰撞吗?有人有经验可以分享吗?

I know that randomized UUIDs have a very, very, very low probability for collision in theory, but I am wondering, in practice, how good Java's randomUUID() is in terms of not having collision? Does anybody have any experience to share?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(9

空宴 2024-09-03 05:33:19

UUID 使用 java.security.SecureRandom,这应该是“加密强度高”。虽然实际的实现没有指定,并且在 JVM 之间可能有所不同(这意味着所做的任何具体语句仅对一种特定的 JVM 有效),但它确实要求输出必须通过统计随机数生成器测试。

一个实现总是有可能包含破坏这一切的微妙错误(请参阅 OpenSSH 密钥生成错误),但我认为没有任何具体理由担心 Java UUID 的随机性。

UUID uses java.security.SecureRandom, which is supposed to be "cryptographically strong". While the actual implementation is not specified and can vary between JVMs (meaning that any concrete statements made are valid only for one specific JVM), it does mandate that the output must pass a statistical random number generator test.

It's always possible for an implementation to contain subtle bugs that ruin all this (see OpenSSH key generation bug) but I don't think there's any concrete reason to worry about Java UUIDs's randomness.

时常饿 2024-09-03 05:33:19

维基百科有一个很好的答案
http://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions

为了有 50% 的概率发生至少一次冲突,需要生成随机版本 4 UUID 的数量为 2.71 quintillion,计算如下:

...

这个数字相当于在大约 85 年里每秒生成 10 亿个 UUID,而包含这么多 UUID 的文件(每个 UUID 16 字节)将约为 45 艾字节,比当前存在的最大数据库大很多倍,数量级为数百 PB。

...

因此,为了保证十亿分之一的重复概率,必须生成 103 万亿个版本 4 UUID。

Wikipedia has a very good answer
http://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions

the number of random version 4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion, computed as follows:

...

This number is equivalent to generating 1 billion UUIDs per second for about 85 years, and a file containing this many UUIDs, at 16 bytes per UUID, would be about 45 exabytes, many times larger than the largest databases currently in existence, which are on the order of hundreds of petabytes.

...

Thus, for there to be a one in a billion chance of duplication, 103 trillion version 4 UUIDs must be generated.

挽清梦 2024-09-03 05:33:19

有人有经验可以分享吗?

类型 4 UUID 有 2^122 个可能的值。 (规范规定类型会丢失 2 位,版本号会丢失 4 位。)

假设您每秒生成 100 万个随机 UUID,那么在您的一生中发生重复的可能性将微乎其微。 。要检测重复项,您必须解决每秒将 100 万个新 UUID 与您之前生成的所有 UUID1 进行比较的问题!

任何人在现实生活中经历过(即实际注意到)重复的机会甚至比消失还要小……因为寻找碰撞的实际困难。

当然,现在您通常会使用伪随机数生成器,而不是真正的随机数源。但我认为我们可以确信,如果您使用可靠的提供商来提供加密强度随机数,那么它是加密强度,并且重复的概率将与理想情况相同(无偏)随机数生成器。

然而,如果您使用带有“损坏的”加密随机数生成器的 JVM,那么所有的赌注都将落空。 (这可能包括某些系统上“熵不足”问题的一些解决方法。或者有人在您的系统或上游修改了您的 JRE 的可能性。)


1 - 假设您使用了“某些正如匿名评论者提出的“一种二进制 btree”,假设低密度和随机,每个 UUID 将需要 O(NlogN) 位 RAM 内存来表示 N 个不同的 UUID位的分布。现在将其乘以 1,000,000 以及您要运行实验的秒数。我认为这对于测试高质量 RNG 碰撞所需的时间长度来说并不实际。即使有(假设的)巧妙的表述也不行。

Does anybody have any experience to share?

There are 2^122 possible values for a type-4 UUID. (The spec says that you lose 2 bits for the type, and a further 4 bits for a version number.)

Assuming that you were to generate 1 million random UUIDs a second, the chances of a duplicate occurring in your lifetime would be vanishingly small. And to detect the duplicate, you'd have to solve the problem of comparing 1 million new UUIDs per second against all of the UUIDs you have previously generated1!

The chances that anyone has experienced (i.e. actually noticed) a duplicate in real life are even smaller than vanishingly small ... because of the practical difficulty of looking for collisions.

Now of course, you will typically be using a pseudo-random number generator, not a source of truly random numbers. But I think we can be confident that if you are using a creditable provider for your cryptographic strength random numbers, then it will be cryptographic strength, and the probability of repeats will be the same as for an ideal (non-biased) random number generator.

However, if you were to use a JVM with a "broken" crypto- random number generator, all bets are off. (And that might include some of the workarounds for "shortage of entropy" problems on some systems. Or the possibility that someone has tinkered with your JRE, either on your system or upstream.)


1 - Assuming that you used "some kind of binary btree" as proposed by an anonymous commenter, each UUID is going to need O(NlogN) bits of RAM memory to represent N distinct UUIDs assuming low density and random distribution of the bits. Now multiply that by 1,000,000 and the number of seconds that you are going to run the experiment for. I don't think that is practical for the length of time needed to test for collisions of a high quality RNG. Not even with (hypothetical) clever representations.

孤凫 2024-09-03 05:33:19

我不是专家,但我认为多年来有足够多的聪明人研究过 Java 的随机数生成器。因此,我还假设随机 UUID 是好的。所以你应该确实有理论碰撞概率(大约为 1 : 3 × 10^38所有可能的 UUID。有谁知道这对于随机 UUID 有何变化?是上述的 1/(16*4) 吗?)

根据我的实践经验,到目前为止我从未见过任何冲突。 。当我得到第一个胡子的那天,我可能会长出惊人的长胡子;)

I'm not an expert, but I'd assume that enough smart people looked at Java's random number generator over the years. Hence, I'd also assume that random UUIDs are good. So you should really have the theoretical collision probability (which is about 1 : 3 × 10^38 for all possible UUIDs. Does anybody know how this changes for random UUIDs only? Is it 1/(16*4) of the above?)

From my practical experience, I've never seen any collisions so far. I'll probably have grown an astonishingly long beard the day I get my first one ;)

寒尘 2024-09-03 05:33:19

在前雇主那里,我们有一个包含随机 uuid 的独特列。部署后第一周我们就发生了碰撞。当然,几率很低,但也不是零。这就是 Log4j 2 包含 UuidUtil.getTimeBasedUuid 的原因。只要您在单个服务器上生成的 UUID 数不超过 10,000 个/毫秒,它就会生成一个 8,925 年唯一的 UUID。

At a former employer we had a unique column that contained a random uuid. We got a collision the first week after it was deployed. Sure, the odds are low but they aren't zero. That is why Log4j 2 contains UuidUtil.getTimeBasedUuid. It will generate a UUID that is unique for 8,925 years so long as you don't generate more than 10,000 UUIDs/millisecond on a single server.

乖乖 2024-09-03 05:33:19

许多答案都讨论了必须生成多少个 UUID 才能达到 50% 的冲突几率。但对于必须(几乎)不可能发生碰撞的应用程序来说,50%、25% 甚至 1% 的碰撞几率是毫无价值的。

程序员是否经常将其他可能发生且确实发生的事件视为“不可能”?

当我们将数据写入磁盘或内存并再次读回时,我们理所当然地认为数据是正确的。我们依靠设备的纠错来检测任何损坏。但未检测到错误的几率实际上约为 2-50

对随机 UUID 应用类似的标准难道没有意义吗?如果这样做,您会发现在大约 1000 亿个随机 UUID (236.5) 的集合中可能发生“不可能”的冲突。

这是一个天文数字,但国家医疗保健系统中的逐项计费或在大量设备上记录高频传感器数据等应用肯定会遇到这些限制。如果您正在编写下一篇银河系漫游指南,请不要尝试为每篇文章分配 UUID!

Many of the answers discuss how many UUIDs would have to be generated to reach a 50% chance of a collision. But a 50%, 25%, or even 1% chance of collision is worthless for an application where collision must be (virtually) impossible.

Do programmers routinely dismiss as "impossible" other events that can and do occur?

When we write data to a disk or memory and read it back again, we take for granted that the data are correct. We rely on the device's error correction to detect any corruption. But the chance of undetected errors is actually around 2-50.

Wouldn't it make sense to apply a similar standard to random UUIDs? If you do, you will find that an "impossible" collision is possible in a collection of around 100 billion random UUIDs (236.5).

This is an astronomical number, but applications like itemized billing in a national healthcare system, or logging high frequency sensor data on a large array of devices could definitely bump into these limits. If you are writing the next Hitchhiker's Guide to the Galaxy, don't try to assign UUIDs to each article!

墨洒年华 2024-09-03 05:33:19

UUID 最初的生成方案是将 UUID 版本与生成 UUID 的计算机的 MAC 地址以及自西方采用公历以来的 100 纳秒间隔数连接起来。通过表示空间(计算机)和时间(间隔数)中的单个点,值发生冲突的可能性实际上为零。

The original generation scheme for UUIDs was to concatenate the UUID version with the MAC address of the computer that is generating the UUID, and with the number of 100-nanosecond intervals since the adoption of the Gregorian calendar in the West. By representing a single point in space (the computer) and time (the number of intervals), the chance of a collision in values is effectively nil.

甜宝宝 2024-09-03 05:33:19

我去年买彩票,从来没有中过奖......
但似乎彩票有中奖者......

文档:https://www.rfc-editor .org/rfc/rfc4122

类型 1:未实现。如果 uuid 是同时生成的,则可能发生冲突。 impl 可以人为地a-synchronize 来绕过这个问题。

类型 2:从未见过实现。

类型 3:md5 哈希:可能发生冲突(128 位 - 2 个技术字节)

类型 4:随机:可能发生冲突(如彩票)。请注意,jdk6 impl 不使用“真正的”安全随机,因为 PRNG 算法不是由开发人员选择的,您可以强制系统使用“较差的”PRNG 算法。所以你的 UUID 是可以预测的。

类型 5:sha1 哈希:未实现:可能发生冲突(160 位 2 技术字节)

I play at lottery last year, and I've never won ....
but it seems that there lottery has winners ...

doc : https://www.rfc-editor.org/rfc/rfc4122

Type 1 : not implemented. collision are possible if the uuid is generated at the same moment. impl can be artificially a-synchronize in order to bypass this problem.

Type 2 : never see a implementation.

Type 3 : md5 hash : collision possible (128 bits-2 technical bytes)

Type 4 : random : collision possible (as lottery). note that the jdk6 impl dont use a "true" secure random because the PRNG algorithm is not choose by developer and you can force system to use a "poor" PRNG algo. So your UUID is predictable.

Type 5 : sha1 hash : not implemented : collision possible (160 bit-2 technical bytes)

撩起发的微风 2024-09-03 05:33:19

我们在应用程序中使用 Java 的随机 UUID 已经一年多了,而且非常广泛。但我们从未遇到过碰撞。

We have been using the Java's random UUID in our application for more than one year and that to very extensively. But we never come across of having collision.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文