When should I use uuid.uuid1() vs. uuid.uuid4() in Python?

Posted 2024-08-12 08:22:21

I understand the differences between the two from the docs.

uuid1():
Generate a UUID from a host ID, sequence number, and the current time

uuid4():
Generate a random UUID.
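To make the distinction concrete, here is a minimal sketch using only the standard-library `uuid` module; the variable names `u1` and `u4` are illustrative. Both results are RFC 4122 UUIDs, and the `version` field records which algorithm produced them.

```python
import uuid

u1 = uuid.uuid1()  # time-based: timestamp + clock sequence + node (host) ID
u4 = uuid.uuid4()  # random: all bits random except the fixed version/variant bits

# Same variant (RFC 4122); only the version number differs.
print(u1, u1.version)  # version is 1
print(u4, u4.version)  # version is 4
```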

So uuid1 uses machine/sequence/time info to generate a UUID. What are the pros and cons of using each?

I know uuid1() can have privacy concerns, since it's based on machine information. I wonder if there's anything more subtle to consider when choosing one or the other. I just use uuid4() right now, since it's a completely random UUID. But I wonder if I should be using uuid1 to lessen the risk of collisions.

Basically, I'm looking for people's tips on best practices for using one vs. the other. Thanks!

Answers (6)

眷恋的温暖 2024-08-19 08:22:21

uuid1() is guaranteed not to produce any collisions (assuming you don't create too many of them at the same time). I wouldn't use it if it's important that there be no connection between the UUID and the computer, since the MAC address is used to make it unique across computers.

You can create duplicates by creating more than 2^14 (16,384) uuid1 values in less than 100 ns, but this is not a problem for most use cases.

uuid4() generates, as you said, a random UUID. The chance of a collision is really, really, really small; small enough that you shouldn't worry about it. The caveat is that a bad random-number generator makes collisions more likely.

This excellent answer by Bob Aman sums it up nicely. (I recommend reading the whole answer.)

"Frankly, in a single application space without malicious actors, the extinction of all life on earth will occur long before you have a collision, even on a version 4 UUID, even if you're generating quite a few UUIDs per second."
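The "don't worry about it" claim can be checked with the standard birthday-bound approximation. In CPython, uuid4() takes 16 bytes from os.urandom() and then fixes the version and variant bits, leaving 122 random bits. The sketch below assumes those 122 bits are uniformly random; the helper `collision_probability` and the one-billion-per-second workload are hypothetical, for illustration only.

```python
import math

RANDOM_BITS = 122          # random bits in a version-4 UUID
space = 2 ** RANDOM_BITS   # number of distinct random values

def collision_probability(n: int, space: int) -> float:
    """Birthday-bound approximation of P(at least one collision among n draws)."""
    # 1 - exp(-n(n-1) / 2N); expm1 keeps precision when the result is tiny
    return -math.expm1(-n * (n - 1) / (2 * space))

# Number of UUIDs needed for a 50% chance of any collision: sqrt(2 ln 2 * N)
n_half = math.sqrt(2 * math.log(2) * space)
print(f"{n_half:.3e}")  # roughly 2.7e18 UUIDs

# Hypothetical workload: one billion UUIDs per second for a whole year
per_year = 10**9 * 60 * 60 * 24 * 365
print(collision_probability(per_year, space))
```

Note that the estimate only holds if the random source is good, which is exactly the caveat above about bad random-number generators.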

慕巷 2024-08-19 08:22:21

My team just ran into trouble using UUID1 for a database upgrade script where we generated ~120k UUIDs within a couple of minutes. The UUID collision led to violation of a primary key constraint.

We've upgraded hundreds of servers, but on our Amazon EC2 instances we ran into this issue a few times. I suspect poor clock resolution; switching to UUID4 solved it for us.
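A minimal sketch of the UUID4 approach for a batch job like this: generate all primary keys with uuid4() and check the batch for duplicates before inserting anything, so a problem fails fast instead of surfacing as a constraint violation halfway through. The function name `make_primary_keys` is hypothetical, the 120,000 count is borrowed from the answer, and the database insert itself is left out.

```python
import uuid

def make_primary_keys(count: int) -> list[uuid.UUID]:
    """Hypothetical helper: generate `count` random (version 4) UUIDs as primary keys."""
    keys = [uuid.uuid4() for _ in range(count)]
    # Cheap sanity check before touching the database: a duplicate here would
    # otherwise show up later as a primary-key constraint violation.
    if len(set(keys)) != len(keys):
        raise RuntimeError("duplicate UUID generated")
    return keys

keys = make_primary_keys(120_000)
print(len(keys), keys[0])
```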

猫弦 2024-08-19 08:22:21

One case where you might consider uuid1() rather than uuid4() is when UUIDs are produced on separate machines, for example when online transactions are processed on several machines for scaling purposes.

In such a situation, the risk of collisions is higher, for example because of poor choices in how the pseudo-random number generators are initialized, and because more UUIDs are likely to be produced overall.

Another benefit of uuid1() in that case is that the machine where each GUID was originally produced is implicitly recorded (in the "node" part of the UUID). This, together with the time information, may help, if only with debugging.
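As a sketch of that debugging use, the standard `uuid.UUID` object already exposes the relevant fields (`time`, `clock_seq`, `node`); the hypothetical helper `describe_uuid1` below just converts them into a UTC datetime and a MAC-style string. The node usually holds a hardware (MAC) address, but when none can be found Python falls back to a random 48-bit number with the multicast bit set, so it may not correspond to real hardware.

```python
import datetime
import uuid

# Number of 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01)
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def describe_uuid1(u: uuid.UUID) -> dict:
    """Hypothetical helper: recover creation time, clock sequence and node from a v1 UUID."""
    if u.version != 1:
        raise ValueError("not a version-1 UUID")
    # u.time is the 60-bit count of 100-ns intervals since 1582-10-15
    unix_seconds = (u.time - UUID_EPOCH_OFFSET) / 10**7
    created = datetime.datetime.fromtimestamp(unix_seconds, tz=datetime.timezone.utc)
    # Format the 48-bit node as aa:bb:cc:dd:ee:ff
    node_hex = f"{u.node:012x}"
    mac = ":".join(node_hex[i:i + 2] for i in range(0, 12, 2))
    return {"created": created, "clock_seq": u.clock_seq, "node": mac}

info = describe_uuid1(uuid.uuid1())
print(info)
```

The same fields are what make uuid1() a privacy concern: anyone holding the UUID can read them back.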

蘸点软妹酱 2024-08-19 08:22:21

One thing to note when using uuid1: if you use the default call (without passing the clock_seq parameter), you have a chance of running into collisions, because you have only 14 bits of randomness (generating 18 entries within 100 ns gives you roughly a 1% chance of a collision; see the birthday paradox/attack). The problem will never occur in most use cases, but on a virtual machine with poor clock resolution it will bite you.
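The "roughly 1%" figure can be reproduced with the exact birthday formula. This sketch assumes, as the answer does, that all 18 UUIDs share the same node and the same 100-ns timestamp, so only the 14-bit random clock sequence can tell them apart; the helper name `exact_collision_probability` is hypothetical.

```python
CLOCK_SEQ_BITS = 14
space = 2 ** CLOCK_SEQ_BITS  # 16,384 possible clock_seq values

def exact_collision_probability(n: int, space: int) -> float:
    """P(at least two of n uniform draws from `space` values coincide)."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct

# 18 uuid1() calls landing on the same timestamp with the same node
print(f"{exact_collision_probability(18, space):.2%}")  # just under 1%
```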

不喜欢何必死缠烂打 2024-08-19 08:22:21

Perhaps something that's not been mentioned is that of locality.

A MAC address or time-based ordering (UUID1) can improve database performance, since it's less work to sort numbers that are close together than numbers distributed randomly (UUID4) (see here).
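If you rely on that time-based ordering, one detail is worth knowing: a version-1 UUID stores the *low* 32 bits of its timestamp first (`time_low`), so comparing UUIDs by value (or as canonical strings) is only chronological within short windows; `time_low` wraps about every 7 minutes (2^32 × 100 ns). The sketch below demonstrates this with a hypothetical helper, `v1_from_time`, and a placeholder `NODE`, then sorts on the standard `time` attribute to get true creation order.

```python
import uuid

# Hypothetical fixed node ID (multicast bit set, so it is not a real MAC address)
NODE = 0x010000000001

def v1_from_time(t: int) -> uuid.UUID:
    """Hypothetical helper: build a version-1 UUID with a chosen 60-bit timestamp."""
    time_low = t & 0xFFFFFFFF
    time_mid = (t >> 32) & 0xFFFF
    time_hi_version = ((t >> 48) & 0x0FFF) | (1 << 12)  # version 1 in the top 4 bits
    # clock_seq_hi_variant = 0x80 sets the RFC 4122 variant; clock_seq_low = 0
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, NODE))

# Two timestamps one tick apart, straddling a wrap of the low 32 bits
earlier = v1_from_time(0x1_FFFFFFFF)
later = v1_from_time(0x2_00000000)

print(earlier.time < later.time)  # True: `later` really is later
print(later < earlier)            # True: yet it sorts first by value

# Sorting on the 60-bit `time` attribute gives creation order
ids = [uuid.uuid1(node=NODE, clock_seq=0) for _ in range(5)]
by_creation = sorted(ids, key=lambda u: u.time)
```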

A second, related point is that UUID1 can be useful in debugging, even if the origin data is lost or not explicitly stored (this obviously conflicts with the privacy issue mentioned by the OP).

呆头 2024-08-19 08:22:21

In addition to the accepted answer, there's a third option that can be useful in some cases:

v1 with random MAC ("v1mc")

You can make a hybrid between v1 and v4 by deliberately generating v1 UUIDs with a random multicast MAC address (this is allowed by the v1 spec). The resulting v1 UUID is time-dependent (like regular v1) but lacks all host-specific information (like v4). It's also much closer to v4 in its collision resistance: v1mc = 60 bits of time + 61 random bits = 121 unique bits; v4 = 122 random bits.
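The bit counts above come from the RFC 4122 layout of a 128-bit UUID. This arithmetic sketch assumes the 14-bit clock sequence is random (as CPython's uuid1() makes it when `clock_seq` is not given) and that one of the 48 node bits is forced to 1 as the multicast flag.

```python
# Bit budget of a 128-bit UUID (RFC 4122 layout)
VERSION_BITS = 4
VARIANT_BITS = 2

# v4: everything except the version and variant bits is random
v4_random = 128 - VERSION_BITS - VARIANT_BITS

# v1mc: 60-bit timestamp + 14-bit clock sequence + 48-bit node,
# where the node's multicast bit is fixed to 1
TIME_BITS, CLOCK_SEQ_BITS, NODE_BITS = 60, 14, 48
v1mc_random = CLOCK_SEQ_BITS + (NODE_BITS - 1)
v1mc_unique = TIME_BITS + v1mc_random

print(v4_random, v1mc_random, v1mc_unique)  # 122 61 121
```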

The first place I encountered this was Postgres' uuid_generate_v1mc() function. I've since used the following Python equivalent:

from os import urandom
from uuid import uuid1
_int_from_bytes = int.from_bytes  # py3 only

def uuid1mc():
    # NOTE: The constant here is required by the UUIDv1 spec...
    return uuid1(_int_from_bytes(urandom(6), "big") | 0x010000000000)

(note: I've got a longer + faster version that creates the UUID object directly; can post if anyone wants)


At LARGE volumes of calls per second, this has the potential to exhaust system randomness. You could use the stdlib random module instead (it will probably also be faster). But BE WARNED: it only takes a few hundred UUIDs before an attacker can determine the RNG state, and thus partially predict future UUIDs.

import random
from uuid import uuid1

def uuid1mc_insecure():
    return uuid1(random.getrandbits(48) | 0x010000000000)