Risk of UUID collisions when using different algorithms
I have a database where 2 (or maybe 3 or 4) different applications are inserting information. The new rows have IDs of type GUID/UUID, but each application uses a different algorithm to generate them. For example, one uses NHibernate's "guid.comb", another uses SQL Server's NEWID(), and another might want to use .NET's Guid.NewGuid() implementation.
Is there an above-normal risk of ID collisions or duplicates?
Thanks!
Comments (2)
The risk of collisions is elevated slightly but still vanishingly small. Consider that:
Both Comb and NEWID/NEWSEQUENTIALID include a timestamp with precision down to a few ms†. Thus, unless you are generating a large number of IDs at the exact same moment from all of these different sources, it is effectively impossible for IDs to collide.
The part of the GUID that isn't based on the timestamp can be thought of as random; most GUID algorithms base these digits on a PRNG. Thus, the likelihood of a collision between these other 10 bytes or so is on the same order as if you used two separate random number generators and watched for collisions.
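To put a rough number on "vanishingly small", here is a minimal sketch of the usual birthday approximation. It assumes the random part is uniformly distributed, which real generators only approximate; the function name and the sample sizes are mine, chosen purely for illustration.

```python
import math


def collision_probability(n_ids: int, random_bits: int) -> float:
    """Approximate chance that at least two of n_ids uniformly random
    values of random_bits bits coincide (birthday approximation)."""
    pairs = n_ids * (n_ids - 1) / 2
    # 1 - exp(-pairs / 2**bits); expm1 keeps precision when the result is tiny.
    return -math.expm1(-pairs / 2.0 ** random_bits)


if __name__ == "__main__":
    # 122 random bits: a fully random (version 4) GUID, one billion IDs.
    print(collision_probability(1_000_000_000, 122))
    # 80 random bits: the "other 10 bytes or so", one million IDs that
    # all happen to share the same timestamp.
    print(collision_probability(1_000_000, 80))
```

Both results come out many orders of magnitude below anything you would notice in practice, which is the point being made above.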
Think about this for a moment - PRNGs can and do repeat numbers, so the likelihood of a collision between two of them isn't significantly higher than a collision using just one of them, even if they use slightly different algorithms. It's sort of like playing the same lottery numbers every week vs. picking a random set every week - the odds of winning are exactly the same either way.
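If you want to see this rather than take it on faith, here is a small simulation sketch. Everything in it is an assumption made for illustration: the ID space is shrunk to 16 bits so that duplicates actually occur, Python's Mersenne Twister plays one generator, and a plain 64-bit linear congruential generator plays "a different algorithm".

```python
import random

SPACE_BITS = 16  # deliberately tiny so that collisions are observable


def lcg_stream(seed: int):
    """A simple 64-bit linear congruential generator, standing in for
    'a different algorithm'. Yields SPACE_BITS-bit values."""
    state = seed
    while True:
        state = (state * 6364136223846793005 + 1442695040888963407) % 2**64
        yield state >> (64 - SPACE_BITS)  # keep the top bits


def duplicates(values) -> int:
    """Number of values that repeat an earlier value."""
    values = list(values)
    return len(values) - len(set(values))


def average_duplicates(trials: int, mixed: bool) -> float:
    """Average duplicate count over several trials of 2000 IDs each,
    drawn either from one generator or split across two."""
    total = 0
    for trial in range(trials):
        mt = random.Random(trial)
        if mixed:
            lcg = lcg_stream(trial + 1)
            ids = [mt.getrandbits(SPACE_BITS) for _ in range(1000)]
            ids += [next(lcg) for _ in range(1000)]
        else:
            ids = [mt.getrandbits(SPACE_BITS) for _ in range(2000)]
        total += duplicates(ids)
    return total / trials


if __name__ == "__main__":
    print("one generator: ", average_duplicates(200, mixed=False))
    print("two generators:", average_duplicates(200, mixed=True))
```

The two averages should land close to each other: what drives the duplicate count is how many IDs are drawn from how large a space, not how many generators did the drawing.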
Now, keep in mind that when you use an algorithm like Guid.Comb, only the part outside the timestamp acts as a uniqueifier: about 10 bytes, not the full 16. So if you're generating a huge number of GUIDs within the same few milliseconds, the odds of a collision are higher than with a fully random GUID, although still tiny. And if you generate GUIDs at a fairly low frequency, it doesn't really matter how many different algorithms you use at the same time; the likelihood of a collision is still practically nonexistent.
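For reference, this is roughly what a comb-style ID looks like. It is a sketch under assumptions, not NHibernate's actual byte layout: I assume 10 random bytes followed by a 6-byte millisecond timestamp (in line with the "10 bytes or so" mentioned earlier), and `comb_guid` is a name made up for this example.

```python
import os
import time
import uuid
from typing import Optional


def comb_guid(now_ms: Optional[int] = None) -> uuid.UUID:
    """Illustrative comb-style ID: 10 random bytes followed by a 6-byte
    big-endian millisecond timestamp. NOT NHibernate's exact layout."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    random_part = os.urandom(10)  # the only part that separates same-tick IDs
    time_part = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)


if __name__ == "__main__":
    first = comb_guid()
    second = comb_guid()
    print(first)
    print(second)
    # IDs created in the same millisecond share their last 6 bytes.
    print(first.bytes[10:] == second.bytes[10:])
```

Two IDs made in the same tick differ only in the random bytes, so that part alone carries the collision risk for bursts of inserts.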
The best way to convince yourself is to run a test: have all 2 or 3 (or however many you use) generators produce GUIDs at the same time, at regular intervals, write them out to a log file, and see if you get collisions (and if so, how many). That should give you a good idea of how safe this is in practice.
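A test along those lines could be sketched like this. The two generators are only stand-ins, since Python has neither guid.comb nor NEWID(): `uuid.uuid4` plays the role of a random generator and `uuid.uuid1` the role of a timestamp-plus-node generator. In a real test you would read the IDs your applications actually produced, for example from their log files.

```python
import uuid
from collections import Counter
from typing import Callable, Dict, List


def count_collisions(generators: Dict[str, Callable[[], uuid.UUID]],
                     per_generator: int) -> List[uuid.UUID]:
    """Draw per_generator IDs from every generator, interleaved,
    and return the IDs that were seen more than once."""
    seen = Counter()
    for _ in range(per_generator):
        for make_id in generators.values():
            seen[make_id()] += 1
    return [value for value, hits in seen.items() if hits > 1]


if __name__ == "__main__":
    generators = {
        "random_v4": uuid.uuid4,      # stand-in for a random generator
        "time_based_v1": uuid.uuid1,  # stand-in for a timestamp-based one
    }
    found = count_collisions(generators, 50_000)
    print(f"{len(found)} duplicate IDs found")
```

An empty result does not prove collisions cannot happen; it only shows they did not happen at this volume, which is usually the practical question.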
P.S. If you're using NHibernate's comb generator to generate GUIDs for a clustered primary key, consider using NEWSEQUENTIALID() instead of NEWID(). The whole point of Comb is to avoid page splits, and you're not accomplishing that if you have other processes using non-sequential algorithms. You should also change any code using Guid.NewGuid to use the same Comb generator; the actual Comb algorithm used in NHibernate is not complicated and is easy to duplicate in your own domain logic.
† Note that there seems to be some dispute about NEWID and whether or not it contains a timestamp. In any case, if it is based on the MAC address (accounts differ on this too), the range of possible values is considerably smaller than a V4 GUID or a Comb. Further reason for me to recommend sticking to Comb GUIDs outside the database and NEWSEQUENTIALID inside the database.
Yes, the risk is above normal, because all of these use different definitions of "GUID." Guid.NewGuid() is an RFC-compliant mostly-random GUID, but NEWSEQUENTIALID is a reordered (and therefore non-RFC-compliant) GUID based on MAC address and timestamp, and NHibernate's comb GUID is completely different (based on randomness and timestamp).
You may want to consider just standardizing on one GUID implementation. I use my own type of combed GUID for all my apps. My blog has brief descriptions of all these types of GUIDs along with design decisions for my own.
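To make "different definitions" a bit more concrete, here is a small sketch using Python's standard `uuid` module as a stand-in (none of the three generators from the question exists in Python, so this is only an analogy). RFC 4122 UUIDs carry a version and a variant field; two RFC-compliant values with different versions can never be equal, because those fixed bits differ. A reordered or otherwise non-compliant layout gives no such guarantee. The helper names are mine.

```python
import uuid


def describe(value: uuid.UUID) -> str:
    """Summarise the RFC 4122 version and variant fields of a UUID."""
    return f"{value} version={value.version} variant={value.variant}"


def share_layout(a: uuid.UUID, b: uuid.UUID) -> bool:
    """True if both values declare the same variant and version."""
    return a.variant == b.variant and a.version == b.version


if __name__ == "__main__":
    random_id = uuid.uuid4()  # random, version 4
    time_id = uuid.uuid1()    # timestamp plus node, version 1
    print(describe(random_id))
    print(describe(time_id))
    print("same layout:", share_layout(random_id, time_id))
```

Checking a sample of IDs from each application this way is a quick means of finding out which flavours of GUID are actually ending up in the table.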