Simple proof that GUID is not unique

Posted on 2024-08-10 09:56:50 · 332 words · 4 views · 0 comments

I'd like to prove that a GUID is not unique in a simple test program.
I expected the following code to run for hours, but it's not working. How can I make it work?

BigInteger begin = new BigInteger((long)0);
BigInteger end = new BigInteger("340282366920938463463374607431768211456",10);  //2^128
for(begin; begin<end; begin++)
  Console.WriteLine(System.Guid.NewGuid().ToString());

I'm using C#.

Comments (30)

却一份温柔 2024-08-17 09:56:51

Here's a solution, too:

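#include <QString>
#include <QUuid>
#include <iostream>

int main()
{
  QUuid uuid;
  // Keep drawing pairs of UUIDs until two consecutive ones are equal.
  while ( (uuid = QUuid::createUuid()) != QUuid::createUuid() ) { }
  std::cout << "Aha! I've found one! " << qPrintable( uuid.toString() ) << std::endl;
}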

Note: requires Qt, but I guarantee that if you let it run long enough, it might find one.

(Note on the note: actually, now that I'm looking at it, there may be something about the generation algorithm that prevents two consecutively generated UUIDs from colliding, but I kinda doubt it.)

霊感 2024-08-17 09:56:51

The only solution to prove GUIDs are not unique would be to have a World GUID Pool. Each time a GUID is generated somewhere, it should be registered with the organization. Or heck, we might introduce a standard requiring all GUID generators to register them automatically, and for that they would need an active internet connection!

羞稚 2024-08-17 09:56:50

Kai, I have provided a program that will do what you want using threads. It is licensed under the following terms: you must pay me $0.0001 per hour per CPU core you run it on. Fees are payable at the end of each calendar month. Please contact me for my PayPal account details at your earliest convenience.

using System;
using System.Collections.Generic;
using System.Linq;

namespace GuidCollisionDetector
{
    class Program
    {
        static void Main(string[] args)
        {
            //var reserveSomeRam = new byte[1024 * 1024 * 100];     // This indeed has no effect.

            Console.WriteLine("{0:u} - Building a bigHeapOGuids.", DateTime.Now);
            // Fill up memory with guids.
            var bigHeapOGuids = new HashSet<Guid>();
            try
            {
                do
                {
                    bigHeapOGuids.Add(Guid.NewGuid());
                } while (true);
            }
            catch (OutOfMemoryException)
            {
                // Release the ram we allocated up front.
                // Actually, these are pointless too.
                //GC.KeepAlive(reserveSomeRam);
                //GC.Collect();
            }
            Console.WriteLine("{0:u} - Built bigHeapOGuids, contains {1} of them.", DateTime.Now, bigHeapOGuids.LongCount());


            // Spool up some threads to keep checking if there's a match.
            // Keep running until the heat death of the universe.
            for (long k = 0; k < Int64.MaxValue; k++)
            {
                for (long j = 0; j < Int64.MaxValue; j++)
                {
                    Console.WriteLine("{0:u} - Looking for collisions with {1} thread(s)....", DateTime.Now, Environment.ProcessorCount);
                    System.Threading.Tasks.Parallel.For(0, Int32.MaxValue, (i) =>
                    {
                        if (bigHeapOGuids.Contains(Guid.NewGuid()))
                            throw new ApplicationException("Guids collided! Oh my gosh!");
                    }
                    );
                    Console.WriteLine("{0:u} - That was another {1} attempts without a collision.", DateTime.Now, (long)Int32.MaxValue);
                }
            }
            Console.WriteLine("Umm... why hasn't the universe ended yet?");
        }
    }
}

PS: I wanted to try out the Parallel extensions library. That was easy.

And using OutOfMemoryException as control flow just feels wrong.

EDIT

Well, it seems this still attracts votes. So I've fixed the GC.KeepAlive() issue. And changed it to run with C# 4.

And to clarify my support terms: support is only available on the 28/Feb/2010. Please use a time machine to make support requests on that day only.

EDIT 2
As always, the GC does a better job than I do at managing memory; any previous attempts at doing it myself were doomed to failure.

南街女流氓 2024-08-17 09:56:50

This will run for a lot more than hours. Assuming it loops at 1 GHz (which it won't; it will be a lot slower than that), it will run for 10790283070806014188970 years, which is roughly 780 billion times the age of the universe (taking that age as about 13.8 billion years).

Assuming Moore's law holds, it would be a lot quicker to not run this program, wait several hundred years, and run it on a computer that is billions of times faster. In fact, any program that takes longer to run than it takes CPU speeds to double (about 18 months) will complete sooner if you wait until CPU speeds have increased and buy a new CPU before running it (unless you write it so that it can be suspended and resumed on new hardware).
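The runtime figure above can be reproduced with a few lines of integer arithmetic. This is only a sketch under stated assumptions: `YearsToRun` is a hypothetical helper name, a year is taken as 365 days, the loop rate is the 1 GHz assumed in the answer, and the age of the universe is taken as 13.8 billion years.

```csharp
using System;
using System.Numerics;

// Years needed to run `iterations` loop passes at `perSecond` passes per second,
// using 365-day years. BigInteger division truncates toward zero.
static BigInteger YearsToRun(BigInteger iterations, BigInteger perSecond)
{
    BigInteger secondsPerYear = 365L * 24 * 60 * 60;
    return iterations / perSecond / secondsPerYear;
}

// 2^128 iterations at an assumed 1 GHz loop rate.
BigInteger years = YearsToRun(BigInteger.Pow(2, 128), 1_000_000_000);
Console.WriteLine(years);

// Ratio to an assumed 13.8-billion-year age of the universe.
Console.WriteLine(years / 13_800_000_000);
```

The first value is on the order of 10^22 years; the second shows how many universe lifetimes that corresponds to under the stated assumption.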

困倦 2024-08-17 09:56:50

A GUID is theoretically non-unique. Here's your proof:

  • A GUID is a 128-bit number
  • You cannot generate 2^128 + 1 or more GUIDs without re-using old GUIDs

However, if the entire power output of the sun were directed at performing this task, it would go cold long before it finished.

GUIDs can be generated using a number of different tactics, some of which take special measures to guarantee that a given machine will not generate the same GUID twice. Finding collisions in a particular algorithm would show that your particular method for generating GUIDs is bad, but would not prove anything about GUIDs in general.

甜`诱少女 2024-08-17 09:56:50

Of course GUIDs can collide. Since GUIDs are 128-bits, just generate 2^128 + 1 of them and by the pigeonhole principle there must be a collision.

But when we say that a GUID is unique, what we really mean is that the key space is so large that it is practically impossible to accidentally generate the same GUID twice (assuming that we are generating GUIDs randomly).

If you generate a sequence of n GUIDs randomly, then the probability of at least one collision is approximately p(n) = 1 - exp(-n^2 / (2 * 2^128)) (this is the birthday problem with the number of possible birthdays being 2^128).

   n      p(n)
2^30  1.69e-21
2^40  1.77e-15
2^50  1.86e-09
2^60  1.95e-03

To make these numbers concrete, 2^60 = 1.15e+18. So, if you generate one billion GUIDs per second, it will take you 36 years to generate 2^60 random GUIDs and even then the probability that you have a collision is still 1.95e-03. You're more likely to be murdered at some point in your life (4.76e-03) than you are to find a collision over the next 36 years. Good luck.
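A minimal sketch of how such a table can be computed, assuming the birthday-problem approximation with exponent n^2 / (2 * 2^128). `CollisionProbability` and its parameters are hypothetical names used only for illustration.

```csharp
using System;

// Birthday-problem approximation for the chance of at least one collision among
// n = 2^log2Count random values drawn from a space of 2^bits values.
static double CollisionProbability(int log2Count, int bits = 128)
{
    // x = n^2 / (2 * 2^bits) = 2^(2*log2Count - bits - 1)
    double x = Math.Pow(2.0, 2 * log2Count - bits - 1);
    // For tiny x, 1 - exp(-x) rounds to 0 in double precision,
    // so use the first two Taylor terms instead.
    return x < 1e-5 ? x - x * x / 2 : 1 - Math.Exp(-x);
}

foreach (int k in new[] { 30, 40, 50, 60 })
    Console.WriteLine($"2^{k}  {CollisionProbability(k):0.00e+00}");
```

The Taylor fallback matters here: for n = 2^30 the exponent is about 2^-69, which is far below what a direct `1 - Math.Exp(-x)` can resolve in double precision.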

牵你的手,一向走下去 2024-08-17 09:56:50

If you're worried about uniqueness you can always purchase new GUIDs so you can throw away your old ones. I'll put some up on eBay if you'd like.

苍白女子 2024-08-17 09:56:50

Personally, I think the "Big Bang" was caused when two GUIDs collided.

荒芜了季节 2024-08-17 09:56:50

You can show that in O(1) time with a variant of the quantum bogosort algorithm.

Guid g1 = Guid.NewGuid();
Guid g2 = Guid.NewGuid();
if(g1 != g2) Universe.Current.Destroy();
耀眼的星火 2024-08-17 09:56:50

Any two GUIDs are very likely unique (not equal).

See this SO entry, and this passage from Wikipedia:

While each generated GUID is not guaranteed to be unique, the total number of unique keys (2^128 or 3.4×10^38) is so large that the probability of the same number being generated twice is very small. For example, consider the observable universe, which contains about 5×10^22 stars; every star could then have 6.8×10^15 universally unique GUIDs.

So you would probably have to wait many billions of years, and hope that you hit one before the universe as we know it comes to an end.

把时间冻结 2024-08-17 09:56:50

[Update:] As the comments below point out, newer MS GUIDs are V4 and do not use the MAC address as part of the GUID generation (I haven't seen any indication of a V5 implementation from MS, though, so if anyone has a link confirming that, let me know). V4 GUIDs are built from random bits rather than from a timestamp, and the odds against duplication remain so small as to be irrelevant for any practical usage. You certainly would not be likely to ever generate a duplicate GUID from just a single system test such as the OP was trying to do.

Most of these answers are missing one vital point about Microsoft's GUID implementation. The first part of the GUID is based on a timestamp and another part is based on the MAC address of the network card (or a random number if no NIC is installed).

If I understand this correctly, it means that the only reliable way to duplicate a GUID would be to run simultaneous GUID generations on multiple machines where the MAC addresses were the same AND where the clocks on both systems were at exactly the same time when the generation occurred (the timestamp is based on milliseconds, if I understand it correctly)... and even then there are a lot of other bits in the number that are random, so the odds are still vanishingly small.

For all practical purposes the GUIDs are universally unique.

There is a pretty good description of the MS GUID over at "The Old New Thing" blog.
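One way to check which GUID version a given runtime hands out is to read the version digit from the canonical string form. The sketch below assumes the standard 8-4-4-4-12 layout, where the first hex digit of the third group is the version; `GuidVersion` is a hypothetical helper name, and the claim that `Guid.NewGuid()` yields version 4 on current .NET is an assumption you can verify by running it.

```csharp
using System;

// Reads the UUID version from the canonical "D" format
// (xxxxxxxx-xxxx-Vxxx-xxxx-xxxxxxxxxxxx): V is the character at index 14.
static int GuidVersion(Guid g)
{
    string s = g.ToString("D");
    return Convert.ToInt32(s[14].ToString(), 16);
}

// A well-known time-based (version 1) UUID from RFC 4122, for comparison.
Console.WriteLine(GuidVersion(Guid.Parse("6ba7b810-9dad-11d1-80b4-00c04fd430c8")));

// Whatever the current runtime generates.
Console.WriteLine(GuidVersion(Guid.NewGuid()));
```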

沩ん囻菔务 2024-08-17 09:56:50

Here's a nifty little extension method that you can use if you want to check guid uniqueness in many places in your code.

internal static class GuidExt
{
    public static bool IsUnique(this Guid guid)
    {
        while (guid != Guid.NewGuid())
        { }
        return false;
    }
}

To call it, simply call Guid.IsUnique whenever you generate a new guid...

Guid g = Guid.NewGuid();
if (!g.IsUnique())
{
    throw new GuidIsNotUniqueException();
}

...heck, I'd even recommend calling it twice to make sure it got it right in the first round.

凉城 2024-08-17 09:56:50

Counting to 2^128 - ambitious.

Let's imagine that we can count 2^32 IDs per second per machine (not that ambitious, since it's not even 4.3 billion per second). Let's dedicate 2^32 machines to that task. Furthermore, let's get 2^32 civilisations to each dedicate the same resources to the task.

So far, we can count 2^96 IDs per second, meaning we will be counting for 2^32 seconds (a little over 136 years).

Now, all we need is to get 4,294,967,296 civilisations to each dedicate 4,294,967,296 machines, each machine capable of counting 4,294,967,296 IDs per second, purely to this task for the next 136 years or so - I suggest we get started on this essential task right now ;-)
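Spelled out, the arithmetic behind the 136-year figure looks roughly like this, assuming a year of about 3.156 × 10^7 seconds:

```latex
\[
\frac{2^{128}\ \text{IDs}}{2^{32}\cdot 2^{32}\cdot 2^{32}\ \text{IDs/s}}
  = 2^{32}\ \text{s}
  \approx 4.29\times 10^{9}\ \text{s}
  \approx \frac{4.29\times 10^{9}\ \text{s}}{3.156\times 10^{7}\ \text{s/yr}}
  \approx 136\ \text{years}
\]
```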

人│生佛魔见 2024-08-17 09:56:50

Well, if a running time of roughly 10^22 years does not scare you, consider that you will also need to store the generated GUIDs somewhere to check for duplicates. Storing 2^128 16-byte numbers would only require you to allocate 4951760157141521099596496896 terabytes of RAM upfront. Imagine you have a computer that could fit all that, and that you somehow find a place to buy terabyte DIMMs at 10 grams each: combined, they will weigh more than 8 Earth masses, so you can seriously shift the Earth off its current orbit before you even press "Run". Think twice!

春风十里 2024-08-17 09:56:50
for(begin; begin<end; begin)
    Console.WriteLine(System.Guid.NewGuid().ToString());

You aren't incrementing begin so the condition begin < end is always true.
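For reference, here is a minimal sketch of a counting loop that does compile and increment. It assumes `System.Numerics.BigInteger` (which may differ from the BigInteger class used in the question); `GenerateGuids` and its `sink` parameter are hypothetical names for illustration.

```csharp
using System;
using System.Numerics;

// Generates GUIDs while counting with a BigInteger; returns how many were generated.
static BigInteger GenerateGuids(BigInteger limit, Action<Guid> sink)
{
    BigInteger count = BigInteger.Zero;
    // The third clause must actually change the loop variable.
    for (BigInteger i = BigInteger.Zero; i < limit; i++)
    {
        sink(Guid.NewGuid());
        count++;
    }
    return count;
}

// 2^128, the full size of the GUID space.
BigInteger fullRange = BigInteger.Pow(2, 128);
Console.WriteLine(fullRange);

// A small limit keeps the demo finite; passing fullRange would never finish in practice.
Console.WriteLine(GenerateGuids(5, g => Console.WriteLine(g)));
```

The loop itself is fine with `fullRange` as the limit; it is only the running time discussed elsewhere in this thread that makes it pointless.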

你的他你的她 2024-08-17 09:56:50

If GUID collisions are a concern, I would recommend using the ScottGuID instead.

回眸一遍 2024-08-17 09:56:50

Presumably you have reason to believe that the algorithm for producing Guids is not producing truly random numbers, but is in fact cycling with a period << 2^128.

For example, the RFC 4122 method used to derive GUIDs fixes the values of some bits.

Proof of cycling is going to depend upon the possible size of the period.

For small periods, one approach might be a hash table mapping hash(GUID) -> GUID, with replacement on collision if the GUIDs do not match (and termination if they do). Consider also doing the replacement only a random fraction of the time.
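The hash-table idea could look something like the sketch below. Everything named here is hypothetical: `FindRepeat`, the `tableBits` sizing parameter, and the deliberately flawed generator that cycles with period 1000 exist only to illustrate the approach. The "replace only a random fraction of the time" refinement is left out for brevity.

```csharp
using System;

// Draws GUIDs from `next` and keeps one GUID per hash slot.
// Returns the zero-based index of the draw at which a previously stored GUID
// reappears, or -1 if no repeat is seen within maxDraws.
static long FindRepeat(Func<Guid> next, int tableBits, long maxDraws)
{
    var table = new Guid?[1 << tableBits];
    int mask = (1 << tableBits) - 1;
    for (long i = 0; i < maxDraws; i++)
    {
        Guid g = next();
        int slot = g.GetHashCode() & mask;
        if (table[slot] == g) return i;   // same GUID stored earlier: the generator repeated
        table[slot] = g;                  // empty slot or a different GUID: overwrite
    }
    return -1;
}

// A deliberately broken generator with a short period (assumption for the demo).
int counter = 0;
Func<Guid> cycling = () => new Guid(counter++ % 1000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
Console.WriteLine(FindRepeat(cycling, 16, 1_000_000));

// The real generator should not repeat in any run this short.
Console.WriteLine(FindRepeat(Guid.NewGuid, 16, 100_000));
```

Because each slot holds only the most recent GUID, memory stays fixed, at the cost of possibly missing a repeat whose earlier occurrence was overwritten; that is the trade-off the answer hints at.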

Ultimately if the maximum period between collisions is large enough (and isn't known in advance) any method is only going to yield a probability that the collision would be found if it existed.

Note that if the method of generating Guids is clock based (see the RFC), then it may not be possible to determine if collisions exist because either (a) you won't be able to wait long enough for the clock to wrap round, or (b) you can't request enough Guids within a clock tick to force a collision.

Alternatively you might be able to show a statistical relationship between the bits in the Guid, or a correlation of bits between Guids. Such a relationship might make it highly probable that the algorithm is flawed without necessarily being able to find an actual collision.

Of course, if you just want to prove that Guids can collide, then a mathematical proof, not a program, is the answer.

烦人精 2024-08-17 09:56:50

I don't understand why no one has mentioned upgrading your graphics card... Surely if you got a high-end NVIDIA Quadro FX 4800 or something (192 CUDA cores) this would go faster...

Of course, if you could afford a few NVIDIA Quadro Plex 2200 S4s (at 960 CUDA cores each), this calculation would really scream. Perhaps NVIDIA would be willing to loan you a few for a "Technology Demonstration" as a PR stunt?

Surely they'd want to be part of this historic calculation...

银河中√捞星星 2024-08-17 09:56:50

But do you have to be sure you have a duplicate, or do you only care whether there can be a duplicate? To be sure that you have two people with the same birthday, you need 366 people (not counting leap years). For there to be a greater than 50% chance of two people sharing a birthday, you only need 23 people. That's the birthday problem.

If you have 32 bits, you only need 77,163 values to have a greater than 50% chance of a duplicate. Try it out:

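Random baseRandom = new Random(0);

// Returns 1 if a batch of random ints contains at least one duplicate, else 0.
int DuplicateIntegerTest(int iterations)
{
    Random r = new Random(baseRandom.Next());
    int[] ints = new int[iterations];
    for (int i = 0; i < ints.Length; i++)
    {
        // Note: Next() returns a non-negative int (31 random bits, not 32),
        // so duplicates are more frequent than the 50% predicted for 32 bits.
        ints[i] = r.Next();
    }
    // After sorting, any duplicates sit next to each other.
    Array.Sort(ints);
    for (int i = 1; i < ints.Length; i++)
    {
        if (ints[i] == ints[i - 1])
            return 1;
    }
    return 0;
}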

void DoTest()
{
    baseRandom = new Random(0);
    int count = 0;
    int duplicates = 0;
    for (int i = 0; i < 1000; i++)
    {
        count++;
        duplicates += DuplicateIntegerTest(77163);
    }
    Console.WriteLine("{0} iterations had {1} with duplicates", count, duplicates);
}

1000 iterations had 737 with duplicates

Now, 128 bits is a lot, so even a very large number of items still gives you a low chance of collision. Using an approximation, you would need the following numbers of records for the given odds:

  • 0.8 billion billion for a 1/1000 chance of a collision occurring
  • 21.7 billion billion for 50% chance of a collision occurring
  • 39.6 billion billion for 90% chance of a collision occurring

There are about 1E14 emails sent per year so it would be about 400,000 years at this level before you would have a 90% chance of having two with the same GUID, but that is a lot different than saying you need to run a computer 83 billion times the age of the universe or that the sun would go cold before finding a duplicate.
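The record counts above can be reproduced by inverting the birthday approximation p = 1 - exp(-n^2 / (2 * 2^bits)). This is a sketch under that assumption; `RecordsForProbability` is a hypothetical helper name.

```csharp
using System;

// Number of random values needed for a collision probability p in a space of 2^bits values,
// from n = sqrt(2 * 2^bits * ln(1 / (1 - p))).
static double RecordsForProbability(double p, int bits = 128)
{
    return Math.Sqrt(2.0 * Math.Pow(2.0, bits) * Math.Log(1.0 / (1.0 - p)));
}

foreach (double p in new[] { 0.001, 0.5, 0.9 })
    Console.WriteLine($"{p}: {RecordsForProbability(p):E2}");

// The 32-bit case mentioned earlier in this answer.
Console.WriteLine(RecordsForProbability(0.5, 32));
```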

要走干脆点 2024-08-17 09:56:50

Aren't you all missing a major point?

I thought GUIDs were generated using two things which make the chances of them being globally unique quite high. One is that they are seeded with the MAC address of the machine you are on, and the other is that they use the time at which they were generated plus a random number.

So unless you run it on the actual machine and make all your guesses within the smallest amount of time that the machine uses to represent a time in the GUID, you will never generate the same number, no matter how many guesses you take using the system call.

I guess knowing the actual way a GUID is made would shorten the time to guess quite substantially.

Tony

白衬杉格子梦 2024-08-17 09:56:50

You could hash the GUIDs. That way, you should get a result much faster.

Oh, of course, running multiple threads at the same time is also a good idea, that way you'll increase the chance of a race condition generating the same GUID twice on different threads.

甜味超标? 2024-08-17 09:56:50

GUIDs are 124 bits because 4 bits hold the version number.

神经大条 2024-08-17 09:56:50
  1. Go to the cryogenics lab in New York City.
  2. Freeze yourself for (roughly) 1990 years.
  3. Get a job at Planet Express.
  4. Buy a brand-new CPU. Build a computer, run the program, and place it in a safe place with a pseudo-perpetual motion machine like the doomsday machine.
  5. Wait until the time machine is invented.
  6. Jump to the future using the time machine. If you bought a 1YHz 128-bit CPU, go to 3,938,453,320 days 20 hours 15 minutes 38 seconds 463 ms 463 μs 374 ns 607 ps after you started running the program.
  7. ...?
  8. PROFIT!!!

... It takes at least 10,783,127 years even if you have a 1YHz CPU, which is 1,000,000,000,000,000 (or 1,125,899,906,842,624 if you prefer binary prefixes) times faster than a 1GHz CPU.

So rather than waiting for the computation to finish, it would be better to feed the pigeons that lost their homes because n other pigeons took them. :(

Or you can wait until a 128-bit quantum computer is invented. Then you may be able to prove that GUIDs are not unique by running your program in a reasonable time (maybe).

酒儿 2024-08-17 09:56:50

Have you tried begin = begin + new BigInteger((long)1) in place of begin++?

飘落散花 2024-08-17 09:56:50

If the number of UUIDs being generated follows Moore's law, the impression that we will never run out of GUIDs in the foreseeable future is false.

With 2 ^ 128 UUIDs, it will only take 18 months * Log2(2^128) ~= 192 years before we run out of all UUIDs.
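The 192-year estimate follows from a simple doubling model. As an assumption for illustration, suppose one UUID is generated in the first 18-month period and the number generated doubles in every period after that; then after k periods the cumulative total is a geometric sum:

```latex
\[
\sum_{i=0}^{k-1} 2^{i} = 2^{k} - 1 \approx 2^{128}
\;\Longrightarrow\; k \approx 128,
\qquad
t \approx 128 \times 1.5\ \text{years} = 192\ \text{years}
\]
```

A larger starting rate only shortens this, since every factor of two in the initial rate removes one 18-month period.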

And I believe (with no statistical proof whatsoever) that in the past few years, since the mass adoption of UUIDs, the rate at which we generate UUIDs has been increasing way faster than Moore's law dictates. In other words, we probably have less than 192 years until we have to deal with a UUID crisis, which is a lot sooner than the end of the universe.

But since we definitely won't be running them out by the end of 2012, we'll leave it to other species to worry about the problem.

囍孤女 2024-08-17 09:56:50

The odds of a bug in the GUID generating code are much higher than the odds of the algorithm generating a collision. The odds of a bug in your code to test the GUIDs are even greater. Give up.

错爱 2024-08-17 09:56:50

The program, despite its errors, shows proof that a GUID is not unique. Those who try to prove the contrary are missing the point. This statement just proves the weak implementation of some of the GUID variations.

A GUID is not necessarily unique by definition; it is highly unique by definition. You just refined the meaning of "highly". Depending on the version, the implementer (MS or others), the use of VMs, etc., your definition of "highly" changes. (See the link in an earlier post.)

You can shorten your 128 bit table to prove your point. The best solution is to use a hash formula to shorten your table with duplicates, and then use the full value once the hash collides and based on that re-generate a GUID. If running from different locations, you would be storing your hash/full key pairs in a central location.

Ps: If the goal is just to generate x number of different values, create a hash table of this width and just check on the hash value.

白鸥掠海 2024-08-17 09:56:50

Not to p**s on the bonfire here, but it does actually happen. Yes, I understand the joking you have been giving this guy, but the GUID is unique only in principle. I bumped into this thread because there is a bug in the WP7 emulator which means that every time it boots, it gives out the SAME GUID the first time it is called! So, where in theory you cannot have a conflict, if there is a problem generating said GUID, then you can get duplicates.

http://forums.create.msdn.com/forums/p/92086/597310.aspx#597310

孤檠 2024-08-17 09:56:50

Since part of Guid generation is based on the current machine's time, my theory to get a duplicate Guid is:

  1. Perform a clean installation of Windows
  2. Create a startup script that resets the time to 2010-01-01 12:00:00 just as Windows boots up.
  3. Have the startup script trigger your application to generate a Guid right after it resets the time.
  4. Clone this Windows installation, so that you rule out any subtle differences that may occur in subsequent boot-ups.
  5. Re-image the hard drive with this image and boot up the machine a few times.
世界如花海般美丽 2024-08-17 09:56:50

For me, the time it takes for a single core to generate a UUIDv1 guarantees it will be unique. Even in a multi-core situation, if the UUID generator only allows one UUID to be generated at a time for your specific resource (keep in mind that multiple resources can use the very same UUIDs, however unlikely, since the resource is inherently part of the address), then you will have more than enough UUIDs to last you until the timestamp burns out, at which point I really doubt you would care.
