How to create deterministic Guids
In our application we create XML files with an attribute that has a Guid value. This value needs to stay consistent between file upgrades: even if everything else in the file changes, the attribute's Guid should remain the same.
One obvious solution was a static dictionary mapping each filename to the Guid to use for it. Whenever we generate a file, we look up its filename in the dictionary and use the corresponding Guid. But this isn't feasible, because we might scale to hundreds of files and don't want to maintain a big list of Guids.
So another approach was to derive the Guid from the path of the file. Since our file paths and application directory structure are unique, the Guid should be unique for that path, so each time we run an upgrade, the file gets the same Guid based on its path. I found one cool way to generate such 'deterministic Guids' (thanks to Elton Stoneman). It basically does this:
```csharp
using System;
using System.Security.Cryptography;
using System.Text;

private Guid GetDeterministicGuid(string input)
{
    // Use an MD5 hash to get a 16-byte hash of the string.
    // UTF-8 instead of Encoding.Default, whose code page can differ per machine on .NET Framework.
    byte[] inputBytes = Encoding.UTF8.GetBytes(input);
    byte[] hashBytes;
    using (MD5 md5 = MD5.Create())
    {
        hashBytes = md5.ComputeHash(inputBytes);
    }
    // Generate a Guid from the 16 hash bytes.
    return new Guid(hashBytes);
}
```
So given a string, the Guid will always be the same.
Are there any other approaches or recommended ways of doing this? What are the pros and cons of this method?
Answers (7)
As mentioned by @bacar, RFC 4122 §4.3 defines a way to create a name-based UUID. The advantage of doing this (over just using an MD5 hash) is that these are guaranteed not to collide with non-name-based UUIDs, and have a very (very) small possibility of collision with other name-based UUIDs.
There's no native support in the .NET Framework for creating these, so I created the NGuid package that implements the algorithm. It can be used as follows:
To reduce the risk of collisions with other GUIDs even further, you could create a private GUID to use as the namespace ID (instead of using the URL namespace ID defined in the RFC).
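As an illustration of what RFC 4122 §4.3 specifies, here is a hand-rolled version 5 (SHA-1) sketch. It is not NGuid's API; the names NameBasedGuid, UrlNamespace, and CreateV5 are hypothetical, and encoding names as UTF-8 is an assumption (it matches what Python's uuid module does). The steps follow the RFC: hash the namespace ID in network byte order followed by the name, keep the first 16 bytes, then stamp the version and variant bits.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper sketching RFC 4122 §4.3 name-based UUIDs (version 5, SHA-1).
public static class NameBasedGuid
{
    // RFC 4122 Appendix C name space ID for URLs.
    public static readonly Guid UrlNamespace = new Guid("6ba7b811-9dad-11d1-80b4-00c04fd430c8");

    public static Guid CreateV5(Guid namespaceId, string name)
    {
        // Assumption: names are hashed as UTF-8 so results don't depend on the machine.
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);

        // The RFC uses network (big-endian) order, but Guid.ToByteArray() stores
        // the first three fields little-endian, so swap them first.
        byte[] nsBytes = namespaceId.ToByteArray();
        SwapByteOrder(nsBytes);

        byte[] hash;
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] data = new byte[nsBytes.Length + nameBytes.Length];
            Buffer.BlockCopy(nsBytes, 0, data, 0, nsBytes.Length);
            Buffer.BlockCopy(nameBytes, 0, data, nsBytes.Length, nameBytes.Length);
            hash = sha1.ComputeHash(data);
        }

        // Keep the first 16 of the 20 SHA-1 bytes.
        byte[] guidBytes = new byte[16];
        Array.Copy(hash, guidBytes, 16);

        // Version 5 goes in the high nibble of time_hi_and_version (octet 6).
        guidBytes[6] = (byte)((guidBytes[6] & 0x0F) | 0x50);
        // Variant bits 10xx go in clock_seq_hi_and_reserved (octet 8).
        guidBytes[8] = (byte)((guidBytes[8] & 0x3F) | 0x80);

        // Back from network order to Guid's mixed-endian byte layout.
        SwapByteOrder(guidBytes);
        return new Guid(guidBytes);
    }

    private static void SwapByteOrder(byte[] g)
    {
        Swap(g, 0, 3);
        Swap(g, 1, 2);
        Swap(g, 4, 5);
        Swap(g, 6, 7);
    }

    private static void Swap(byte[] b, int i, int j)
    {
        byte t = b[i];
        b[i] = b[j];
        b[j] = t;
    }
}
```

For the question's use case a call would look like NameBasedGuid.CreateV5(NameBasedGuid.UrlNamespace, filePath). A private namespace, as suggested above, is just a Guid generated once (for example with Guid.NewGuid()), hard-coded, and passed as namespaceId.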
This will convert any string into a Guid without having to import an outside assembly.
There are much better ways to generate a unique Guid, but this is a way to consistently upgrade a string data key to a Guid data key.
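A minimal sketch of such a conversion using only the base class library (not necessarily the author's code). The hash choice, SHA-256 truncated to 16 bytes, is an assumption, and StringKeyToGuid and ToGuid are hypothetical names.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: maps a string data key to a stable Guid without outside assemblies.
public static class StringKeyToGuid
{
    public static Guid ToGuid(string key)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));

        byte[] hash;
        using (SHA256 sha = SHA256.Create())
        {
            // UTF-8 keeps the mapping identical across machines and runtimes.
            hash = sha.ComputeHash(Encoding.UTF8.GetBytes(key));
        }

        // A Guid holds 16 bytes; keep the first 16 of the 32-byte digest.
        byte[] guidBytes = new byte[16];
        Array.Copy(hash, guidBytes, 16);
        return new Guid(guidBytes);
    }
}
```

Because the digest bytes are copied verbatim, the result is not an RFC 4122 name-based UUID: its version and variant digits are whatever the hash happens to produce.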
As Rob mentions, your method doesn't generate a UUID, it generates a hash that looks like a UUID.
RFC 4122 on UUIDs specifically allows for deterministic (name-based) UUIDs: versions 3 and 5 use MD5 and SHA-1, respectively. Most people are probably familiar with version 4, which is random. Wikipedia gives a good overview of the versions. (Note that the word 'version' here describes a 'type' of UUID; version 5 doesn't supersede version 4.)
There seem to be a few libraries out there for generating version 3/5 UUIDs, including the Python uuid module, Boost.Uuid (C++) and OSSP uuid. (I haven't looked for any .NET ones.)
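Since the version is stored inside the UUID itself, the distinction above can be checked directly. A small sketch (GuidVersion, GetVersion, and IsRfc4122Variant are hypothetical names) that reads the version digit and variant bits from the canonical "D" string form:

```csharp
using System;

// Hypothetical helper for inspecting RFC 4122 version and variant bits.
public static class GuidVersion
{
    // The version is the first hex digit of the third group:
    // xxxxxxxx-xxxx-Vxxx-Nxxx-xxxxxxxxxxxx (V at index 14).
    public static int GetVersion(Guid g)
    {
        string s = g.ToString("D");
        return Convert.ToInt32(s.Substring(14, 1), 16);
    }

    // RFC 4122 variant: the top two bits of the digit at index 19 (N) are binary 10.
    public static bool IsRfc4122Variant(Guid g)
    {
        string s = g.ToString("D");
        int nibble = Convert.ToInt32(s.Substring(19, 1), 16);
        return (nibble & 0xC) == 0x8;
    }
}
```

GuidVersion.GetVersion(Guid.NewGuid()) returns 4, while a Guid built directly from an MD5 or SHA digest, as in the question, carries whatever digit the hash happened to produce.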
You need to make a distinction between instances of the class Guid and identifiers that are globally unique. A "deterministic guid" is actually a hash (as evidenced by your call to provider.ComputeHash). Hashes have a much higher chance of collisions (two different strings happening to produce the same hash) than a Guid created via Guid.NewGuid. So the problem with your approach is that you will have to be OK with the possibility that two different paths produce the same GUID. If you need an identifier that's unique for any given path string, the easiest thing to do is to just use the string. If you need the string to be obscured from your users, encrypt it; you can use ROT13 or something more powerful...
Attempting to shoehorn something that isn't a pure GUID into the GUID datatype could lead to maintenance problems in the future...
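To put a rough number on the possibility of two paths producing the same GUID: for accidental (non-adversarial) collisions, the usual birthday approximation for n identifiers spread uniformly over b bits is shown below, with n = 1000 paths as an illustrative assumption and b = 128 for a full MD5 digest used as a Guid.

```latex
p \approx 1 - e^{-n(n-1)/2^{b+1}} \approx \frac{n(n-1)}{2^{b+1}}
  = \frac{1000 \cdot 999}{2^{129}} \approx 1.5 \times 10^{-33}
```

This only models chance collisions between ordinary paths; MD5's known weakness is that colliding inputs can be constructed deliberately, which this estimate does not cover.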
MD5 is weak; I believe you can do the same thing with SHA-1 and get better results.
BTW, just a personal opinion: dressing an MD5 hash up as a GUID does not make it a good GUID. GUIDs by their very nature are non-deterministic, so this feels like a cheat. Why not call a spade a spade and just say it's a string-rendered hash of the input? You could do that by using this line, rather than the new Guid line:
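One way to follow this suggestion (not necessarily the author's line), sketched with SHA-1 as the answer's first sentence proposes; HashKey and GetHashKey are hypothetical names. The digest is returned as a hex string instead of being wrapped in a Guid:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: a string-rendered hash of the input, not a Guid.
public static class HashKey
{
    public static string GetHashKey(string input)
    {
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hashBytes = sha1.ComputeHash(Encoding.UTF8.GetBytes(input));
            // BitConverter yields "A9-99-..."; removing the dashes leaves 40 hex characters.
            return BitConverter.ToString(hashBytes).Replace("-", "");
        }
    }
}
```

For the input "abc" this returns A9993E364706816ABA3E25717850C26C9CD0D89D, the standard SHA-1 test vector.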
Here's a very simple solution that should be good enough for things like unit/integration tests:
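A minimal sketch of one very simple approach of this kind (not necessarily the author's code), assuming a small integer per test entity is enough; TestGuids and FromInt are hypothetical names. The integer goes into the Guid's first field and everything else is zero:

```csharp
using System;

// Hypothetical test helper: readable, repeatable Guids built from integers.
public static class TestGuids
{
    public static Guid FromInt(int n)
    {
        // Guid(int a, short b, short c, byte[] d) with d exactly 8 bytes.
        return new Guid(n, 0, 0, new byte[8]);
    }
}
```

TestGuids.FromInt(1) is 00000001-0000-0000-0000-000000000000 on every run, which also makes failing test output easy to read.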
Such a method is implemented by Azure Resource Manager (ARM)/Bicep as its guid function: https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/bicep-functions-string#guid
While inefficient and less protected by static compilation, ARM functions can be invoked from .NET code: