Fastest way to hash a set of GUIDs

Posted on 2024-07-09 02:56:35

I have a list of n GUIDs and I need to hash them into a single value. This value could be the size of a Guid object or the size of an Int32; it doesn't really matter, but it does need to be statistically unique (say, with a collision probability similar to MD5's).

So one approach could be to sort them, concatenate the bytes and take an MD5 hash of all the bytes... but this isn't very quick.
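For reference, the sort-concatenate-MD5 baseline described above might look roughly like the sketch below. The class and method names (`GuidListMd5`, `HashSorted`) are made up for illustration, and returning the digest as a `Guid` is just one convenient choice, since MD5 produces 16 bytes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;

public static class GuidListMd5
{
    // Hypothetical helper, not from the question itself.
    // Sorts the GUIDs so the result does not depend on input order,
    // concatenates their 16-byte representations, and hashes the lot with MD5.
    public static Guid HashSorted(IEnumerable<Guid> guids)
    {
        Guid[] sorted = guids.OrderBy(g => g).ToArray();

        byte[] buffer = new byte[sorted.Length * 16];
        for (int i = 0; i < sorted.Length; i++)
        {
            sorted[i].ToByteArray().CopyTo(buffer, i * 16);
        }

        using (MD5 md5 = MD5.Create())
        {
            // An MD5 digest is 16 bytes, which is exactly the size of a Guid.
            return new Guid(md5.ComputeHash(buffer));
        }
    }
}
```

The sort and the per-GUID array allocation are the obvious costs here, which is presumably part of why this feels slow for large lists.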

Another idea: I notice that it is fairly standard practice in .NET to implement the GetHashCode method of a composing object as the XOR of the hash codes of the composed objects. Would it therefore be mathematically sensible to XOR my list of GUIDs?

Any ideas welcome!

Comments (2)

不美如何 2024-07-16 02:56:35

If you want the hash to be valid for the set (i.e. order doesn't matter) then XORing the hashcode of each GUID is a good choice.
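A minimal sketch of that order-independent approach, assuming the GUIDs arrive as an `IEnumerable<Guid>`; the names `GuidSetHash` and `XorHashCodes` are invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class GuidSetHash
{
    // Hypothetical helper: XORs the 32-bit hash code of every GUID.
    // XOR is commutative and associative, so the order of the input does not matter.
    public static int XorHashCodes(IEnumerable<Guid> guids)
    {
        int hash = 0;
        foreach (Guid g in guids)
        {
            hash ^= g.GetHashCode();
        }
        return hash;
    }
}
```

Two things worth keeping in mind: the result is only 32 bits wide, so collisions are far more likely than with a 128-bit MD5 digest, and a GUID that appears twice cancels itself out because `x ^ x` is 0.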

If you've actually got a sequence of GUIDs and the order matters then I'd suggest using the same approach I wrote about in another answer - repeatedly add/multiply.
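The answer only says "repeatedly add/multiply", so the following is one possible reading of it rather than the author's exact code. The seed 17 and multiplier 31 are assumptions (common choices for this pattern), and `GuidSequenceHash.Combine` is a made-up name.

```csharp
using System;
using System.Collections.Generic;

public static class GuidSequenceHash
{
    // Hypothetical helper: folds each GUID's hash code into a running value.
    // Because each step multiplies the previous result, position affects the outcome.
    public static int Combine(IEnumerable<Guid> guids)
    {
        unchecked // let the arithmetic wrap around instead of throwing in checked builds
        {
            int hash = 17;                              // assumed seed
            foreach (Guid g in guids)
            {
                hash = hash * 31 + g.GetHashCode();     // assumed multiplier
            }
            return hash;
        }
    }
}
```

Unlike the XOR version, swapping two elements will normally change the result, which is the point when the sequence order is significant.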

(Note that XORing the hash codes probably won't get you the same answer as XORing the GUIDs themselves and then hashing the result. It might, but that depends on the implementation of Guid.GetHashCode(). I'd hash each value and XOR the results together; aside from anything else, that's trivial to implement.)

一抹微笑 2024-07-16 02:56:35

Don't XOR the GUIDs and then hash the result. You gain nothing this way over simply XORing the GUIDs, unless you use a hash smaller than a GUID.
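As a rough illustration of "simply XORing the GUIDs", the sketch below XORs the 16-byte representations together and returns a Guid-sized result. `GuidXor.XorAll` is a hypothetical name, and working through `ToByteArray()` is just the simplest way to get at the bits, not necessarily the fastest.

```csharp
using System;
using System.Collections.Generic;

public static class GuidXor
{
    // Hypothetical helper: XORs all GUIDs byte by byte into a 128-bit accumulator.
    public static Guid XorAll(IEnumerable<Guid> guids)
    {
        byte[] acc = new byte[16];
        foreach (Guid g in guids)
        {
            byte[] bytes = g.ToByteArray();
            for (int i = 0; i < 16; i++)
            {
                acc[i] ^= bytes[i];
            }
        }
        return new Guid(acc);
    }
}
```

This keeps all 128 bits and is order-independent, but as with any XOR fold, a GUID that occurs twice in the list cancels out and an empty list yields `Guid.Empty`.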

Since you seem to really care about performance for this, a little more information would be useful -- in particular, are you using different combinations of the GUIDs you have in memory (so you could hash them only once as they're created), or are you loading them in and processing them, and repeated GUIDs are unlikely?
