GUID 的高效数据结构
我正在寻找一种数据结构,它使我能够快速(最好是 O(1) 快速)确定给定的 GUID 是否是 GUID 集合的成员。
我当前的方法是使用 TDictionary 以 0 作为值。
虽然这工作很快,但使用 Hashmap 重新哈希 GUID(根据定义被认为是唯一的)并让 Dictionary 处理不需要的值似乎是一种浪费。
一定有更好的解决方案,但我找不到。你可以吗?
I am in search of a data structure which enables me to quickly (prefarably O(1)-quickly) determine if a given GUID is a member of a Collection of GUIDs or not.
My current approach is to use a TDictionary with 0 as values.
While this works quickly, it seems to be a waste to use a Hashmap to rehash a GUID, which is by defintion considered to be unique, and to have the Dictionary handle values which are unneeded.
There must be a better solution for this, but I can't find one. Can you?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
很少有数据结构提供 O(1) 访问。一个是数组,另一个是 HashMap(David 的回答),我只知道另外一个: Trie。下面是按位 Trie 的简单实现: 具有一些有趣的属性:
代码:
Very few data structures offer O(1) access. One's the Array, the other one's the HashMap (David's answer), and I only know one other: The Trie. Here follows a simple implementation of a bit-wise Trie: Has some interesting properties:
The code:
我认为你已经成功了 99%。
散列听起来是正确的解决方案。利用 GUID 特殊性的明显方法是提供您自己的哈希函数,该函数将构成 GUID 的 4 个 32 位整数组合成一个 32 位整数。我只是对 4 个整数进行异或运算。
我假设您正在使用 Generics.Collections.TDictionary。您可以通过将自定义比较器传递给构造函数来提供自己的哈希函数。我不担心存储备用值,我不认为它会以明显的方式影响性能。
我相信您将 GUID 存储为 128 位整数而不是字符串。
最后,我想到 GUID 的默认比较器可能确实已经以这种方式生成哈希代码。在进行任何更改之前值得检查一下。
编辑
默认哈希代码使用应用于二进制数据的 Bob Jenkins 哈希。异或会更快,但默认的哈希码似乎不会成为性能瓶颈。
换句话说,我认为
TDictionary
将完全满足您的需求。I think you are 99% of the way there.
Hashing sounds like the right solution. The obvious way to take advantage of the special nature of the GUID is to supply your own hash function which combines into a single 32 bit integer the 4 32 bit integers that make up a GUID. I'd just XOR the 4 integers.
I presume you are using Generics.Collections.TDictionary. You can supply your own hash function by passing a custom comparer to the constructor. I wouldn't worry about storing spare values, I don't think it will affect performance in a discernible way.
I trust that you are storing your GUIDs as 128 bit integers and not as strings.
Finally, it has occurred to me that the default comparer for a GUID might indeed already do the hash code generation this way. It's worth checking that out before making any changes.
EDIT
Default hash code uses Bob Jenkins hash applied to the binary data. An XOR would be faster, but the default hash code doesn't seem like it would be a performance bottleneck.
In other words, I think that
TDictionary<TGUID,Integer>
will serve your needs perfectly adequately.