为 BitArray 生成良好的哈希码 (GetHashCode)
我需要在 GetHashCode 中为 BitArray 生成快速哈希码。我有一个字典,其中键是 BitArray,并且所有 BitArray 的长度都相同。
有谁知道一种从可变位数生成良好哈希的快速方法,就像在这种情况下一样?
更新:
我最初采用的方法是直接通过反射访问内部整数数组(在这种情况下速度比封装更重要),然后对这些值进行异或。 XOR 方法似乎工作得很好,即在字典中搜索时,我的“Equals”方法没有被过度调用:
public int GetHashCode(BitArray array)
{
int hash = 0;
foreach (int value in array.GetInternalValues())
{
hash ^= value;
}
return hash;
}
但是,Mark Byers 建议的方法以及 StackOverflow 上其他地方看到的方法稍好一些(16570 Equals 调用 vs 16608 的 XOR 调用)我的测试数据)。请注意,此方法修复了前一种方法中的一个错误,即超出位数组末尾的位可能会影响哈希值。如果减少位数组的长度,则可能会发生这种情况。
public int GetHashCode(BitArray array)
{
UInt32 hash = 17;
int bitsRemaining = array.Length;
foreach (int value in array.GetInternalValues())
{
UInt32 cleanValue = (UInt32)value;
if (bitsRemaining < 32)
{
//clear any bits that are beyond the end of the array
int bitsToWipe = 32 - bitsRemaining;
cleanValue <<= bitsToWipe;
cleanValue >>= bitsToWipe;
}
hash = hash * 23 + cleanValue;
bitsRemaining -= 32;
}
return (int)hash;
}
GetInternalValues 扩展方法的实现如下:
public static class BitArrayExtensions
{
static FieldInfo _internalArrayGetter = GetInternalArrayGetter();
static FieldInfo GetInternalArrayGetter()
{
return typeof(BitArray).GetField("m_array", BindingFlags.NonPublic | BindingFlags.Instance);
}
static int[] GetInternalArray(BitArray array)
{
return (int[])_internalArrayGetter.GetValue(array);
}
public static IEnumerable<int> GetInternalValues(this BitArray array)
{
return GetInternalArray(array);
}
... more extension methods
}
欢迎提出任何改进建议!
I need to generate a fast hash code in GetHashCode for a BitArray. I have a Dictionary where the keys are BitArrays, and all the BitArrays are of the same length.
Does anyone know of a fast way to generate a good hash from a variable number of bits, as in this scenario?
UPDATE:
The approach I originally took was to access the internal array of ints directly through reflection (speed is more important than encapsulation in this case), then XOR those values. The XOR approach seems to work well i.e. my 'Equals' method isn't called excessively when searching in the Dictionary:
public int GetHashCode(BitArray array)
{
int hash = 0;
foreach (int value in array.GetInternalValues())
{
hash ^= value;
}
return hash;
}
However, the approach suggested by Mark Byers and seen elsewhere on StackOverflow was slightly better (16570 Equals calls vs 16608 for the XOR for my test data). Note that this approach fixes a bug in the previous one where bits beyond the end of the bit array could affect the hash value. This could happen if the bit array was reduced in length.
public int GetHashCode(BitArray array)
{
UInt32 hash = 17;
int bitsRemaining = array.Length;
foreach (int value in array.GetInternalValues())
{
UInt32 cleanValue = (UInt32)value;
if (bitsRemaining < 32)
{
//clear any bits that are beyond the end of the array
int bitsToWipe = 32 - bitsRemaining;
cleanValue <<= bitsToWipe;
cleanValue >>= bitsToWipe;
}
hash = hash * 23 + cleanValue;
bitsRemaining -= 32;
}
return (int)hash;
}
The GetInternalValues extension method is implemented like this:
public static class BitArrayExtensions
{
static FieldInfo _internalArrayGetter = GetInternalArrayGetter();
static FieldInfo GetInternalArrayGetter()
{
return typeof(BitArray).GetField("m_array", BindingFlags.NonPublic | BindingFlags.Instance);
}
static int[] GetInternalArray(BitArray array)
{
return (int[])_internalArrayGetter.GetValue(array);
}
public static IEnumerable<int> GetInternalValues(this BitArray array)
{
return GetInternalArray(array);
}
... more extension methods
}
Any suggestions for improvement are welcome!
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
在字典中充当键是一个糟糕的类。实现 GetHashCode() 的唯一合理方法是使用其 CopyTo() 方法将位复制到 byte[] 中。这不太好,它会产生大量垃圾。
请求、窃取或借用 BitVector32 来代替。它有一个很好的 GetHashCode() 实现。如果您的数组超过 32 位,请考虑旋转您自己的类,这样您就可以访问底层数组而无需复制。
It is a terrible class to act as a key in a Dictionary. The only reasonable way to implement GetHashCode() is by using its CopyTo() method to copy the bits into a byte[]. That's not great, it creates a ton of garbage.
Beg, steal or borrow to use a BitVector32 instead. It has a good implementation for GetHashCode(). If you've got more than 32 bits then consider spinning your own class so you can get to the underlying array without having to copy.
如果位数组为 32 位或更短,则只需将它们转换为 32 位整数(如有必要,用零位填充)。
如果它们可以更长,那么您可以将它们转换为一系列 32 位整数并对它们进行异或,或者更好:使用有效 Java 中描述的算法。
取自此处。 field1、field2分别对应前32位、后32位等。
If the bit arrays are 32 bits or shorter then you just need to convert them to 32 bit integers (padding with zero bits if necessary).
If they can be longer then you can either convert them to a series of 32-bit integers and XOR them, or better: use the algorithm described in Effective Java.
Taken from here. The field1, field2 correcpond the the first 32 bits, second 32 bits, etc.