类和子类的通用哈希函数
我正在编写一种类框架,我需要获取对象的哈希值以将它们存储在哈希表中。
因此,如果我有:
class A {
int a;
};
class B : public A {
const char* str;
};
class C : public A {
double d;
otherClass* oc;
};
我需要能够通过哈希函数运行 B
或 C
来获取对象的哈希值。
我该怎么办?我想过简单地执行 sizeof(thing)
并对原始字节进行哈希处理,但这是一个好方法吗?我还考虑过在基类中使用virtual uint_32 hash() = 0
,但是如果必须为每个子类都实现这一点,那将是次优的。
I am writing a kind of class framework where I will need to get the hashes of objects for storing them in a hash table.
So if I have:
class A {
int a;
};
class B : public A {
const char* str;
};
class C : public A {
double d;
otherClass* oc;
};
I need to be able to run B
's or C
's through the hashing function to get the object's hash.
How should I go about this? I thought about simply doing sizeof(thing)
and hashing the raw bytes, but is that a good way to do it? I also thought of having virtual uint_32 hash() = 0
in the base class, but that would be suboptimal to have to implement that for every subclass.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
通常,您需要哈希函数与类中定义的相等性一致。也许相等是由重载的运算符 == 定义的,但即使没有重载,您也可能会认为两个对象应该被视为相等,并且具有相同的哈希码(如果它们的所有数据成员都相等)。
散列原始字节通常不起作用。不保证数据成员全部相等的两个对象将具有相等的字节。例如,出于对齐原因,对象中的某处可能有一些填充,并且填充字节可以采用任何值。
更糟糕的是,无法保证两个相等的 double 值具有相同的字节。例如,正/负零比较相等。
C 的情况特别棘手:如果两个 C 对象指向不同的 otherClass 对象,但两个 otherClass 对象相等,那么这两个 C 对象是否应该具有相同的哈希值价值?你不能完全笼统地定义它,它是 C 类的属性。
如果某件事也是可能的最好的,它会是“次优的”吗? ;-) 唯一通用的解决方案是定义一个函数
hash
,并为每个类编写一个版本。在您的情况下,您可以将其设为 A 的虚拟函数,但您也可以查看std::hash
在 C++0x 中的工作原理:它实际上是一个模板函子类而不是一个函数,并且它可以专门用于用户定义的类。当然,这不提供动态多态性,但如果您将其专门用于A
,并让实现调用一个虚拟函数,您在每个类中实现该函数,那么您的哈希函数将与 <代码>std::unordered_map 等Usually you need your hash function to be consistent with equality as defined on your classes. Maybe equality is defined by an overloaded
operator==
, but even if that's not overloaded you might think that two objects should be considered equal, and have the same hash code, if all their data members are equal.Hashing the raw bytes doesn't work in general. There is no guarantee that two objects whose data members are all equal will have equal bytes. For example, there might be some padding in the object somewhere, for alignment reasons, and padding bytes could take any value.
Even worse, there's no guarantee that two
double
values that are equal have equal bytes. For example, positive/negative zero compare equal.The case of C is particularly tricky: if two C objects point to different
otherClass
objects, but the twootherClass
objects are equal, then should the two C objects have the same hash value? You can't define that in complete generality, it's a property of the C class.Can something be "suboptimal" if it's also the best that's possible? ;-) The only general solution is to define a function
hash
, and write a version for every class. In your case you might make that a virtual function of A, but you could also look at howstd::hash
works in C++0x: it's actually a template functor class rather than a function, and it can be specialized for user-defined classes. That doesn't provide dynamic polymorphism, of course, but if you specialize it forA
, and have the implementation call a virtual function, which you implement in each class, then your hash function will work withstd::unordered_map
etc.如果对象具有未初始化的字段(对象具有不同的无意义位)或动态成员(对象具有不同的指针,即使它们指向相同的数据)。没有比编写序列化器然后通过哈希函数运行结果更好的了。
至于为每个基类实现
hash()
的成本,您有三种选择。我建议从 2 个开始,然后逐渐转向 3 个。
Doing the
sizeof
thing can give you different hashes of identical objects, if the objects have uninitialized fields (the objects have different meaningless bits) or dynamic members (the objects have pointers that are different, even though they point to identical data). You can't do much better than writing a serializer, then running the result through your hash function.As for the cost of implementing
hash()
for every base class, you have three choices.I'd recommend starting with 2, then moving to 3 piecemeal.