类和子类的通用哈希函数

发布于 2024-10-04 05:36:15 字数 424 浏览 0 评论 0原文

我正在编写一种类框架,我需要获取对象的哈希值以将它们存储在哈希表中。
因此,如果我有:

class A {
    int a;
};

class B : public A {
    const char* str;
};

class C : public A {
    double d;
    otherClass* oc;
};

我需要能够通过哈希函数运行 BC 来获取对象的哈希值。

我该怎么办?我想过简单地执行 sizeof(thing) 并对原始字节进行哈希处理,但这是一个好方法吗?我还考虑过在基类中使用virtual uint_32 hash() = 0,但是如果必须为每个子类都实现这一点,那将是次优的。

I am writing a kind of class framework where I will need to get the hashes of objects for storing them in a hash table.
So if I have:

class A {
    int a;
};

class B : public A {
    const char* str;
};

class C : public A {
    double d;
    otherClass* oc;
};

I need to be able to run B's or C's through the hashing function to get the object's hash.

How should I go about this? I thought about simply doing sizeof(thing) and hashing the raw bytes, but is that a good way to do it? I also thought of having virtual uint_32 hash() = 0 in the base class, but that would be suboptimal to have to implement that for every subclass.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

谜兔 2024-10-11 05:36:15

通常,您需要哈希函数与类中定义的相等性一致。也许相等是由重载的运算符 == 定义的,但即使没有重载,您也可能会认为两个对象应该被视为相等,并且具有相同的哈希码(如果它们的所有数据成员都相等)。

散列原始字节通常不起作用。不保证数据成员全部相等的两个对象将具有相等的字节。例如,出于对齐原因,对象中的某处可能有一些填充,并且填充字节可以采用任何值。

更糟糕的是,无法保证两个相等的 double 值具有相同的字节。例如,正/负零比较相等。

C 的情况特别棘手:如果两个 C 对象指向不同的 otherClass 对象,但两个 otherClass 对象相等,那么这两个 C 对象是否应该具有相同的哈希值价值?你不能完全笼统地定义它,它是 C 类的属性。

如果某件事也是可能的最好的,它会是“次优的”吗? ;-) 唯一通用的解决方案是定义一个函数hash,并为每个类编写一个版本。在您的情况下,您可以将其设为 A 的虚拟函数,但您也可以查看 std::hash 在 C++0x 中的工作原理:它实际上是一个模板函子类而不是一个函数,并且它可以专门用于用户定义的类。当然,这不提供动态多态性,但如果您将其专门用于 A,并让实现调用一个虚拟函数,您在每个类中实现该函数,那么您的哈希函数将与 <代码>std::unordered_map 等

Usually you need your hash function to be consistent with equality as defined on your classes. Maybe equality is defined by an overloaded operator==, but even if that's not overloaded you might think that two objects should be considered equal, and have the same hash code, if all their data members are equal.

Hashing the raw bytes doesn't work in general. There is no guarantee that two objects whose data members are all equal will have equal bytes. For example, there might be some padding in the object somewhere, for alignment reasons, and padding bytes could take any value.

Even worse, there's no guarantee that two double values that are equal have equal bytes. For example, positive/negative zero compare equal.

The case of C is particularly tricky: if two C objects point to different otherClass objects, but the two otherClass objects are equal, then should the two C objects have the same hash value? You can't define that in complete generality, it's a property of the C class.

Can something be "suboptimal" if it's also the best that's possible? ;-) The only general solution is to define a function hash, and write a version for every class. In your case you might make that a virtual function of A, but you could also look at how std::hash works in C++0x: it's actually a template functor class rather than a function, and it can be specialized for user-defined classes. That doesn't provide dynamic polymorphism, of course, but if you specialize it for A, and have the implementation call a virtual function, which you implement in each class, then your hash function will work with std::unordered_map etc.

苦妄 2024-10-11 05:36:15

如果对象具有未初始化的字段(对象具有不同的无意义位)或动态成员(对象具有不同的指针,即使它们指向相同的数据)。没有比编写序列化器然后通过哈希函数运行结果更好的了。

至于为每个基类实现 hash() 的成本,您有三种选择。

  1. 为基类实现“hash()”,但不要在所有派生类中重载它。派生类的某些对象将具有相同的哈希值,即使它们不同。
  2. 使其成为纯虚拟(`virtual uint32 hash()=0`),在所有派生类中实现它,即使它有时很微不足道(`hash() {return(0)}`)。与1相同的问题,但问题更容易看出。
  3. 咬紧牙关。为所有子类正确实现它。

我建议从 2 个开始,然后逐渐转向 3 个。

Doing the sizeof thing can give you different hashes of identical objects, if the objects have uninitialized fields (the objects have different meaningless bits) or dynamic members (the objects have pointers that are different, even though they point to identical data). You can't do much better than writing a serializer, then running the result through your hash function.

As for the cost of implementing hash() for every base class, you have three choices.

  1. Implement 'hash()` for the base class, but don't overload it in all derived classes. Some objects of derived classes will have the same hash, even though they're different.
  2. Make it a pure virtual(`virtual uint32 hash()=0`), implement it in all derived classes, even if it's sometimes trivial (`hash() {return(0)}`). Same problem as 1, but the problem is easier to see.
  3. Bite the bullet. Implement it correctly for all subclasses.

I'd recommend starting with 2, then moving to 3 piecemeal.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文