Is it safe to implement operator== and operator< using std::memcmp?

Posted 2024-09-16 21:00:01

After seeing this question, my first thought was that it'd be trivial to define generic equivalence and relational operators:

#include <cstring>

template<class T>
bool operator==(const T& a, const T& b) {
    return std::memcmp(&a, &b, sizeof(T)) == 0;
}

template<class T>
bool operator<(const T& a, const T& b) {
    return std::memcmp(&a, &b, sizeof(T)) < 0;
}

using namespace std::rel_ops would then become even more useful, since it would be made fully generic by the default implementations of operators == and <. Obviously this does not perform a memberwise comparison, but instead a bitwise one, as though the type contains only POD members. This is not entirely consistent with how C++ generates copy constructors, for instance, which do perform memberwise copying.

But I wonder whether the above implementation is indeed safe. The structures would naturally have the same packing, being of the same type, but are the contents of the padding guaranteed to be identical (e.g., filled with zeros)? Are there any reasons why or situations in which this wouldn't work?

Comments (6)

故事和酒 2024-09-23 21:00:01

No. For example, if T is float, double, or long double, your operator== doesn't work right. Two NaNs should never compare equal, even if they have identical bit patterns (in fact, one common way of detecting a NaN is to compare the number to itself: if it's not equal to itself, it's a NaN). Likewise, on IEEE 754 implementations +0.0 and -0.0 compare equal even though their bit patterns differ in the sign bit.
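A minimal sketch of the floating-point mismatch, assuming IEEE 754 doubles (checked with a static_assert). The names bytewise_equal and check_float_cases are made up for this illustration; bytewise_equal mirrors the operator== from the question as a named function.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// Bytewise comparison, written as a named function instead of operator==.
template <class T>
bool bytewise_equal(const T& a, const T& b) {
    return std::memcmp(&a, &b, sizeof(T)) == 0;
}

// Illustrative check; the results below assume IEEE 754 arithmetic.
void check_float_cases() {
    static_assert(std::numeric_limits<double>::is_iec559,
                  "this sketch assumes IEEE 754 doubles");

    const double nan = std::numeric_limits<double>::quiet_NaN();
    const double copy = nan;            // a copy of the same NaN
    assert(!(nan == copy));             // numerically, a NaN never equals a NaN
    assert(bytewise_equal(nan, copy));  // bytewise, the two look equal

    const double pos_zero = 0.0;
    const double neg_zero = -0.0;
    assert(pos_zero == neg_zero);                 // numerically equal
    assert(!bytewise_equal(pos_zero, neg_zero));  // the sign bit differs
}
```

Calling check_float_cases() from main exercises both cases: the bytewise and the numeric answers disagree in opposite directions.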

Your operator< has even less chance of working correctly. For example, consider a typical implementation of std::string that looks something like this:

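#include <cstddef>

template <class charT>
class string {
    charT *data;              // pointer to the separately allocated characters
    std::size_t length;       // number of characters in use
    std::size_t buffer_size;  // allocated capacity
public:
    // ...
};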

With this ordering of the members, your operator< will do its comparison based on the addresses of the buffers where the strings happen to have stored their data. If, for example, it happened to have been written with the length member first, your comparison would use the lengths of the strings as the primary keys. In any case, it won't do a comparison based on the actual string contents, because it will only ever look at the value of the data pointer, not whatever it points at, which is what you really want/need.
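To make the pointer problem concrete, here is a small sketch using a hypothetical struct Text that stands in for the string layout above. Text, bytewise_equal, content_equal, and check_pointer_member are names invented for this example, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a string class: a pointer plus a length.
struct Text {
    const char* data;
    std::size_t length;
};

// What the question's operator== would do: compare the object's own bytes.
bool bytewise_equal(const Text& a, const Text& b) {
    return std::memcmp(&a, &b, sizeof(Text)) == 0;
}

// What a user of a string type actually wants: compare the characters.
bool content_equal(const Text& a, const Text& b) {
    return a.length == b.length &&
           std::memcmp(a.data, b.data, a.length) == 0;
}

void check_pointer_member() {
    char first[] = "abc";
    char second[] = "abc";   // same characters, different buffer
    Text a{first, 3};
    Text b{second, 3};
    assert(content_equal(a, b));    // the strings hold the same text
    assert(!bytewise_equal(a, b));  // but the stored addresses differ
}
```

The bytewise version only ever sees the two addresses, so two strings with identical text compare unequal as soon as they live in different buffers.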

Edit: As far as padding goes, there's no requirement that the contents of padding be equal; padding bytes have unspecified values. Reading them is not the problem in itself: memcmp is specified to interpret every byte as unsigned char, for which every bit pattern is a valid value, so trap representations do not come into play. The problem is that two objects with equal members can still differ in their padding bytes, in which case memcmp reports them as different.

Also note that being objects of the same type does not necessarily mean they use the same alignment of members. That's the common method of implementation, but it's also entirely possible for a compiler to do something like using different alignments based on how often it "thinks" a particular object will be used, and include a tag of some sort in the object (e.g., a value written into the first padding byte) that tells the alignment for this particular instance. Likewise, it could segregate objects by (for example) address, so an object located at an even address has 2-byte alignment, one at an address that's a multiple of four has 4-byte alignment, and so on (this can't be used for POD types, but otherwise, all bets are off).

Neither of these is likely or common, but offhand I can't think of anything in the standard that prohibits them either.

如若梦似彩虹 2024-09-23 21:00:01

Never do this unless you're 100% sure about the memory layout and the compiler's behavior, you really don't care about portability, and you really need the efficiency.

SOURCE

哆啦不做梦 2024-09-23 21:00:01

Even for POD types, the == operator can be wrong. The cause is the padding inserted for alignment in structures like the following one, which takes 8 bytes with my compiler.

class Foo {
  char foo; // three padding bytes between foo and bar (with this compiler)
  int bar;
};
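The effect of padding can be sketched as follows. This is an illustration under assumptions, not a portable guarantee: it uses a struct with public members so that offsetof applies, and it deliberately writes different values into the bytes between foo and bar, which on a typical implementation are padding. The helper names (scribble_padding, bytewise_equal, memberwise_equal, check_padding) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Foo {
    char foo;  // padding typically follows so that bar is aligned
    int bar;
};

bool bytewise_equal(const Foo& a, const Foo& b) {
    return std::memcmp(&a, &b, sizeof(Foo)) == 0;
}

bool memberwise_equal(const Foo& a, const Foo& b) {
    return a.foo == b.foo && a.bar == b.bar;
}

// Overwrites the bytes between foo and bar (if any) with `fill`.
// The members themselves are left untouched.
void scribble_padding(Foo& f, unsigned char fill) {
    unsigned char* bytes = reinterpret_cast<unsigned char*>(&f);
    for (std::size_t i = offsetof(Foo, foo) + sizeof(char);
         i < offsetof(Foo, bar); ++i) {
        bytes[i] = fill;
    }
}

void check_padding() {
    Foo a{'x', 42};
    Foo b{'x', 42};
    scribble_padding(a, 0x00);
    scribble_padding(b, 0xFF);
    assert(memberwise_equal(a, b));     // every member matches
    if (offsetof(Foo, bar) > sizeof(char)) {
        assert(!bytewise_equal(a, b));  // the padding does not
    }
}
```

In real code nobody writes to padding on purpose; the bytes simply hold whatever was in that memory before, which is why the bytewise result is unpredictable.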

缪败 2024-09-23 21:00:01

That's highly dangerous because the compiler will use these definitions not only for plain old structs, but also for any classes, however complex, for which you forgot to define == and < properly.

One day, it will bite you.
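One way to limit the damage, sketched here as an assumption rather than a recommendation from this thread, is to give the comparison a name instead of defining a catch-all operator==, and to reject unsuitable types at compile time. The sketch uses std::has_unique_object_representations from C++17, which is false for types with padding and for types that are not trivially copyable. Note that it still accepts pointer members, so it does not fix the "compare what the pointer points at" problem. Point and check_constrained are hypothetical names.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Named, opt-in bytewise comparison. Requires C++17.
template <class T>
bool bytewise_equal(const T& a, const T& b) {
    static_assert(std::has_unique_object_representations<T>::value,
                  "T has padding, or equal values of T can have different bytes");
    return std::memcmp(&a, &b, sizeof(T)) == 0;
}

// Hypothetical example type: two ints, no padding on common platforms.
struct Point {
    int x;
    int y;
};

void check_constrained() {
    Point p{1, 2};
    Point q{1, 2};
    Point r{1, 3};
    assert(bytewise_equal(p, q));
    assert(!bytewise_equal(p, r));
    // A call with a type such as std::string would be rejected at
    // compile time, because std::string is not trivially copyable.
}
```

Because the function is not an operator template, it is never picked up silently for a class whose author forgot to define == properly.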

深爱成瘾 2024-09-23 21:00:01

A lot can depend on your definition of equivalence, for example if any of the members you are comparing within your classes are floating-point numbers.

The above implementation may treat two doubles as not equal even though they came from the same mathematical calculation with the same inputs, because the two computations may not have produced exactly the same output, only two very similar numbers.

Typically such numbers should be compared numerically with an appropriate tolerance.
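A tolerance-based comparison could look like the sketch below. The helper name approx_equal and the default relative tolerance of 1e-9 are assumptions chosen for illustration; the right tolerance depends on the computation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true if a and b differ by at most rel_tol
// relative to the larger magnitude of the two.
bool approx_equal(double a, double b, double rel_tol = 1e-9) {
    if (a == b) {
        return true;  // covers exact matches, including equal infinities
    }
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}

void check_tolerance() {
    const double sum = 0.1 + 0.2;      // rounding error accumulates here
    assert(approx_equal(sum, 0.3));    // close enough to count as equal
    assert(!approx_equal(1.0, 1.1));   // clearly different values
}
```

A comparison like this is not transitive, so it is suitable for equality checks but not as the basis of an ordering such as operator<.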

歌枕肩 2024-09-23 21:00:01

Any struct or class containing even a single pointer will instantly fail any sort of meaningful comparison. Those operators can ONLY work for classes that are Plain Old Data (POD). Other answerers correctly pointed out that floating-point members and padding bytes are cases where even that doesn't hold.

Short answer: If this were a smart idea, the language would provide it, as it does default copy constructors and assignment operators.
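For contrast, a memberwise comparison written by hand is short. The sketch below uses std::tie to compare the members in order; Person is a hypothetical type chosen for illustration.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical example type with a non-POD member.
struct Person {
    std::string name;
    int age;
};

// Memberwise equality: each member is compared with its own operator==.
bool operator==(const Person& a, const Person& b) {
    return std::tie(a.name, a.age) == std::tie(b.name, b.age);
}

// Lexicographic ordering: by name first, then by age.
bool operator<(const Person& a, const Person& b) {
    return std::tie(a.name, a.age) < std::tie(b.name, b.age);
}

void check_memberwise() {
    Person ann{"Ann", 30};
    Person ann_copy{"Ann", 30};
    Person bob{"Bob", 20};
    assert(ann == ann_copy);  // compares the characters, not the buffers
    assert(ann < bob);        // "Ann" sorts before "Bob"
    assert(!(bob < ann));
}
```

Here the string contents decide the result, which is exactly what the memcmp version cannot do.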
