How to achieve "optimal" operator overload resolution in arithmetic expressions with rvalues?

Posted on 2024-10-19 12:27:06

First of all, I apologize for the overly verbose question. I couldn't think of any other way to accurately summarize my problem. Now on to the actual question:

I'm currently experimenting with C++0x rvalue references... The following code produces unwanted behavior:

#include <iostream>
#include <utility>

struct Vector4
{
    float x, y, z, w;

    inline Vector4 operator + (const Vector4& other) const
    {
        Vector4 r;
        std::cout << "constructing new temporary to store result"
                  << std::endl;
        r.x = x + other.x;
        r.y = y + other.y;
        r.z = z + other.z;
        r.w = w + other.w;
        return r;
    }
    Vector4&& operator + (Vector4&& other) const
    {
        std::cout << "reusing temporary 2nd operand to store result"
                  << std::endl;
        other.x += x;
        other.y += y;
        other.z += z;
        other.w += w;
        return std::move(other);
    }
    friend inline Vector4&& operator + (Vector4&& v1, const Vector4& v2)
    {
        std::cout << "reusing temporary 1st operand to store result"
                  << std::endl;
        v1.x += v2.x;
        v1.y += v2.y;
        v1.z += v2.z;
        v1.w += v2.w;
        return std::move(v1);
    }
};

int main (void)
{
    Vector4 r,
            v1 = {1.0f, 1.0f, 1.0f, 1.0f},
            v2 = {2.0f, 2.0f, 2.0f, 2.0f},
            v3 = {3.0f, 3.0f, 3.0f, 3.0f},
            v4 = {4.0f, 4.0f, 4.0f, 4.0f},
            v5 = {5.0f, 5.0f, 5.0f, 5.0f};

    ///////////////////////////
    // RELEVANT LINE HERE!!! //
    ///////////////////////////
    r = v1 + v2 + (v3 + v4) + v5;

    return 0;
}

results in the output

constructing new temporary to store result
constructing new temporary to store result
reusing temporary 1st operand to store result
reusing temporary 1st operand to store result

while I had hoped for something like

constructing new temporary to store result
reusing temporary 1st operand to store result
reusing temporary 2nd operand to store result
reusing temporary 2nd operand to store result

After trying to re-enact what the compiler was doing (I'm using MinGW G++ 4.5.2 with option -std=c++0x in case it matters), it actually seems quite logical. The standard says that arithmetic operations of equal precedence are evaluated/grouped left-to-right (why I assumed right-to-left I don't know, I guess it's more intuitive to me). So what happened here is that the compiler evaluated the sub-expression (v3 + v4) first (since it's in parentheses?), and then began matching the operations in the expression left-to-right against the operator overloads, resulting in a call to Vector4 operator + (const Vector4& other) for the sub-expression v1 + v2. If I want to avoid the unnecessary temporary, I'd have to make sure that no more than one lvalue operand appears to the immediate left of any parenthesized sub-expression, which is counter-intuitive to anyone using this "library" and innocently expecting optimal performance (as in minimizing the creation of temporaries).
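To make the grouping concrete, here is a small sketch that is not part of the original code. It assumes a hypothetical `Traced` type whose `operator+` only records its operands as text, so the result shows how the expression was parsed. The grouping comes from the grammar (binary `+` is left-associative) and does not depend on which overloads exist, which is why `v1 + v2` is always formed from two lvalues. The order in which the two operands of a given `+` are evaluated is a separate matter and is unspecified.

```cpp
#include <string>

// Hypothetical tracing type: it carries no numbers, only a textual record
// of how the operands were grouped.
struct Traced {
    std::string text;
};

// Every addition wraps its two operands in parentheses.
inline Traced operator+(const Traced& lhs, const Traced& rhs)
{
    return Traced{"(" + lhs.text + " + " + rhs.text + ")"};
}

// Same expression shape as the "RELEVANT LINE" in the question.
inline std::string grouping_of_example()
{
    Traced v1{"v1"}, v2{"v2"}, v3{"v3"}, v4{"v4"}, v5{"v5"};
    Traced r = v1 + v2 + (v3 + v4) + v5;
    return r.text;  // "(((v1 + v2) + (v3 + v4)) + v5)"
}
```

The recorded string shows two independent leaf additions, `(v1 + v2)` and `(v3 + v4)`, each of which has only lvalue operands and therefore has nowhere to store its result except a new object.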

(I'm aware that there's ambiguity in my code regarding operator + (Vector4&& v1, const Vector4& v2) and operator + (Vector4&& other) when (v3 + v4) is to be added to the result of v1 + v2, resulting in a warning. But it's harmless in my case and I don't want to add yet another overload for two rvalue reference operands - anyone know if there's a way to disable this warning in gcc?)
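As a rough illustration of the ambiguity mentioned above, the following sketch uses a hypothetical free function `classify` in place of `operator+` and a hypothetical enum `Picked` to report which overload was selected. With all four combinations declared, a call with two temporaries has exactly one best match, because binding an rvalue to `Vec&&` ranks better than binding it to `const Vec&` for each argument. This is only a sketch of the overload set, not a recommendation to return references from it.

```cpp
// Stand-in for the vector type from the question.
struct Vec {
    float x, y, z, w;
};

// Hypothetical tag describing which overload was picked.
enum class Picked { BothLvalues, LeftTemporary, RightTemporary, BothTemporaries };

// One overload per lvalue/rvalue combination of the two operands.
inline Picked classify(const Vec&, const Vec&) { return Picked::BothLvalues; }
inline Picked classify(Vec&&, const Vec&)      { return Picked::LeftTemporary; }
inline Picked classify(const Vec&, Vec&&)      { return Picked::RightTemporary; }
// Without this fourth overload, a call with two temporaries would match the
// second and third overloads equally well.
inline Picked classify(Vec&&, Vec&&)           { return Picked::BothTemporaries; }
```

Declaring all four as free functions also sidesteps mixing a member `operator+` with a friend `operator+`, which is where the two competing candidates in the question come from.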

Long story short, my question boils down to: Is there any way or pattern (preferably compiler-independent) by which this vector class could be rewritten to allow arbitrary use of parentheses in expressions while still resulting in the "optimal" choice of operator overloads (optimal in terms of "performance", i.e. maximizing the binding to rvalue references)? Perhaps I'm asking for too much and it's impossible... if so, then that's fine too. I just want to make sure I'm not missing anything.

Many thanks in advance

Addendum

First, thanks for the quick responses I got within minutes (!) - I really should have started posting here sooner...

It's becoming tedious replying in the comments, so I think a clarification of my intent with this class design is in order. Maybe you can point me to a fundamental conceptual flaw in my thought process if there is one.

You may notice that I don't hold any resources in the class, like heap memory. Its members are even only scalar types. At first sight this makes it a suspect candidate for move-semantics based optimizations (see also this question that actually helped me a great deal in grasping the concepts behind rvalue references).

However, since the classes this one is supposed to be a prototype for will be used in a performance-critical context (a 3D engine to be precise), I want to optimize every little thing possible. Low-complexity algorithms and maths-related techniques like look-up tables should of course make up the bulk of the optimizations as anything else would simply be addressing the symptoms and not eradicating the real reason for bad performance. I am well aware of that.

With that out of the way, my intent here is to optimize algebraic expressions with vectors and matrices that are essentially plain-old-data structs without pointers to data in them (mainly due to the performance drawbacks you get with data on the heap [having to dereference additional pointers, cache considerations etc.]).

I don't care about move-assignment or construction; I just don't want more temporaries being created during the evaluation of a complicated algebraic expression than absolutely necessary (usually just one or two, e.g. a matrix and a vector).

These are my thoughts, which might be erroneous. If they are, please correct me:

1. To achieve this without relying on RVO, return-by-reference is necessary (again: keep in mind I don't have remote resources, only scalar data members).
2. Returning by reference makes the function-call expression an lvalue, implying the returned object is not a temporary, which is bad, but returning by rvalue reference makes the function-call expression an xvalue (see 3.10.1), which is okay in the context of my approach (see 4).
3. Returning by reference is dangerous, because of the possibly short lifetime of objects, but:
4. temporaries are guaranteed to live until the end of the evaluation of the expression they were created in, therefore:
5. making it safe to return by reference from those operators that take at least one rvalue-reference as their argument, if the object referenced by this rvalue reference argument is the one being returned by reference. Therefore:
6. Any arbitrary expression that only employs binary operators can be evaluated by creating only one temporary when not more than one PoD-like type is involved, and the binary operations don't require a temporary by nature (like matrix multiplication).
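The value-category and lifetime points in this list can be checked with a short sketch. The names `V4`, `add_reusing`, `add_by_value` and `safe_use` are hypothetical and not taken from the code above. The `static_assert`s confirm that a call returning `V4&&` is an xvalue while a call returning `V4` is a prvalue. The comments mark the case where the reasoning stops holding: the temporary lives to the end of the full-expression, but binding the returned reference to a named reference does not extend that lifetime, since lifetime extension applies only when a reference binds directly to the temporary.

```cpp
#include <type_traits>
#include <utility>

struct V4 {
    float x, y, z, w;
};

// In the style of the question: reuse the temporary first operand and
// hand it back by rvalue reference.
inline V4&& add_reusing(V4&& lhs, const V4& rhs)
{
    lhs.x += rhs.x; lhs.y += rhs.y; lhs.z += rhs.z; lhs.w += rhs.w;
    return std::move(lhs);
}

// By-value alternative: the result is a new object owned by the caller.
inline V4 add_by_value(V4 lhs, const V4& rhs)
{
    lhs.x += rhs.x; lhs.y += rhs.y; lhs.z += rhs.z; lhs.w += rhs.w;
    return lhs;
}

// decltype of a call expression yields T&& for an xvalue and T for a prvalue.
static_assert(std::is_same<decltype(add_reusing(std::declval<V4>(),
                                                std::declval<const V4&>())),
                           V4&&>::value,
              "call returning V4&& is an xvalue");
static_assert(std::is_same<decltype(add_by_value(std::declval<V4>(),
                                                 std::declval<const V4&>())),
                           V4>::value,
              "call returning V4 is a prvalue");

inline float safe_use()
{
    V4 a = {1.0f, 1.0f, 1.0f, 1.0f};
    // Fine: the temporary lives until the end of this full-expression and
    // is copied into 'sum' before it is destroyed.
    V4 sum = add_reusing(V4{2.0f, 2.0f, 2.0f, 2.0f}, a);
    // Not fine (deliberately left commented out): the reference would
    // outlive the temporary it refers to.
    //   const V4& dangling = add_reusing(V4{2.0f, 2.0f, 2.0f, 2.0f}, a);
    return sum.x;
}
```

So the scheme is safe as long as every result is copied into an object within the same full-expression; it becomes a hazard as soon as a caller writes `const V4&` or `auto&&` on the left-hand side.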

(Another reason to return by rvalue-reference is because it behaves like returning by value in terms of rvalue-ness of the function-call expression; and it's required for the operator/function-call expression to be an rvalue in order to bind to subsequent calls to operators that take rvalue references. As stated in (2), calls to functions that return by reference are lvalues, and would therefore bind to operators with the signature T operator+(const T&, const T&), resulting in the creation of an unnecessary temporary)

I could achieve the desired performance by using a C-style approach of functions like add(Vector4 *result, Vector4 *v1, Vector4 *v2), but come on, we're living in the 21st century...
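For comparison, a minimal sketch of that C-style approach might look like the following. The `const` qualifiers on the inputs and the helper `sum_c_style` are assumptions added for illustration; only the general shape of `add` comes from the text. Note that reproducing the exact grouping of `v1 + v2 + (v3 + v4) + v5` still needs a second storage location for the `(v3 + v4)` part, in addition to the final result.

```cpp
struct Vec4 {
    float x, y, z, w;
};

// Writes a + b into *result. Component-wise addition stays correct when
// result aliases a or b, because each component is read before it is written.
inline void add(Vec4* result, const Vec4* a, const Vec4* b)
{
    result->x = a->x + b->x;
    result->y = a->y + b->y;
    result->z = a->z + b->z;
    result->w = a->w + b->w;
}

// Hypothetical helper: evaluates ((v1 + v2) + (v3 + v4)) + v5 step by step.
inline Vec4 sum_c_style()
{
    Vec4 v1 = {1.0f, 1.0f, 1.0f, 1.0f};
    Vec4 v2 = {2.0f, 2.0f, 2.0f, 2.0f};
    Vec4 v3 = {3.0f, 3.0f, 3.0f, 3.0f};
    Vec4 v4 = {4.0f, 4.0f, 4.0f, 4.0f};
    Vec4 v5 = {5.0f, 5.0f, 5.0f, 5.0f};

    Vec4 r;        // holds v1 + v2, then the running total
    Vec4 scratch;  // holds v3 + v4
    add(&r, &v1, &v2);
    add(&scratch, &v3, &v4);
    add(&r, &r, &scratch);
    add(&r, &r, &v5);
    return r;
}
```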

In summary, my goal is creating a vector class that achieves the same performance as the C-approach using overloaded operators. If that in itself is impossible, then I guess it can't be helped. But I'd appreciate it if someone could explain to me why my approach is doomed to fail (the left-to-right operator evaluation issue that was the initial reason for my post aside, of course).
As a matter of fact, I've been using the "real" vector class this one is a simplification of for a while without any crashes or corrupted memory so far. And in fact, I never actually return local objects as references, so there shouldn't be any problems. I dare say what I'm doing is standard-compliant.

Any help on the original issue would of course be appreciated as well!

Many thanks again for all the patience

笑咖 2024-10-26 12:27:06

You should not return an rvalue reference, you should return a value. In addition, you should not specify both a member and a free operator+. I'm amazed that even compiled.

Edit:

r = v1 + v2 + (v3 + v4) + v5;

How could you possibly only have one temporary value when you're performing two sub-computations? That's just impossible. You can't re-write the Standard and change this.

You will just have to trust your users to do something not completely stupid, like write the above line of code, and expect to have just one temporary.
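One common way to follow that advice is sketched below, as an illustration rather than the answerer's own code: a member `operator+=` does the arithmetic, and a single free `operator+` takes its left operand by value and returns by value. The helper `sum_by_value` is hypothetical. Because the left operand is a by-value parameter, a temporary on the left is moved or constructed directly into it, and no reference to a temporary ever leaves the function.

```cpp
struct Vector4 {
    float x, y, z, w;

    // In-place addition; the only place the arithmetic is written.
    Vector4& operator+=(const Vector4& other)
    {
        x += other.x;
        y += other.y;
        z += other.z;
        w += other.w;
        return *this;
    }
};

// Single free operator+, no member counterpart, returns a value.
inline Vector4 operator+(Vector4 lhs, const Vector4& rhs)
{
    lhs += rhs;
    return lhs;
}

// Hypothetical helper evaluating the expression from the question.
inline Vector4 sum_by_value()
{
    Vector4 v1 = {1.0f, 1.0f, 1.0f, 1.0f};
    Vector4 v2 = {2.0f, 2.0f, 2.0f, 2.0f};
    Vector4 v3 = {3.0f, 3.0f, 3.0f, 3.0f};
    Vector4 v4 = {4.0f, 4.0f, 4.0f, 4.0f};
    Vector4 v5 = {5.0f, 5.0f, 5.0f, 5.0f};
    return v1 + v2 + (v3 + v4) + v5;
}
```

For a struct of four floats, an optimizing compiler can often inline these calls and keep the intermediate values in registers, though that is worth measuring on the target compiler rather than assuming.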
