Why are by-value parameters excluded from NRVO?

Posted on 2024-11-07 12:23:57


Imagine:

S f(S a) {
  return a;
}

Why is it not allowed to alias a and the return value slot?

S s = f(t);
S s = t; // can't generally transform it to this :(

The spec doesn't allow this transformation if the copy constructor of S has side effects. Instead, it requires three copies in principle: one from t to a, one from a to the return value, and one from the return value to s. Only the last of these can be elided, so at least two copies remain. (I wrote = t above to represent the copy of t into f's a, which is the only copy that would still be mandatory, in the presence of move/copy constructor side effects, if the aliasing were allowed.)

Why is that?
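
To make the copies in question visible, here is a minimal sketch with an instrumented type. Counted and constructions_for_by_value_call are hypothetical names introduced only for illustration, and the sketch assumes C++11 or later, where returning a by-value parameter uses the move constructor if one exists.

```cpp
#include <cassert>

// Hypothetical instrumented type (not from the question): it counts its own
// copy and move constructions so the effect of `return a;` can be observed.
struct Counted {
    static int copies;
    static int moves;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted(Counted&&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// Same shape as f in the question: a by-value parameter returned directly.
Counted f(Counted a) {
    return a;  // not eligible for NRVO; move-constructs the return object
}

// Returns the number of constructions caused by `Counted s = f(t);`.
int constructions_for_by_value_call() {
    Counted::copies = 0;
    Counted::moves = 0;
    Counted t;
    Counted s = f(t);  // t -> a is a copy; a -> return object is a move
    (void)s;
    assert(Counted::copies == 1);  // the mandatory copy of t into a
    assert(Counted::moves >= 1);   // the transfer out of a is never elided
    return Counted::copies + Counted::moves;
}
```

Calling constructions_for_by_value_call() from main should report at least two constructions. The exact number of moves can depend on the language standard and on compiler flags, which is why the sketch only asserts a lower bound.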


清浅ˋ旧时光 2024-11-14 12:23:57


Here's why copy elision doesn't make sense for parameters. It's really about the implementation of the concept at the compiler level.

Copy elision works by essentially constructing the return value in-place. The value isn't copied out; it's created directly in its intended destination. It's the caller who provides the space for the intended output, and thus it's ultimately the caller who provides the possibility for the elision.

All that the function internally needs to do in order to elide the copy is construct the output in the place provided by the caller. If the function can do this, you get copy elision. If the function can't, then it will use one or more temporary variables to store the intermediate results, then copy/move this into the place provided by the caller. It's still constructed in-place, but the construction of the output happens via copy.

So the world outside of a particular function doesn't have to know or care about whether a function does elision. Specifically, the caller of the function doesn't have to know about how the function is implemented. It's not doing anything different; it's the function itself that decides if elision is possible.

Storage for value parameters is also provided by the caller. When you call f(t), it is the caller that creates the copy of t and passes it to f. Similarly, if S is implicitly constructible from an int, then f(5) will construct an S from the 5 and pass it to f.

This is all done by the caller. The callee doesn't know or care that it was a variable or a temporary; it's just given a spot of stack memory (or registers or whatever).

Now remember: copy elision works because the function being called constructs the variable directly into the output location. So if you're trying to elide the return from a value parameter, then the storage for the value parameter must also be the output storage itself. But remember: it is the caller that provides that storage for both the parameter and the output. And therefore, to elide the output copy, the caller must construct the parameter directly into the output.

To do this, now the caller needs to know that the function it's calling will elide the return value, because it can only stick the parameter directly into the output if the parameter will be returned. That's not going to generally be possible at the compiler level, because the caller doesn't necessarily have the implementation of the function. If the function is inlined, then maybe it can work. But otherwise no.

Therefore, the C++ committee didn't bother to allow for the possibility.
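
The storage argument can be observed by recording addresses. The following is only a sketch under my own assumptions: Tracked, return_param, return_local and the two globals are hypothetical names, and addresses are stored as integers so they can be compared after the objects are gone.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type; the int member only gives it a size of its own.
struct Tracked {
    int payload = 0;
    Tracked() = default;
    Tracked(const Tracked& other) : payload(other.payload) {}
    Tracked(Tracked&& other) noexcept : payload(other.payload) {}
};

// Addresses observed inside the callees, kept as integers.
std::uintptr_t param_address = 0;
std::uintptr_t local_address = 0;

Tracked return_param(Tracked a) {
    param_address = reinterpret_cast<std::uintptr_t>(&a);  // caller-chosen storage
    return a;  // must be moved into the separate return slot
}

Tracked return_local() {
    Tracked local;
    local_address = reinterpret_cast<std::uintptr_t>(&local);
    return local;  // NRVO may place `local` directly in the return slot
}

// True if the parameter occupied different storage from the result object.
bool param_is_distinct_from_result() {
    Tracked t;
    Tracked s = return_param(t);
    assert(param_address != 0);
    return reinterpret_cast<std::uintptr_t>(&s) != param_address;
}

// True if the compiler built the local directly in the result object.
// This is permitted but not required, so the answer depends on the compiler.
bool local_was_built_in_result() {
    Tracked s = return_local();
    return reinterpret_cast<std::uintptr_t>(&s) == local_address;
}
```

param_is_distinct_from_result() should always be true, because the parameter and the result are two live objects at the moment the result is constructed. local_was_built_in_result() is typically true with optimizing compilers but is not guaranteed.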


笔落惊风雨 2024-11-14 12:23:57


The rationale, as I understand it, for that restriction is that the calling convention might (and will in many cases) demand that the argument to the function and the return object are at different locations (either memory or registers). Consider the following modified example:

X foo();
X bar( X a ) 
{ 
   return a;
}
int main() {
   X x = bar( foo() );
}

In theory, the full set of objects would be the return statement in foo ($tmp1), the argument a of bar, the return statement of bar ($tmp2), and x in main. Compilers can elide two of the four objects by creating $tmp1 at the location of a and $tmp2 at the location of x. When the compiler is processing main, it can note that the return value of foo is the argument to bar and make them coincide. At that point it cannot possibly know (without inlining) that the argument and the return value of bar are the same object, and it has to comply with the calling convention, so it places $tmp1 in the position of the argument to bar.

At the same time, it knows that the purpose of $tmp2 is only to create x, so it can place both at the same address. Inside bar, there is not much that can be done: the argument a is located in the place of the first argument, according to the calling convention, and $tmp2 also has to be located according to the calling convention. In the general case that is a different location: consider that the example can be extended to a bar that takes more arguments, only one of which is used in the return statement.

Now, if the compiler performs inlining, it could detect that the extra copy that would be required if the function were not inlined is not really needed, and it would have a chance to elide it. But if the standard allowed that particular copy to be elided, the same code would behave differently depending on whether the function is inlined or not.
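
The extension to several arguments can be sketched as follows. The function pick and the counters on X are my own hypothetical additions, and the sketch assumes C++11 or later. The caller has to place both arguments before the call, without knowing which one (if any) will end up as the result.

```cpp
#include <cassert>

// X with hypothetical counters added so the constructions can be observed.
struct X {
    static int copies;
    static int moves;
    X() = default;
    X(const X&) { ++copies; }
    X(X&&) noexcept { ++moves; }
};
int X::copies = 0;
int X::moves = 0;

X foo() { return X(); }

// Hypothetical variant of bar with two by-value parameters; only one of them
// is returned, and which one is only known at run time.
X pick(X a, X b, bool first) {
    if (first) return a;  // implicit move out of the parameter
    return b;             // likewise for the other parameter
}

// Returns the number of move constructions for one call to pick.
int moves_for_pick(bool first) {
    X::copies = 0;
    X::moves = 0;
    X x = pick(foo(), foo(), first);  // both arguments are built before the call
    (void)x;
    assert(X::copies == 0);  // everything here is an rvalue, so no copies
    assert(X::moves >= 1);   // the parameter is never aliased with x
    return X::moves;
}
```

Whichever branch runs, at least one move out of a parameter remains. Under C++17 rules I would expect exactly one, but that depends on the standard version in use, so the sketch does not assert it.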


帅气称霸 2024-11-14 12:23:57


David Rodríguez - dribeas's answer to my question 'How to allow copy elision construction for C++ classes' gave me the following idea. The trick is to use lambdas to delay evaluation until inside the function body:

#include <iostream>

struct S
{
  S() {}
  S(const S&) { std::cout << "Copy" << std::endl; }
  S(S&&) { std::cout << "Move" << std::endl; }
};

S f1(S a) {
  return a;
}

S f2(const S& a) {
  return a;
}

#define DELAY(x) [&]{ return x; }

template <class F>
S f3(const F& a) {
  return a();
}

int main()
{
  S t;
  std::cout << "Without delay:" << std::endl;
  S s1 = f1(t);
  std::cout << "With delay:" << std::endl;
  S s2 = f3(DELAY(t));
  std::cout << "Without delay pass by ref:" << std::endl;
  S s3 = f2(t);
  std::cout << "Without delay pass by ref (temporary) (should have 0 copies, will get 1):" << std::endl;
  S s4 = f2(S());
  std::cout << "With delay (temporary) (no copies, best):" << std::endl;
  S s5 = f3(DELAY(S()));
}

This outputs on ideone GCC 4.5.1:

Without delay:
Copy
Copy
With delay:
Copy

Now this is good, but one could suggest that the DELAY version is just like passing by const reference, as below:

Without delay pass by ref:
Copy

But if we pass a temporary by const reference, we still get a copy:

Without delay pass by ref (temporary) (should have 0 copies, will get 1):
Copy

The delayed version, by contrast, elides the copy:

With delay (temporary) (no copies, best):

As you can see, this elides all copies in the temporary case.

The delayed version produces one copy in the non-temporary case, and no copies in the case of a temporary. I don't know any way to achieve this other than lambdas, but I'd be interested if there is.
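
For reference, the same idea can be written without the macro and with counters instead of printing. This is a sketch, not a replacement for the program above: the counters and the two helper functions are hypothetical additions, and only the copy counts are asserted, since the number of moves can depend on the language standard.

```cpp
#include <cassert>

// S with hypothetical counters in place of the printing constructors.
struct S {
    static int copies;
    static int moves;
    S() {}
    S(const S&) { ++copies; }
    S(S&&) noexcept { ++moves; }
};
int S::copies = 0;
int S::moves = 0;

// Same shape as f3 above: the callable produces the value inside the callee.
template <class F>
S f3(const F& make) {
    return make();
}

// Delayed evaluation of an existing object: the lambda copies t when called.
int copies_with_delayed_lvalue() {
    S::copies = 0;
    S::moves = 0;
    S t;
    S s2 = f3([&] { return t; });
    (void)s2;
    assert(S::copies == 1);
    return S::copies;
}

// Delayed evaluation of a temporary: nothing needs to be copied.
int copies_with_delayed_temporary() {
    S::copies = 0;
    S::moves = 0;
    S s5 = f3([] { return S(); });
    (void)s5;
    assert(S::copies == 0);
    return S::copies;
}
```

Writing the lambda at the call site is more verbose than DELAY(t), but it makes the capture explicit: the temporary case needs no capture at all.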


冷心人i 2024-11-14 12:23:57

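It would be unreasonable to elide the copy from t to a. The parameter is declared as mutable, so the copy is needed because the function is expected to modify it.

I can't see any reason for the copy out to the return value, though. Maybe it's some kind of oversight? A by-value parameter feels just like a local variable inside the function body... I don't see the difference.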


玉环 2024-11-14 12:23:57


I think it's because an alternative is always available if you want that optimization:

S& f(S& a) { return a; }  // pass & return by reference
^^^  ^^^

If f() is coded as in your example, then it's perfectly reasonable to assume that the copy is intended or that its side effects are expected; otherwise, why not choose pass/return by reference?

If NRVO applied here (as you ask), there would be no difference between S f(S) and S& f(S&)!

NRVO kicks in for situations like operator+() because there is no worthwhile alternative.

As supporting evidence, all of the functions below have different copying behavior:

S& f(S& a) { return a; }  // 0 copy
S f(S& a) { return a; } // 1 copy
S f(S a) { S a1; return (...)? a : a1; }  // 2 copies

In the 3rd snippet, if (...) is known at compile time to be false, then the compiler generates only 1 copy.
This means that the compiler purposely doesn't perform the optimization when a trivial alternative is available.
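
The copy counts for the three signatures can be checked with a sketch like the one below. The function names, the counters and the bool parameter standing in for (...) are my own assumptions, and the flag is a run-time value here, so the sketch says nothing about the compile-time case.

```cpp
#include <cassert>

// S with hypothetical counters so copy constructions can be counted.
struct S {
    static int copies;
    S() {}
    S(const S&) { ++copies; }
    S(S&&) noexcept {}  // moves are deliberately not counted as copies
};
int S::copies = 0;

S& by_ref(S& a) { return a; }    // pass and return by reference
S copy_out(S& a) { return a; }   // reference in, value out
S param_or_local(S a, bool use_param) {
    S a1;
    return use_param ? a : a1;   // a conditional expression, so it is copied
}

// Runs the three calls on an lvalue and checks how many copies each makes.
void check_copy_counts() {
    S t;

    S::copies = 0;
    S& r = by_ref(t);
    (void)r;
    assert(S::copies == 0);

    S::copies = 0;
    S c = copy_out(t);
    (void)c;
    assert(S::copies == 1);

    S::copies = 0;
    S p = param_or_local(t, true);
    (void)p;
    assert(S::copies == 2);  // one copy into a, one copy out of the conditional
}
```

Note that the third function copies rather than moves on return, because the operand of return is a conditional expression and not simply the name of a parameter or local variable.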
