Is it legal to use the increment operator in a C++ function call?

Posted on 2024-07-14 04:12:27

There's been some debate going on in this question about whether the following code is legal C++:

std::list<item*>::iterator i = items.begin();
while (i != items.end())
{
    bool isActive = (*i)->update();
    if (!isActive)
    {
        items.erase(i++);  // *** Is this undefined behavior? ***
    }
    else
    {
        other_code_involving(*i);
        ++i;
    }
}

The problem here is that erase() will invalidate the iterator in question. If that happens before i++ is evaluated, then incrementing i like that is technically undefined behavior, even if it appears to work with a particular compiler. One side of the debate says that all function arguments are fully evaluated before the function is called. The other side says, "the only guarantees are that i++ will happen before the next statement and after i++ is used. Whether that is before erase(i++) is invoked or afterwards is compiler dependent."

I opened this question to hopefully settle that debate.

千仐 2024-07-21 04:12:27

Quoth the C++ standard 1.9.16:

When calling a function (whether or
not the function is inline), every
value computation and side effect
associated with any argument
expression, or with the postfix
expression designating the called
function, is sequenced before
execution of every expression or
statement in the body of the called
function. (Note: Value computations
and side effects associated with the
different argument expressions are
unsequenced.)

So it would seem to me that this code:

foo(i++);

is perfectly legal. It will increment i and then call foo with the previous value of i. However, this code:

foo(i++, i++);

yields undefined behavior because paragraph 1.9.16 also says:

If a side effect on a scalar object is
unsequenced relative to either another
side effect on the same scalar object
or a value computation using the value
of the same scalar object, the
behavior is undefined.

最笨的告白 2024-07-21 04:12:27

To build on Kristo's answer,

foo(i++, i++);

yields undefined behavior because the order in which function arguments are evaluated is unspecified and the two increments of i are unsequenced relative to each other (and in the more general case because if you read a variable in an expression where you also write it, with no sequencing between the two, the result is undefined). You don't know which argument will be incremented first.

int i = 1;
foo(i++, i++);

might result in a function call of

foo(2, 1);

or

foo(1, 2);

or even

foo(1, 1);

Run the following to see what happens on your platform:

#include <iostream>

using namespace std;

void foo(int a, int b)
{
    cout << "a: " << a << endl;
    cout << "b: " << b << endl;
}

int main()
{
    int i = 1;
    foo(i++, i++);
}

On my machine I get

$ ./a.out
a: 2
b: 1

every time, but this code is not portable, so I would expect to see different results with different compilers.
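If both increments are really wanted, one way to make the call portable is to give each increment its own statement, so the two side effects are sequenced. The following is only a minimal sketch: `collect` is a hypothetical stand-in for `foo` that returns its arguments, and `call_in_order` is a made-up helper name.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for foo(): hands its two arguments back to the caller.
std::pair<int, int> collect(int a, int b)
{
    return std::make_pair(a, b);
}

// Hypothetical helper: performs the two increments in separate statements.
std::pair<int, int> call_in_order(int& i)
{
    int first = i++;   // completed before the next statement starts
    int second = i++;  // sees the value left behind by the first increment
    return collect(first, second);
}

// Usage example: starting from i == 1 the call is always collect(1, 2).
void call_in_order_example()
{
    int i = 1;
    std::pair<int, int> args = call_in_order(i);
    assert(args.first == 1 && args.second == 2 && i == 3);
}
```

The cost is two named temporaries; in exchange the result no longer depends on the compiler.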

乖乖 2024-07-21 04:12:27

The standard says the side effect happens before the call, so the code is the same as:

std::list<item*>::iterator i_before = i;

++i;  // list iterators are bidirectional, so i_before + 1 would not compile

items.erase(i_before);

rather than being:

std::list<item*>::iterator i_before = i;

items.erase(i);

++i;  // undefined behavior: erase() has already invalidated i

So it is safe in this case, because list.erase() specifically doesn't invalidate any iterators other than the one erased.

That said, it's bad style. The erase function of every sequence container (and, since C++11, of the associative containers too) returns the next iterator specifically so you don't have to worry about invalidating iterators due to reallocation, so the idiomatic code:

i = items.erase(i);

will be safe for lists, and will also be safe for vectors, deques and any other sequence container should you want to change your storage.

Depending on your compiler and warning settings, the original code may also draw a warning about the unused return value, in which case you'd have to write

(void)items.erase(i++);

to silence it, which would be a big clue that you're doing something odd.
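To make the idiom concrete, here is a small sketch of the whole loop written with the return value of erase(). It assumes a `std::list<int>` and a made-up removal condition (negative values) in place of the question's `item*` list and `update()` call; `remove_negatives` is a hypothetical name.

```cpp
#include <cassert>
#include <list>

// Hypothetical helper: removes every negative value from the list in place.
void remove_negatives(std::list<int>& items)
{
    std::list<int>::iterator i = items.begin();
    while (i != items.end())
    {
        if (*i < 0)
        {
            // erase() returns the iterator following the removed element,
            // so i never holds an invalidated iterator.
            i = items.erase(i);
        }
        else
        {
            ++i;
        }
    }
}

// Usage example: the two negative entries are removed, the order is kept.
void remove_negatives_example()
{
    int raw[] = {3, -1, 4, -1, 5};
    std::list<int> items(raw, raw + 5);
    remove_negatives(items);
    assert(items.size() == 3);
    assert(items.front() == 3 && items.back() == 5);
}
```

The same loop body should keep working if the container is later changed to a vector or deque, which is the point made above.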

软甜啾 2024-07-21 04:12:27

It's perfectly fine.
The value passed will be the value of "i" before the increment.

浊酒尽余欢 2024-07-21 04:12:27

++Kristo!

The C++ standard 1.9.16 makes a lot of sense with respect to how one implements operator++(postfix) for a class. When that operator++(int) method is called, it increments itself and returns a copy of the original value. Exactly as the C++ spec says.

It's nice to see standards improving!

However, I distinctly remember using older (pre-ANSI) C compilers wherein:

foo -> bar(i++) -> charlie(i++);

Did not do what you think! Instead it compiled equivalent to:

foo -> bar(i) -> charlie(i); ++i; ++i;

And this behavior was compiler-implementation dependent. (Making porting fun.)

It's easy enough to test and verify that modern compilers now behave correctly:

#include <iostream>
using namespace std;

// Prints the label S, then the text of expression X and its value.
#define SHOW(S,X)  cout << S << ":  " # X " = " << (X) << endl

struct Foo
{
  Foo & bar(const char * theString, int theI)
    { SHOW(theString, theI);   return *this; }
};

int
main()
{
  Foo f;
  int i = 0;
  f . bar("A",i) . bar("B",i++) . bar("C",i) . bar("D",i);
  SHOW("END ",i);
}

Responding to comment in thread...

...And building on pretty much EVERYONE's answers... (Thanks guys!)

I think we need to spell this out a bit better:

Given:

baz(g(),h());

Then we don't know whether g() will be invoked before or after h(). It is "unspecified".

But we do know that both g() and h() will be invoked before baz().

Given:

bar(i++,i++);

Again, we don't know which i++ will be evaluated first, and perhaps not even whether i will be incremented once or twice before bar() is called. The results are undefined! (Given i=0, this could be bar(0,0) or bar(1,0) or bar(0,1) or something really weird!)

Given:

foo(i++);

We now know that i will be incremented before foo() is invoked. As Kristo pointed out from the C++ standard section 1.9.16:

When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [ Note: Value computations and side effects associated with different argument expressions are unsequenced. -- end note ]

Though I think section 5.2.6 says it better:

The value of a postfix ++ expression is the value of its operand. [ Note: the value obtained is a copy of the original value -- end note ] The operand shall be a modifiable lvalue. The type of the operand shall be an arithmetic type or a pointer to a complete effective object type. The value of the operand object is modified by adding 1 to it, unless the object is of type bool, in which case it is set to true. [ Note: this use is deprecated, see Annex D. -- end note ] The value computation of the ++ expression is sequenced before the modification of the operand object. With respect to an indeterminately-sequenced function call, the operation of postfix ++ is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single postfix ++ operator. -- end note ] The result is an rvalue. The type of the result is the cv-unqualified version of the type of the operand. See also 5.7 and 5.17.

The standard, in section 1.9.16, also lists (as part of its examples):

i = 7, i++, i++;    // i becomes 9 (valid)
f(i = -1, i = -1);  // the behavior is undefined

And we can trivially demonstrate this with:

#include <iostream>
using namespace std;

#define SHOW(X)  cout << # X " = " << (X) << endl
int i = 0;  /* Yes, it's global! */
void foo(int theI) { SHOW(theI);  SHOW(i); }
int main() { foo(i++); }  // prints theI = 0, then i = 1

So, yes, i is incremented before foo() is invoked.

All this makes a lot of sense from the perspective of:

class Foo
{
public:
  Foo operator++(int) {...}  /* Postfix variant */
};

int main() {  Foo f;  delta( f++ ); }

Here Foo::operator++(int) must be invoked prior to delta(). And the increment operation must be completed during that invocation.
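A runnable version of that idea might look like the sketch below. `Counter`, `g_counter`, `Observed` and this `delta` are hypothetical names chosen for illustration, not part of the answer's code. The callee records both its parameter and the global object, so you can see that the increment has already finished by the time the function body runs.

```cpp
#include <cassert>

// Hypothetical illustration type with a user-defined postfix increment.
struct Counter
{
    int value;

    Counter operator++(int)  // postfix variant
    {
        Counter old = *this;  // copy of the original value
        ++value;              // the increment completes inside this call
        return old;           // the caller receives the old value
    }
};

Counter g_counter = {0};  // global, so the callee can inspect it

// What the callee saw: its parameter and the global at the time of the call.
struct Observed
{
    int param;
    int global;
};

Observed delta(Counter c)
{
    Observed seen = {c.value, g_counter.value};
    return seen;
}

// Usage example: the parameter holds the old value, the global the new one.
void postfix_example()
{
    int before = g_counter.value;
    Observed seen = delta(g_counter++);
    assert(seen.param == before);
    assert(seen.global == before + 1);
}
```

Because operator++(int) is itself a function call made while evaluating the argument, it has to return before the body of delta() can start.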

In my (perhaps overly complex) example:

f . bar("A",i) . bar("B",i++) . bar("C",i) . bar("D",i);

f.bar("A",i) must be executed to obtain the object used for object.bar("B",i++), and so on for "C" and "D".

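So we know that i++ increments i prior to calling bar("B",i++) (even though bar("B",...) is invoked with the old value of i), and therefore i is incremented prior to bar("C",i) and bar("D",i). Strictly, that last step is only guaranteed from C++17 on, which sequences the expression naming the function before the call's arguments; under earlier standards the argument i of bar("C",i) may be read before bar("B",i++) runs.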

Getting back to j_random_hacker's comment:

j_random_hacker writes:
+1, but I had to read the standard carefully to convince myself that this was OK. Am I right in thinking that, if bar() was instead a global function returning say int, f was an int, and those invocations were connected by say "^" instead of ".", then any of A, C and D could report "0"?

This question is a lot more complicated than you might think...

Rewriting your question as code...

int bar(const char * theString, int theI) { SHOW(...);  return i; }

bar("A",i)   ^   bar("B",i++)   ^   bar("C",i)   ^   bar("D",i);

Now we have only ONE expression. According to the standard (section 1.9, page 8, pdf page 20):

Note: operators can be regrouped according to the usual mathematical rules only where the operators really are associative or commutative.(7) For example, in the following fragment: a=a+32760+b+5; the expression statement behaves exactly the same as: a=(((a+32760)+b)+5); due to the associativity and precedence of these operators. Thus, the result of the sum (a+32760) is next added to b, and that result is then added to 5 which results in the value assigned to a. On a machine in which overflows produce an exception and in which the range of values representable by an int is [-32768,+32767], the implementation cannot rewrite this expression as a=((a+b)+32765); since if the values for a and b were, respectively, -32754 and -15, the sum a+b would produce an exception while the original expression would not; nor can the expression be rewritten either as a=((a+32765)+b); or a=(a+(b+32765)); since the values for a and b might have been, respectively, 4 and -8 or -17 and 12. However on a machine in which overflows do not produce an exception and in which the results of overflows are reversible, the above expression statement can be rewritten by the implementation in any of the above ways because the same result will occur. -- end note ]

So we might think that, due to precedence and left-to-right associativity, our expression would be the same as:

(
       (
              ( bar("A",i) ^ bar("B",i++)
              )
          ^  bar("C",i)
       )
    ^ bar("D",i)
);

But, because (a^b)^c==a^(b^c) without any possible overflow situations, it could be rewritten in any order...

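But, because bar() is being invoked, and could hypothetically involve side effects, the implementation cannot regroup this expression freely. Precedence and associativity still fix how the operands are grouped.

Grouping is not the same thing as order of evaluation, though: the order in which the two operands of ^ are evaluated is unspecified, so this does not determine the order in which the bar()'s are called.

Now, when does that i+=1 occur? Well it still has to occur before bar("B",...) is invoked. (Even though bar("B",....) is invoked with the old value.)

Nothing sequences it relative to the reads of i in the arguments of bar("A",i), bar("C",i) and bar("D",i), however.

Answer: yes, and strictly it is worse than that. A given compiler may well print "A=0, B=0, C=1, D=1" every time, but the increment of i is unsequenced relative to those other reads of i, which by the rule Kristo quoted makes the behavior undefined.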

But consider another problem:

i = 0;
int & j = i;
R = i ^ i++ ^ j;

What is the value of R?

If the i+=1 occurred before j, we'd have 0^0^1=1. But if the i+=1 occurred after the whole expression, we'd have 0^0^0=0.

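A compiler may well give R as zero, applying the i+=1 only after the rest of the expression has been evaluated. The standard does not promise that, though: i is both read and modified here with no sequencing between the two, so the behavior is undefined.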

Compare that with:

i = 7, i++, i++; // i becomes 9 (valid)

which is legal. It has three expressions:

i = 7
i++
i++

The comma operator sequences each operand, side effects included, before the next one. So in each case, the value of i is changed at the conclusion of each expression. (Before any subsequent expressions are evaluated.)
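The standard's comma example is easy to check directly. This is a tiny sketch; `comma_result` is a hypothetical wrapper name used only so the result can be inspected.

```cpp
#include <cassert>

// Hypothetical wrapper around the standard's example expression.
int comma_result()
{
    int i = 0;
    i = 7, i++, i++;  // grouped as ((i = 7), i++), i++ and evaluated left to right
    return i;
}

// Usage example: both increments have been applied, in order.
void comma_example()
{
    assert(comma_result() == 9);
}
```

Note that the commas separating function arguments are not comma operators, which is why f(i = -1, i = -1) gets no such ordering.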

PS: Consider:

int foo(int theI) { SHOW(theI);  SHOW(i);  return theI; }
i = 0;
int & j = i;
R = i ^ i++ ^ foo(j);

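In this case, if the i+=1 is evaluated before foo(j) is called, theI is 1 and R is 0^0^1=1. That order is not guaranteed either, and the first read of i is still unsequenced relative to i++, so this expression is undefined too.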

反话 2024-07-21 04:12:27

To build on MarkusQ's answer: ;)

Or rather, Bill's comment on it:

(Edit: Oh, the comment is gone again... oh well)

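They are allowed to be evaluated in parallel. Whether that happens in practice is, technically speaking, irrelevant.

You don't need thread parallelism for this to occur, though: just evaluate the first step of both (take the value of i) before the second step (increment i). That is perfectly legal, and some compilers may consider it more efficient than fully evaluating one i++ before starting on the second.

In fact, I'd expect it to be a common optimization. Look at it from an instruction-scheduling point of view. You need to evaluate the following: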

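1. Take the value of i for the right argument
2. Increment i in the right argument
3. Take the value of i for the left argument
4. Increment i in the left argument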

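But there is really no dependency between the left and the right argument. Argument evaluation happens in an unspecified order and need not be done sequentially either (which is why new() in function arguments can leak memory, even when wrapped in a smart pointer).
It is also undefined what happens when you modify the same variable twice in the same expression.
We do have a dependency between 1 and 2, and between 3 and 4, however.
So why would the compiler wait for 2 to complete before computing 3? That adds latency, and it takes even longer than necessary before 4 becomes available.
Assuming a 1-cycle latency between each step, it takes 3 cycles from the time 1 completes until the result of 4 is ready and we can call the function.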

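But if we reorder them and evaluate in the order 1, 3, 2, 4, we can do it in 2 cycles. 1 and 3 can be started in the same cycle (or even merged into one instruction, since it is the same expression), and 2 and 4 can be evaluated in the following one.
All modern CPUs can execute 3-4 instructions per cycle, and a good compiler should try to exploit that.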
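The aside about new in function arguments can be illustrated with a short sketch. It assumes C++11 or later for std::shared_ptr and std::make_shared; `sum` and `safe_sum` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical callee taking two owning pointers.
int sum(const std::shared_ptr<int>& a, const std::shared_ptr<int>& b)
{
    return *a + *b;
}

int safe_sum()
{
    // Risky before C++17:
    //   sum(std::shared_ptr<int>(new int(1)), std::shared_ptr<int>(new int(2)));
    // The compiler may run both new-expressions before constructing either
    // shared_ptr; if the second allocation throws, the first one is leaked.
    //
    // make_shared performs allocation and ownership transfer inside a single
    // function call, so the two arguments cannot be interleaved that way.
    return sum(std::make_shared<int>(1), std::make_shared<int>(2));
}

// Usage example.
void safe_sum_example()
{
    assert(safe_sum() == 3);
}
```

The interleaving described in the comment is the same freedom the 1, 3, 2, 4 schedule above relies on.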