Is it legal to use the increment operator in a C++ function call?
There's been some debate going on in this question about whether the following code is legal C++:
std::list<item*>::iterator i = items.begin();
while (i != items.end())
{
    bool isActive = (*i)->update();
    if (!isActive)
    {
        items.erase(i++);  // *** Is this undefined behavior? ***
    }
    else
    {
        other_code_involving(*i);
        ++i;
    }
}
The problem here is that erase() will invalidate the iterator in question. If that happens before i++ is evaluated, then incrementing i like that is technically undefined behavior, even if it appears to work with a particular compiler. One side of the debate says that all function arguments are fully evaluated before the function is called. The other side says, "the only guarantees are that i++ will happen before the next statement and after i++ is used. Whether that is before erase(i++) is invoked or afterwards is compiler dependent."
I opened this question to hopefully settle that debate.
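Quoth the C++ standard, 1.9.16: when a function is called, every value computation and side effect associated with any argument expression is sequenced before execution of every expression or statement in the body of the called function.
So it would seem to me that a call such as foo(i++) is perfectly legal. It will increment i and then call foo with the previous value of i. However, a call that increments i in two of its arguments, such as foo(i++, i++), yields undefined behavior, because paragraph 1.9.16 also says that a side effect on a scalar object that is unsequenced relative to another side effect on the same object results in undefined behavior.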
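The single-argument case can be checked with a small sketch. The names g_counter, observe and call_with_post_increment are hypothetical, chosen only for illustration: g_counter plays the role of i and observe() plays the role of foo(). The counter is a global so that the called function can see its current value as well as the argument it received.

```cpp
#include <cassert>

// Hypothetical stand-in for the variable i in the discussion.
static int g_counter = 0;

struct Observation {
    int argument;        // value the function received
    int counter_inside;  // value of g_counter as seen from inside the call
};

// Hypothetical stand-in for foo(): records what it was passed and what
// the counter holds while the function body is executing.
Observation observe(int argument) {
    return Observation{argument, g_counter};
}

Observation call_with_post_increment() {
    g_counter = 0;
    // The argument expression yields the old value (0); its side effect
    // is complete before the body of observe() starts running.
    Observation seen = observe(g_counter++);
    assert(g_counter == 1);
    return seen;
}
```

Calling call_with_post_increment() should report an argument of 0 and a counter of 1 inside the call, which is the behavior the answer describes.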
To build on Kristo's answer: a call such as foo(i++, i++) yields undefined behavior because the two increments of i are unsequenced relative to each other, and the order in which function arguments are evaluated is unspecified (in the more general case, if you read a variable twice in an expression where you also write it, the result is undefined). You don't know which argument will be incremented first.
Starting from i = 0, the call might end up as foo(0, 1), or foo(1, 0), or even foo(0, 0).
Run a small test program that prints both arguments to see what happens on your platform. On my machine I get the same result every time, but this code is not portable, so I would expect to see different results with different compilers.
The standard says the side effect happens before the call, so the code behaves as if the old value of i were copied, i were incremented, and erase() were then called on the copy, rather than as if erase(i) were called first and i incremented afterwards.
So it is safe in this case, because list.erase() specifically doesn't invalidate any iterators other than the one erased.
That said, it's bad style: erase() on every sequence container returns the next iterator specifically so you don't have to worry about invalidating iterators due to reallocation. The idiomatic code, i = items.erase(i), will be safe for lists, and will also be safe for vectors, deques and any other sequence container should you want to change your storage.
You also wouldn't get the original code to compile without warnings: you'd have to discard the result of erase() explicitly, for example by casting the call to void, to avoid a warning about an unused return value, which would be a big clue that you're doing something odd.
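A minimal sketch of the two forms discussed above, using a std::list<int> and "remove the even values" as a stand-in for the question's inactive items (both are assumptions made for illustration, as are the function names). The first function uses the iterator returned by erase(); the second spells out, step by step, what erase(it++) amounts to.

```cpp
#include <cassert>
#include <list>

// Idiomatic form: continue from the iterator that erase() returns.
std::list<int> remove_even(std::list<int> values) {
    std::list<int>::iterator it = values.begin();
    while (it != values.end()) {
        if (*it % 2 == 0) {
            it = values.erase(it);  // points at the element after the erased one
        } else {
            ++it;
        }
    }
    return values;
}

// Explicit form of erase(it++): copy, advance, then erase the copy.
std::list<int> remove_even_explicit(std::list<int> values) {
    std::list<int>::iterator it = values.begin();
    while (it != values.end()) {
        if (*it % 2 == 0) {
            std::list<int>::iterator doomed = it;  // old position
            ++it;                                   // advance while 'doomed' is still valid
            values.erase(doomed);                   // invalidates only 'doomed'
        } else {
            ++it;
        }
    }
    return values;
}
```

Both functions should leave the same elements behind. Only the first form carries over unchanged to std::vector or std::deque, where erasing can invalidate iterators other than the erased one.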
It's perfectly OK.
The value passed would be the value of "i" before the increment.
++Kristo!
The C++ standard 1.9.16 makes a lot of sense with respect to how one implements operator++(postfix) for a class. When that operator++(int) method is called, it increments itself and returns a copy of the original value. Exactly as the C++ spec says.
It's nice to see standards improving!
However, I distinctly remember using older (pre-ANSI) C compilers wherein a post-increment inside a function call's arguments did not do what you think! Instead, it compiled as if the increment had been written after the call.
And this behavior was compiler-implementation dependent. (Making porting fun.)
It's easy enough to test and verify that modern compilers now behave correctly.
Responding to comment in thread...
...And building on pretty much EVERYONE's answers... (Thanks guys!)
I think we need to spell this out a bit better.
Given a call such as baz(g(), h()), we don't know whether g() will be invoked before or after h(). It is "unspecified".
But we do know that both g() and h() will be invoked before baz().
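The guarantee and the non-guarantee can both be seen in a short sketch. The call log g_log and the wrapper run_baz are hypothetical additions for illustration. The bodies of g(), h() and baz() are also assumptions; only their names come from the text above.

```cpp
#include <cassert>
#include <string>
#include <vector>

static std::vector<std::string> g_log;  // hypothetical record of call order

int g() { g_log.push_back("g"); return 1; }
int h() { g_log.push_back("h"); return 2; }
int baz(int a, int b) { g_log.push_back("baz"); return a + b; }

// Returns the order in which the three functions actually ran.
std::vector<std::string> run_baz() {
    g_log.clear();
    // g() and h() may run in either order, but both finish before baz() starts.
    (void)baz(g(), h());
    return g_log;
}
```

On any conforming compiler the last entry in the log should be "baz". Whether "g" or "h" comes first is up to the implementation, so portable code must not depend on it.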
Given bar(i++, i++), again, we don't know which i++ will be evaluated first, and perhaps not even whether i will be incremented once or twice before bar() is called. The results are undefined! (Given i=0, this could be bar(0,0) or bar(1,0) or bar(0,1) or something really weird!)
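Given foo(i++), we now know that i will be incremented before foo() is invoked. As Kristo pointed out, C++ standard section 1.9.16 requires every side effect associated with an argument expression to be sequenced before execution of the called function's body.
Though I think section 5.2.6 says it better: the value computation of the ++ expression is sequenced before the modification of the operand, and with respect to an indeterminately sequenced function call, postfix ++ is a single evaluation.
The standard, in section 1.9.16, also lists examples of valid and undefined expressions, among them i = 7, i++, i++ (valid; i becomes 9).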
And we can trivially demonstrate this with a foo() that inspects i directly.
So, yes, i is incremented before foo() is invoked.
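All this makes a lot of sense from the perspective of a class Foo that overloads postfix increment, where a Foo object is post-incremented in the argument of a call to delta().
Here Foo::operator++(int) must be invoked prior to delta(). And the increment operation must be completed during that invocation.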
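A minimal sketch of that situation, assuming a Foo that wraps a single int. The member value, the global g_foo, the Seen struct and the wrapper run_delta are hypothetical names introduced for illustration; only Foo, operator++(int) and delta() come from the text. Because the overloaded operator is itself a function call, it has to return before delta() can receive its argument.

```cpp
#include <cassert>

struct Foo {
    int value;
    explicit Foo(int v) : value(v) {}

    // Postfix increment: modify *this, return a copy of the previous state.
    Foo operator++(int) {
        Foo old(*this);
        ++value;
        return old;
    }
};

static Foo g_foo(0);  // global so that delta() can inspect the original object

struct Seen {
    int passed;   // value carried by the argument
    int current;  // value of g_foo while delta() is running
};

Seen delta(Foo passed) {
    return Seen{passed.value, g_foo.value};
}

Seen run_delta() {
    g_foo = Foo(0);
    return delta(g_foo++);  // operator++(int) completes before delta() begins
}
```

run_delta() should report that delta() received the old value 0 while g_foo already held 1.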
In my (perhaps overly complex) example, a chain of the form f.bar("A",i).bar("B",i++).bar("C",i).bar("D",i), the call f.bar("A",i) must be executed to obtain the object used for object.bar("B",i++), and so on for "C" and "D".
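So we know that i++ increments i prior to calling bar("B",i++) (even though bar("B",...) is invoked with the old value of i), and therefore i is incremented prior to bar("C",i) and bar("D",i). (Strictly, the language only guarantees that the object expression of a chained call is evaluated before that call's arguments since C++17.)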
Getting back to j_random_hacker's comment: this question is a lot more complicated than you might think...
Rewriting your question as code, with the four bar() calls combined by ^ into a single statement, we now have only ONE expression. According to the standard (section 1.9, page 8, pdf page 20), operators can be regrouped according to the usual mathematical rules only where they really are associative or commutative.
So we might think that, due to precedence, our expression would be grouped strictly from left to right.
But, because (a^b)^c==a^(b^c) without any possible overflow situations, it could be rewritten in any order...
But, because bar() is being invoked, and could hypothetically involve side effects, this expression cannot be rewritten in just any order. Rules of precedence still apply.
Which nicely determines the order of evaluation of the bar()'s.
Now, when does that i+=1 occur? Well it still has to occur before bar("B",...) is invoked. (Even though bar("B",....) is invoked with the old value.)
So it's deterministically occurring before bar(C) and bar(D), and after bar(A).
Answer: NO. We will always have "A=0, B=0, C=1, D=1", if the compiler is standards-compliant.
But consider another problem: what is the value of R?
If the i+=1 occurred before j, we'd have 0^0^1=1. But if the i+=1 occurred after the whole expression, we'd have 0^0^0=0.
In fact, R is zero. The i+=1 does not occur until after the expression has been evaluated.
Which I reckon is why:
i = 7, i++, i++; // i becomes 9 (valid)
is legal... It has three expressions (i = 7, i++ and i++), and in each case the value of i is changed at the conclusion of that expression. (Before any subsequent expressions are evaluated.)
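The comma case is easy to check with a tiny sketch (the function name comma_sequence is hypothetical). The built-in comma operator sequences its left operand, including side effects, before its right operand, so each step sees the result of the previous one.

```cpp
#include <cassert>

int comma_sequence() {
    int i = 0;
    // Parsed as ((i = 7), i++), i++ : each comma completes the side
    // effects on its left before evaluating what is on its right.
    i = 7, i++, i++;
    return i;
}
```

comma_sequence() returns 9, matching the comment in the standard's example.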
PS: Consider the same expression with j passed through a function, as foo(j).
In this case, i+=1 has to be evaluated before foo(j) is called. theI (the value foo receives) is 1. And R is 0^0^1=1.
To build on MarkusQ's answer: ;)
Or rather, Bill's comment to it:
(Edit: Aw, the comment is gone again... Oh well)
They're allowed to be evaluated in parallel. Whether or not it happens in practice is technically speaking irrelevant.
You don't need thread parallelism for this to occur though, just evaluate the first step of both (take the value of i) before the second (increment i). Perfectly legal, and some compilers may consider it more efficient than fully evaluating one i++ before starting on the second.
In fact, I'd expect it to be a common optimization. Look at it from an instruction scheduling point of view. You have the following you need to evaluate:
1. Take the value of i for the left argument
2. Increment i for the left argument
3. Take the value of i for the right argument
4. Increment i for the right argument
But there's really no dependency between the left and the right argument. Argument evaluation happens in an unspecified order, and need not be done sequentially either (which is why a new expression in function arguments can leak memory, even when the result is wrapped in a smart pointer).
It's also undefined what happens when you modify the same variable twice in the same expression.
We do have a dependency between 1 and 2, however, and between 3 and 4.
So why would the compiler wait for 2 to complete before computing 3? That introduces added latency, and it'll take even longer than necessary before 4 becomes available.
Assuming there's a 1 cycle latency between each, it'll take 3 cycles from the time 1 completes until the result of 4 is ready and we can call the function.
But if we reorder them and evaluate in the order 1, 3, 2, 4, we can do it in 2 cycles. 1 and 3 can be started in the same cycle (or even merged into one instruction, since it's the same expression), and in the following cycle, 2 and 4 can be evaluated.
Modern CPUs can typically execute 3-4 instructions per cycle, and a good compiler should try to exploit that.
Sutter's Guru of the Week #55 (and the corresponding piece in "More Exceptional C++") discusses this exact case as an example.
According to him, it is perfectly valid code, and in fact a case where trying to transform the statement into two lines (erase(i) followed by a separate i++) does not produce code that is semantically equivalent to the original statement.
To build on Bill the Lizard's answer: the same call might also end up passing the same value for both arguments (meaning that the actuals are evaluated in parallel, and then the postops are applied).
-- MarkusQ