Why are these constructs using pre- and post-increment undefined behavior?

Posted on 2025-01-12 04:04:33 (633 words)

#include <stdio.h>

int main(void)
{
   int i = 0;
   i = i++ + ++i;
   printf("%d\n", i); // 3

   i = 1;
   i = (i++);
   printf("%d\n", i); // 2 Should be 1, no ?

   volatile int u = 0;
   u = u++ + ++u;
   printf("%d\n", u); // 1

   u = 1;
   u = (u++);
   printf("%d\n", u); // 2 Should also be one, no ?

   register int v = 0;
   v = v++ + ++v;
   printf("%d\n", v); // 3 (Should be the same as u ?)

   int w = 0;
   printf("%d %d\n", ++w, w); // shouldn't this print 1 1

   int x[2] = { 5, 8 }, y = 0;
   x[y] = y ++;
   printf("%d %d\n", x[0], x[1]); // shouldn't this print 0 8? or 5 0?
}


初心 2025-01-19 04:04:33

C has the concept of undefined behavior, i.e. some language constructs are syntactically valid but you can't predict the behavior when the code is run.

As far as I know, the standard doesn't explicitly say why the concept of undefined behavior exists. In my mind, it's simply because the language designers wanted some leeway in the semantics. Instead of, for example, requiring all implementations to handle integer overflow in exactly the same way, which would very likely impose serious performance costs, they just left the behavior undefined, so that if you write code that causes integer overflow, anything can happen.

So, with that in mind, why are these "issues"? The language clearly says that certain things lead to undefined behavior. There is no problem, there is no "should" involved. If the undefined behavior changes when one of the involved variables is declared volatile, that doesn't prove or change anything. It is undefined; you cannot reason about the behavior.

Your most interesting-looking example, the one with

u = (u++);

is a text-book example of undefined behavior (see Wikipedia's entry on sequence points).

画离情绘悲伤 2025-01-19 04:04:33

Most of the answers here quote the C standard to emphasize that the behaviour of these constructs is undefined. To understand why the behaviour of these constructs is undefined, let's first understand these terms in the light of the C11 standard:

Sequenced: (5.1.2.3)

Given any two evaluations A and B, if A is sequenced before B, then the execution of A shall precede the execution of B.

Unsequenced:

If A is not sequenced before or after B, then A and B are unsequenced.

Evaluations can be one of two things:

  • value computations, which work out the result of an expression; and
  • side effects, which are modifications of objects.

Sequence Point:

The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B.

Now, coming to the question: for expressions like

int i = 1;
i = i++;

the standard says:

6.5 Expressions:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behaviour is undefined. [...]

Therefore, the above expression invokes UB because two side effects on the same object i are unsequenced relative to each other. That means the side effect of the assignment to i is not sequenced either before or after the side effect of ++.
Depending on whether the assignment occurs before or after the increment, different results will be produced, and that is one of the cases of undefined behaviour.

Let's call the i on the left of the assignment il, and the i on the right of the assignment (in the expression i++) ir. Then the expression looks like

il = ir++     // Note that the suffixes l and r are used for the sake of clarity.
              // Both il and ir represent the same object.

An important point about the postfix ++ operator is that:

just because the ++ comes after the variable does not mean that the increment happens late. The increment can happen as early as the compiler likes as long as the compiler ensures that the original value is used.

It means the expression il = ir++ could be evaluated either as

temp = ir;      // i = 1
ir = ir + 1;    // i = 2   side effect by ++ before assignment
il = temp;      // i = 1   result is 1  

or

temp = ir;      // i = 1
il = temp;      // i = 1   side effect by assignment before ++
ir = ir + 1;    // i = 2   result is 2  

This yields two different results, 1 and 2, depending on the order of the side effects of the assignment and of ++, and hence the expression invokes undefined behaviour.

瑾兮 2025-01-19 04:04:33

I think the relevant parts of the C99 standard are 6.5 Expressions, §2

Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.

and 6.5.16 Assignment operators, §4:

The order of evaluation of the operands is unspecified. If an attempt is made to modify
the result of an assignment operator or to access it after the next sequence point, the
behavior is undefined.

酒废 2025-01-19 04:04:33

If you really want to know exactly how you get the result you're getting, just compile and disassemble your line of code.

This is what I get on my machine, together with what I think is going on:

$ cat evil.c
void evil(){
  int i = 0;
  i+= i++ + ++i;
}
$ gcc evil.c -c -o evil.bin
$ gdb evil.bin
(gdb) disassemble evil
Dump of assembler code for function evil:
   0x00000000 <+0>:   push   %ebp
   0x00000001 <+1>:   mov    %esp,%ebp
   0x00000003 <+3>:   sub    $0x10,%esp
   0x00000006 <+6>:   movl   $0x0,-0x4(%ebp)  // i = 0   i = 0
   0x0000000d <+13>:  addl   $0x1,-0x4(%ebp)  // i++     i = 1
   0x00000011 <+17>:  mov    -0x4(%ebp),%eax  // j = i   i = 1  j = 1
   0x00000014 <+20>:  add    %eax,%eax        // j += j  i = 1  j = 2
   0x00000016 <+22>:  add    %eax,-0x4(%ebp)  // i += j  i = 3
   0x00000019 <+25>:  addl   $0x1,-0x4(%ebp)  // i++     i = 4
   0x0000001d <+29>:  leave  
   0x0000001e <+30>:  ret
End of assembler dump.

(I... suppose that the 0x00000014 instruction was some kind of compiler optimization?)

无悔心 2025-01-19 04:04:33

The behavior can't really be explained because it invokes both unspecified behavior and undefined behavior, so we cannot make any general predictions about this code. If you read Olve Maudal's work, such as Deep C and Unspecified and Undefined, you can sometimes make good guesses in very specific cases with a specific compiler and environment, but please don't do that anywhere near production.

Moving on to unspecified behavior: section 6.5 paragraph 3 of the draft C99 standard says (emphasis mine):

The grouping of operators and operands is indicated by the syntax.[74] Except as specified later (for the function-call (), &&, ||, ?:, and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified.

So when we have a line like this:

i = i++ + ++i;

we do not know whether i++ or ++i will be evaluated first. This is mainly to give the compiler better options for optimization.

We also have undefined behavior here, since the program modifies variables (i, u, etc.) more than once between sequence points. From the draft standard, section 6.5 paragraph 2 (emphasis mine):

Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.

It cites the following code examples as being undefined:

i = ++i + 1;
a[i++] = i; 

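In all of these examples, the code attempts to modify an object more than once between the same pair of sequence points. In each case, the next sequence point comes at the end of the full expression, marked by the ;, and the ^ marks each place where the object is modified: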

i = i++ + ++i;
^   ^       ^

i = (i++);
^    ^

u = u++ + ++u;
^   ^       ^

u = (u++);
^    ^

v = v++ + ++v;
^   ^       ^

Unspecified behavior is defined in the draft c99 standard in section 3.4.4 as:

use of an unspecified value, or other behavior where this International Standard provides
two or more possibilities and imposes no further requirements on which is chosen in any
instance

and undefined behavior is defined in section 3.4.3 as:

behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements

and notes that:

Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

七颜 2025-01-19 04:04:33

Another way of answering this, rather than getting bogged down in arcane details of sequence points and undefined behavior, is simply to ask, what are they supposed to mean? What was the programmer trying to do?

The first fragment asked about, i = i++ + ++i, is pretty clearly insane in my book. No one would ever write it in a real program, it's not obvious what it does, there's no conceivable algorithm someone could have been trying to code that would have resulted in this particular contrived sequence of operations. And since it's not obvious to you and me what it's supposed to do, it's fine in my book if the compiler can't figure out what it's supposed to do, either.

The second fragment, i = i++, is a little easier to understand. It looks like someone is trying to increment i and assign the result back to i. But there are a couple of ways of doing this in C. The most basic way to take i's value, add 1, and assign the result back to i is the same in almost any programming language:

i = i + 1

C, of course, has a handy shortcut:

i++

This also means, "take i's value, add 1, and assign the result back to i". So if we construct a hodgepodge of the two, by writing

i = i++

what we're really saying is "take i's value, add 1, assign the result back to i, and assign the result back to i". We're confused, so it doesn't bother me too much if the compiler gets confused, too.

Realistically, the only time these crazy expressions get written is when people are using them as artificial examples of how ++ is supposed to work. And of course it is important to understand how ++ works. But one practical rule for using ++ is, "If it's not obvious what an expression using ++ means, don't write it."

We used to spend countless hours on comp.lang.c discussing expressions like these and why they're undefined. Two of my longer answers, that try to really explain why, are archived on the web:

See also question 3.8 and the rest of the questions in section 3 of the C FAQ list.

美羊羊 2025-01-19 04:04:33

Often this question is linked as a duplicate of questions related to code like

printf("%d %d\n", i, i++);

or

printf("%d %d\n", ++i, i++);

or similar variants.

While this is also undefined behaviour, as already stated, there are subtle differences when printf() is involved, compared with a statement such as:

x = i++ + i++;

In the following statement:

printf("%d %d\n", ++i, i++);

the order of evaluation of the arguments to printf() is unspecified. That means the expressions ++i and i++ could be evaluated in either order. The C11 standard has some relevant text on this:

Annex J, unspecified behaviours

The order in which the function designator, arguments, and
subexpressions within the arguments are evaluated in a function call
(6.5.2.2).

3.4.4, unspecified behavior

Use of an unspecified value, or other behavior where this
International Standard provides two or more possibilities and imposes
no further requirements on which is chosen in any instance.

EXAMPLE An example of unspecified behavior is the order in which the
arguments to a function are evaluated.

The unspecified behaviour itself is NOT an issue. Consider this example:

printf("%d %d\n", ++x, y++);

This too has unspecified behaviour, because the order of evaluation of ++x and y++ is unspecified. But it's a perfectly legal and valid statement: there's no undefined behaviour here, because the modifications (++x and y++) are made to distinct objects.

What renders the following statement

printf("%d %d\n", ++i, i++);

as undefined behaviour is the fact that these two expressions modify the same object i without an intervening sequence point.


Another detail is that the comma involved in the printf() call is a separator, not the comma operator.

This is an important distinction because the comma operator does introduce a sequence point between the evaluation of its operands, which makes the following legal:

int i = 5;
int j;

j = (++i, i++);  // No undefined behaviour here because the comma operator 
                 // introduces a sequence point between '++i' and 'i++'

printf("i=%d j=%d\n",i, j); // prints: i=7 j=6

The comma operator evaluates its operands left-to-right and yields only the value of the last operand. So in j = (++i, i++);, ++i increments i to 6, and i++ yields the old value of i (6), which is assigned to j. Then i becomes 7 because of the post-increment.

So if the comma in the function call were a comma operator, then

printf("%d %d\n", ++i, i++);

would not be a problem. But it does invoke undefined behaviour, because the comma here is a separator.


Those who are new to undefined behaviour would benefit from reading What Every C Programmer Should Know About Undefined Behavior to understand the concept and many other variants of undefined behaviour in C.

This post: Undefined, unspecified and implementation-defined behavior is also relevant.

姜生凉生 2025-01-19 04:04:33

While it is unlikely that any compilers and processors would actually do so, it would be legal, under the C standard, for the compiler to implement "i++" with the sequence:

In a single operation, read `i` and lock it to prevent access until further notice
Compute (1+read_value)
In a single operation, unlock `i` and store the computed value

While I don't think any processor has hardware support for doing this efficiently, one can easily imagine situations where such behavior would make multi-threaded code easier. For example, it would guarantee that if two threads tried to perform the above sequence simultaneously, i would get incremented by two. It's not totally inconceivable that some future processor might provide a feature like that.

Suppose the compiler were to write i++ as indicated above (legal under the standard) and intersperse those instructions throughout the evaluation of the overall expression (also legal). If it didn't happen to notice that one of the other instructions also accessed i, it would be possible (and legal) for the compiler to generate a sequence of instructions that would deadlock. To be sure, a compiler would almost certainly detect the problem where the same variable i is used in both places. But if a routine accepts two pointers p and q and uses (*p) and (*q) in the above expression (rather than using i twice), the compiler would not be required to recognize or avoid the deadlock that would occur if the same object's address were passed for both p and q.

世界等同你 2025-01-19 04:04:33

虽然 a = a++a++ + a++ 等表达式的语法是合法的,但这些构造的行为未定义,因为不遵守 C 标准中的C99 6.5p2

  • 在上一个和下一个序列点之间,对象的存储值最多应通过表达式的求值修改一次。 [72]此外,应只读先前值以确定要存储的值[73]
  • 使用 脚注 73 进一步澄清

  • 本段呈现未定义的语句表达式,例如

    <前><代码>i = ++i + 1;
    a[i++] = i;

    在允许的情况下

    <前><代码>i = i + 1;
    a[i] = i;

  • 各种序列点列于 C11 (和 C99):

    1. 以下是 5.1.2.3 中描述的序列点:

      • 在函数调用和实际调用中函数指示符和实际参数的计算之间。 (6.5.2.2)。
      • 在以下运算符的第一个和第二个操作数的计算之间:逻辑 AND && (6.5.13);逻辑或|| (6.5.14);逗号 , (6.5.17).
      • 在条件 ? 的第一个操作数的计算之间: 运算符以及计算第二个和第三个操作数中的任意一个 (6.5.15)。
      • 完整声明符的结尾:声明符 (6.7.6);
      • 在完整表达式的计算和下一个要计算的完整表达式之间。以下是完整表达式: 不是复合文字一部分的初始值设定项 (6.7.9);表达式语句中的表达式 (6.8.3);选择语句的控制表达式(if 或 switch)(6.8.4); while 或 do 语句的控制表达式 (6.8.5); for 语句的每个(可选)表达式 (6.8.5.3); return 语句中的(可选)表达式 (6.8.6.4)。
      • 紧接在库函数返回之前 (7.1.4)。
      • 在与每个格式化输入/输出函数转换说明符(7.21.6、7.29.2)关联的操作之后。
      • 在每次调用比较函数之前和之后,以及在对比较函数的任何调用与作为参数传递给该调用的对象的任何移动之间 (7.22.5)。

    C11 中同一段落的措辞是:

  • 如果标量对象上的副作用相对于同一标量对象上的不同副作用或使用同一标量对象的值进行的值计算是无序的,则该行为是未定义的。如果表达式的子表达式有多个允许的排序,则如果任何排序中发生此类未排序的副作用,则行为未定义。84)

  • 您可以通过使用最新版本的 GCC 来检测程序中的此类错误-Wall-Werror,然后 GCC 将彻底拒绝编译你的程序。以下是 gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005 的输出:

    % gcc plusplus.c -Wall -Werror -pedantic
    plusplus.c: In function ‘main’:
    plusplus.c:6:6: error: operation on ‘i’ may be undefined [-Werror=sequence-point]
        i = i++ + ++i;
        ~~^~~~~~~~~~~
    plusplus.c:6:6: error: operation on ‘i’ may be undefined [-Werror=sequence-point]
    plusplus.c:10:6: error: operation on ‘i’ may be undefined [-Werror=sequence-point]
        i = (i++);
        ~~^~~~~~~
    plusplus.c:14:6: error: operation on ‘u’ may be undefined [-Werror=sequence-point]
        u = u++ + ++u;
        ~~^~~~~~~~~~~
    plusplus.c:14:6: error: operation on ‘u’ may be undefined [-Werror=sequence-point]
    plusplus.c:18:6: error: operation on ‘u’ may be undefined [-Werror=sequence-point]
        u = (u++);
        ~~^~~~~~~
    plusplus.c:22:6: error: operation on ‘v’ may be undefined [-Werror=sequence-point]
        v = v++ + ++v;
        ~~^~~~~~~~~~~
    plusplus.c:22:6: error: operation on ‘v’ may be undefined [-Werror=sequence-point]
    cc1: all warnings being treated as errors
    

    重要的部分是了解 什么是序列点 - 以及什么是序列点以及不是。例如,逗号运算符是一个序列点,因此

    j = (i ++, ++ i);
    

    定义良好,并且会将i增加1,产生旧值,丢弃该值;然后在逗号运算符处解决副作用;然后将 i 加一,结果值就成为表达式的值 - 即,这只是编写 j = (i += 2) 的一种人为方式,其中又是一种“聪明”的书写方式。

    i += 2;
    j = i;
    

    但是,函数参数列表中的 , 不是逗号运算符,并且不同参数的计算之间没有序列点;相反,他们的评估彼此之间没有顺序;因此函数调用

    int i = 0;
    printf("%d %d\n", i++, ++i, i);
    

    具有未定义的行为,因为函数参数中的i++++i的计算之间没有序列点< /strong>,因此 i 的值在上一个序列和下一个序列之间被 i++++i 修改两次观点。

While the syntax of expressions like a = a++ or a++ + a++ is legal, the behaviour of these constructs is undefined, because a "shall" requirement in the C standard is not obeyed. C99 6.5p2:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. [72] Furthermore, the prior value shall be read only to determine the value to be stored. [73]

Footnote 73 further clarifies that:

This paragraph renders undefined statement expressions such as

i = ++i + 1;
a[i++] = i;

while allowing

i = i + 1;
a[i] = i;

The various sequence points are listed in Annex C of C11 (and C99):

The following are the sequence points described in 5.1.2.3:

  • Between the evaluations of the function designator and actual arguments in a function call and the actual call (6.5.2.2).
  • Between the evaluations of the first and second operands of the following operators: logical AND && (6.5.13); logical OR || (6.5.14); comma , (6.5.17).
  • Between the evaluations of the first operand of the conditional ? : operator and whichever of the second and third operands is evaluated (6.5.15).
  • The end of a full declarator: declarators (6.7.6);
  • Between the evaluation of a full expression and the next full expression to be evaluated. The following are full expressions: an initializer that is not part of a compound literal (6.7.9); the expression in an expression statement (6.8.3); the controlling expression of a selection statement (if or switch) (6.8.4); the controlling expression of a while or do statement (6.8.5); each of the (optional) expressions of a for statement (6.8.5.3); the (optional) expression in a return statement (6.8.6.4).
  • Immediately before a library function returns (7.1.4).
  • After the actions associated with each formatted input/output function conversion specifier (7.21.6, 7.29.2).
  • Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any movement of the objects passed as arguments to that call (7.22.5).

The wording of the same paragraph in C11 is:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.[84]

You can detect such errors in a program by, for example, using a recent version of GCC with -Wall and -Werror; GCC will then refuse outright to compile your program. The following is the output of gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005:

% gcc plusplus.c -Wall -Werror -pedantic
plusplus.c: In function ‘main’:
plusplus.c:6:6: error: operation on ‘i’ may be undefined [-Werror=sequence-point]
    i = i++ + ++i;
    ~~^~~~~~~~~~~
plusplus.c:6:6: error: operation on ‘i’ may be undefined [-Werror=sequence-point]
plusplus.c:10:6: error: operation on ‘i’ may be undefined [-Werror=sequence-point]
    i = (i++);
    ~~^~~~~~~
plusplus.c:14:6: error: operation on ‘u’ may be undefined [-Werror=sequence-point]
    u = u++ + ++u;
    ~~^~~~~~~~~~~
plusplus.c:14:6: error: operation on ‘u’ may be undefined [-Werror=sequence-point]
plusplus.c:18:6: error: operation on ‘u’ may be undefined [-Werror=sequence-point]
    u = (u++);
    ~~^~~~~~~
plusplus.c:22:6: error: operation on ‘v’ may be undefined [-Werror=sequence-point]
    v = v++ + ++v;
    ~~^~~~~~~~~~~
plusplus.c:22:6: error: operation on ‘v’ may be undefined [-Werror=sequence-point]
cc1: all warnings being treated as errors

The important part is to know what a sequence point is, and which constructs are sequence points and which aren't. For example, the comma operator is a sequence point, so

j = (i ++, ++ i);

is well-defined. It increments i by one, yielding the old value, and discards that value. Then, at the comma operator, the side effects are settled. Finally it increments i by one again, and the resulting value becomes the value of the expression. In other words, this is just a contrived way to write j = (i += 2), which is yet again a "clever" way to write

i += 2;
j = i;

However, the , in function argument lists is not a comma operator, and there is no sequence point between evaluations of distinct arguments; instead, their evaluations are unsequenced with regard to each other. So the function call

int i = 0;
printf("%d %d %d\n", i++, ++i, i);

has undefined behaviour, because there is no sequence point between the evaluations of i++ and ++i in the function arguments. The value of i is therefore modified twice, by both i++ and ++i, between the previous and the next sequence point.

云之铃。 2025-01-19 04:04:33

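Your question was probably not, "Why are these constructs undefined behavior in C?". Your question was probably, "Why did this code (using ++) not give me the value I expected?", and someone marked your question as a duplicate and sent you here.

This answer tries to answer that question: why did your code not give you the answer you expected, and how can you learn to recognize (and avoid) expressions that will not work as expected?

I assume you've heard the basic definition of C's ++ and -- operators by now, and how the prefix form ++x differs from the postfix form x++. But these operators are hard to think about, so to make sure you understood, perhaps you wrote a tiny little test program involving something like

int x = 5;
printf("%d %d %d\n", x, ++x, x++);

But, to your surprise, this program did not help you understand. It printed some strange, inexplicable output, suggesting that maybe ++ does something completely different, not at all what you thought it did.

Or perhaps you're looking at a hard-to-understand expression like

int x = 5;
x = x++ + ++x;
printf("%d\n", x);

Perhaps someone gave you that code as a puzzle. This code also makes no sense, especially if you run it. And if you compile and run it under two different compilers, you're likely to get two different answers! What's up with that? Which answer is correct? (And the answer is that both of them are, or neither of them are.)

As you've heard by now, these expressions are undefined, which means that the C language makes no guarantee about what they'll do. This is a strange and unsettling result, because you probably thought that any program you could write, as long as it compiled and ran, would generate a unique, well-defined output. But in the case of undefined behavior, that's not so.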

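What makes an expression undefined? Are expressions involving ++ and -- always undefined? Of course not: these are useful operators, and if you use them properly, they're perfectly well-defined.

The expressions we're talking about are undefined because there's too much going on at once: we can't tell what order things will happen in, yet the order matters to the result we'll get.

Let's go back to the two examples I've used in this answer. When I wrote

printf("%d %d %d\n", x, ++x, x++);

the question is, before actually calling printf, does the compiler compute the value of x first, or x++, or maybe ++x? It turns out we don't know. There's no rule in C which says that the arguments to a function get evaluated left-to-right, or right-to-left, or in some other order. So we can't say whether the compiler will do x first, then ++x, then x++, or x++ then ++x then x, or some other order. But the order clearly matters, because depending on which order the compiler uses, we'll get a different series of numbers printed out.

What about this crazy expression?

x = x++ + ++x;

The problem with this expression is that it contains three different attempts to modify the value of x:

(1) the x++ part tries to take x's value, add 1, store the new value in x, and return the old value;
(2) the ++x part tries to take x's value, add 1, store the new value in x, and return the new value;
(3) the x = part tries to assign the sum of the other two back to x.

Which of those three attempted assignments will "win"? Which of the three values will actually determine the final value of x? Again, and perhaps surprisingly, there's no rule in C to tell us.

You might imagine that precedence or associativity or left-to-right evaluation tells you what order things happen in, but they do not. You may not believe me, but please take my word for it, and I'll say it again: precedence and associativity do not determine every aspect of the evaluation order of an expression in C. In particular, suppose one expression contains several different spots where we try to assign a new value to something like x. Precedence and associativity do not tell us which of those attempts happens first, or last, or anything.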


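With all that background and introduction out of the way: if you want to make sure that all your programs are well-defined, which expressions can you write, and which ones can you not write?

These expressions are all fine:

y = x++;
z = x++ + y++;
x = x + 1;
x = a[i++];
x = a[i++] + b[j++];
x[i++] = a[j++] + b[k++];
x = *p++;
x = *p++ + *q++;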

These expressions are all undefined:

x = x++;
x = x++ + ++x;
y = x + x++;
a[i] = i++;
a[i++] = i;
printf("%d %d %d\n", x, ++x, x++);

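And the last question is, how can you tell which expressions are well-defined, and which expressions are undefined?

As I said earlier, the undefined expressions are the ones where there's too much going on at once, where you can't be sure what order things happen in, and where the order matters:

1. If there's one variable that's getting modified (assigned to) in two or more different places, how do you know which modification happens first?
2. If there's a variable that's getting modified in one place, and having its value used in another place, how do you know whether it uses the old value or the new value?

As an example of #1, in the expression

x = x++ + ++x;

there are three attempts to modify x.

As an example of #2, in the expression

y = x + x++;

we both use the value of x and modify it.

So that's the answer: make sure that in any expression you write, each variable is modified at most once, and if a variable is modified, you don't also attempt to use the value of that variable somewhere else.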


    还有一件事。您可能想知道如何“修复”我在这个答案中提出的未定义表达式。

    对于 printf("%d %d %d\n", x, ++x, x++); 来说,很简单 - 只需将其写为三个单独的 printf 调用:

    printf("%d ", x);
    printf("%d ", ++x);
    printf("%d\n", x++);
    

    现在行为已完全定义,您将得到合理的结果。

    另一方面,对于 x = x++ + ++x 的情况,没有办法修复它。没有办法编写它来保证它的行为符合您的期望 - 但这没关系,因为无论如何您都不会在实际程序中编写像 x = x++ + ++x 这样的表达式。

    Your question was probably not, "Why are these constructs undefined behavior in C?". Your question was probably, "Why did this code (using ++) not give me the value I expected?", and someone marked your question as a duplicate, and sent you here.

    This answer tries to answer that question: why did your code not give you the answer you expected, and how can you learn to recognize (and avoid) expressions that will not work as expected.

    I assume you've heard the basic definition of C's ++ and -- operators by now, and how the prefix form ++x differs from the postfix form x++. But these operators are hard to think about, so to make sure you understood, perhaps you wrote a tiny little test program involving something like

    int x = 5;
    printf("%d %d %d\n", x, ++x, x++);
    

    But, to your surprise, this program did not help you understand — it printed some strange, inexplicable output, suggesting that maybe ++ does something completely different, not at all what you thought it did.

    Or, perhaps you're looking at a hard-to-understand expression like

    int x = 5;
    x = x++ + ++x;
    printf("%d\n", x);
    

    Perhaps someone gave you that code as a puzzle. This code also makes no sense, especially if you run it — and if you compile and run it under two different compilers, you're likely to get two different answers! What's up with that? Which answer is correct? (And the answer is that both of them are, or neither of them are.)

    As you've heard by now, these expressions are undefined, which means that the C language makes no guarantee about what they'll do. This is a strange and unsettling result, because you probably thought that any program you could write, as long as it compiled and ran, would generate a unique, well-defined output. But in the case of undefined behavior, that's not so.

    What makes an expression undefined? Are expressions involving ++ and -- always undefined? Of course not: these are useful operators, and if you use them properly, they're perfectly well-defined.

    For the expressions we're talking about, what makes them undefined is that there's too much going on at once: we can't tell what order things will happen in, and yet the order matters to the result we'll get.

    Let's go back to the two examples I've used in this answer. When I wrote

    printf("%d %d %d\n", x, ++x, x++);
    

    the question is, before actually calling printf, does the compiler compute the value of x first, or x++, or maybe ++x? But it turns out we don't know. There's no rule in C which says that the arguments to a function get evaluated left-to-right, or right-to-left, or in some other order. So we can't say whether the compiler will do x first, then ++x, then x++, or x++ then ++x then x, or some other order. But the order clearly matters, because depending on which order the compiler uses, we'll clearly get a different series of numbers printed out.
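The same point can be made without undefined behavior by moving the side effects into function calls. The two calls cannot interleave, so the behavior is only unspecified, but the compiler still picks which argument to evaluate first. This is a minimal sketch; `next_value`, `pack` and `observe_argument_order` are hypothetical names. Call `observe_argument_order()` from your own `main`: you may see 12 (left to right) or 21 (right to left), and both are allowed.

```c
/* Hypothetical helpers for illustration. */
static int counter = 0;

/* Bumps the shared counter and returns its new value. */
static int next_value(void)
{
    return ++counter;
}

/* Packs two arguments into one number so the evaluation order shows. */
static int pack(int first, int second)
{
    return first * 10 + second;
}

/* Returns 12 if the arguments were evaluated left to right,
   21 if right to left. The two calls cannot interleave, so this is
   unspecified behavior rather than undefined behavior. */
int observe_argument_order(void)
{
    counter = 0;
    return pack(next_value(), next_value());
}
```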

    What about this crazy expression?

    x = x++ + ++x;
    

    The problem with this expression is that it contains three different attempts to modify the value of x: (1) the x++ part tries to take x's value, add 1, store the new value in x, and return the old value; (2) the ++x part tries to take x's value, add 1, store the new value in x, and return the new value; and (3) the x = part tries to assign the sum of the other two back to x. Which of those three attempted assignments will "win"? Which of the three values will actually determine the final value of x? Again, and perhaps surprisingly, there's no rule in C to tell us.
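To make the "which assignment wins" question concrete, here are three hypothetical orderings a compiler could plausibly choose for `x = x++ + ++x`, each written out as well-defined C with explicit temporaries. They are illustrations only, not a description of what any particular compiler does, and the function names are made up.

```c
/* Three hypothetical orderings for x = x++ + ++x, written out as
   well-defined C. None of them is "the" meaning; the standard gives none. */

/* A: every side effect lands immediately, left to right. */
int interpretation_a(int x)
{
    int left = x;        /* x++ yields the old value...        */
    x = x + 1;           /* ...and stores x + 1 right away      */
    x = x + 1;           /* ++x increments again...             */
    int right = x;       /* ...and yields the new value         */
    x = left + right;    /* the assignment comes last           */
    return x;
}

/* B: both operands read the original x; the increments are stored,
   then the assignment is stored last and wins. */
int interpretation_b(int x)
{
    int left = x;        /* value of x++                              */
    int right = x + 1;   /* value of ++x, computed from the same read */
    x = left + 1;        /* store from x++                            */
    x = right;           /* store from ++x                            */
    x = left + right;    /* assignment stored last                    */
    return x;
}

/* C: same reads as B, but the deferred store from x++ lands after
   the assignment and overwrites it. */
int interpretation_c(int x)
{
    int left = x;
    int right = x + 1;
    x = left + right;    /* assignment stored first */
    x = left + 1;        /* store from x++ wins     */
    return x;
}
```

Starting from x == 5, these return 12, 11 and 6 respectively: three different "answers" to the same expression.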

    You might imagine that precedence or associativity or left-to-right evaluation tells you what order things happen in, but they do not. You may not believe me, but please take my word for it, and I'll say it again: precedence and associativity do not determine every aspect of the evaluation order of an expression in C. In particular, if within one expression there are multiple different spots where we try to assign a new value to something like x, precedence and associativity do not tell us which of those attempts happens first, or last, or anything.


    So with all that background and introduction out of the way, if you want to make sure that all your programs are well-defined, which expressions can you write, and which ones can you not write?

    These expressions are all fine:

    y = x++;
    z = x++ + y++;
    x = x + 1;
    x = a[i++];
    x = a[i++] + b[j++];
    x[i++] = a[j++] + b[k++];
    x = *p++;
    x = *p++ + *q++;
    
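As a sketch of why the `x = *p++ + *q++` pattern is fine, here is a hypothetical helper `sum_pairs`: in each loop iteration `p` and `q` are each modified exactly once, and `total` is modified once and read only to compute its own new value.

```c
/* Hypothetical helper: returns the sum of p[k] + q[k] for k in [0, n).
   Each statement modifies any given variable at most once, so every
   iteration is well-defined. */
int sum_pairs(const int *p, const int *q, int n)
{
    int total = 0;
    while (n-- > 0)
        total += *p++ + *q++;
    return total;
}
```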

    These expressions are all undefined:

    x = x++;
    x = x++ + ++x;
    y = x + x++;
    a[i] = i++;
    a[i++] = i;
    printf("%d %d %d\n", x, ++x, x++);
    

    And the last question is, how can you tell which expressions are well-defined, and which expressions are undefined?

    As I said earlier, the undefined expressions are the ones where there's too much going on at once, where you can't be sure what order things happen in, and where the order matters:

    1. If there's one variable that's getting modified (assigned to) in two or more different places, how do you know which modification happens first?
    2. If there's a variable that's getting modified in one place, and having its value used in another place, how do you know whether it uses the old value or the new value?

    As an example of #1, in the expression

    x = x++ + ++x;
    

    there are three attempts to modify x.

    As an example of #2, in the expression

    y = x + x++;
    

    we both use the value of x, and modify it.

    So that's the answer: make sure that in any expression you write, each variable is modified at most once, and if a variable is modified, you don't also use its value somewhere else in that expression, except to compute the new value being stored (as in x = x + 1).
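Applying that rule, here is a sketch of how two of the undefined expressions could be split into well-defined statements. The intended meaning of undefined code can only be guessed, so both the assumed intent and the helper names are hypothetical.

```c
/* Assumed intent of "a[i] = i++;": store i into a[i], then advance i.
   Returns the advanced index. */
int store_then_advance(int a[], int i)
{
    a[i] = i;   /* i is only read here                 */
    i++;        /* the increment gets its own statement */
    return i;
}

/* Assumed intent of "y = x + x++;": add x to itself, then bump x.
   Returns y; the increment is applied through the pointer. */
int double_then_bump(int *x)
{
    int y = *x + *x;   /* both reads see the same value */
    (*x)++;            /* modified once, afterwards      */
    return y;
}
```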


    One more thing. You might be wondering how to "fix" the undefined expressions I presented at the start of this answer.

    In the case of printf("%d %d %d\n", x, ++x, x++);, it's easy — just write it as three separate printf calls:

    printf("%d ", x);
    printf("%d ", ++x);
    printf("%d\n", x++);
    

    Now the behavior is perfectly well defined, and you'll get sensible results.

    In the case of x = x++ + ++x, on the other hand, there's no way to fix it. There's no way to write it so that it has guaranteed behavior matching your expectations — but that's okay, because you would never write an expression like x = x++ + ++x in a real program anyway.

    樱娆 2025-01-19 04:04:33

    The C standard says that an object's stored value may be modified at most once between two sequence points. The end of a full expression, such as the semicolon that ends a statement, is a sequence point.
    So every statement of the form:

    i = i++;
    i = i++ + ++i;
    

    and so on violate that rule. The standard also says that the behavior is undefined, not merely unspecified. Some compilers detect these and produce some result anyway, but no particular result is required by the standard.

    However, two different variables can be incremented between two sequence points.

    while((*dst++ = *src++));
    

    The above is a common coding practice while copying/analysing strings.
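A minimal sketch of that idiom, assuming a hypothetical `copy_string` helper and a destination buffer large enough for the source. `dst` and `src` are different objects, so each is modified only once per iteration.

```c
/* Hypothetical helper: copies the NUL-terminated string src into dst.
   The caller must ensure dst has room for strlen(src) + 1 chars. */
void copy_string(char *dst, const char *src)
{
    while ((*dst++ = *src++) != '\0')
        ;   /* empty body: all the work happens in the condition */
}
```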

    旧时模样 2025-01-19 04:04:33

    In https://stackoverflow.com/questions/29505280/incrementing-array-index-in-c someone asked about a statement like:

    int k[] = {0,1,2,3,4,5,6,7,8,9,10};
    int i = 0;
    int num;
    num = k[++i+k[++i]] + k[++i];
    printf("%d", num);
    

    which prints 7... the OP expected it to print 6.

    The ++i increments aren't guaranteed to all complete before the rest of the calculation. In fact, different compilers will get different results here. In that example, the first two ++i were executed, then the values of k[] were read, then the last ++i, then k[]. A well-defined alternative is to spell out the indices and perform the increment separately:

    num = k[i+1] + k[i+2] + k[i+3];
    i += 3;
    

    Modern compilers will optimize this very well. In fact, possibly better than the code you originally wrote (assuming it had worked the way you had hoped).
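If the left-to-right reading the OP expected (the one that gives 6) is what you want, one way to write it is with one increment per statement. This is a sketch under that assumption; `left_to_right_num` is a hypothetical name.

```c
/* One well-defined reading of k[++i + k[++i]] + k[++i]:
   evaluate strictly left to right, one increment per statement. */
int left_to_right_num(const int k[])
{
    int i = 0;
    int first = ++i;              /* i == 1                 */
    int second = k[++i];          /* i == 2, reads k[2]     */
    int lhs = k[first + second];  /* index 1 + k[2]         */
    int rhs = k[++i];             /* i == 3, reads k[3]     */
    return lhs + rhs;
}
```

With `k[] = {0, 1, 2, ..., 10}` this returns 6.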

    黑色毁心梦 2025-01-19 04:04:33

    A good explanation of what happens in this kind of computation is provided in document n1188 from the ISO WG14 site.

    I'll summarize the ideas.

    The main rule from the ISO 9899 standard that applies in this situation is 6.5p2:

    Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

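    For a statement like i=i++; the relevant sequence points are the one just before it (at the end of the previous full expression) and the one at the end of this full expression, after i++.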

    The paper I cited above explains that you can picture the program as being made of small boxes, each box containing the operations between two consecutive sequence points. The sequence points are listed in Annex C of the standard; in the case of i=i++ there are two sequence points that delimit a full expression. Such an expression corresponds syntactically to an expression-statement in the Backus-Naur form of the grammar (the grammar is given in Annex A of the Standard).

    So the operations inside a box have no prescribed order. The statement

    i = i++;
    

    can be interpreted as

    tmp = i;
    i = i + 1;
    i = tmp;
    

    or as

    tmp = i;
    i = tmp;
    i = i + 1;
    

    Because all these ways of interpreting the code i=i++ are valid, and because they generate different answers, the behavior is undefined.
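The two orderings can be written as ordinary, well-defined C functions (the names are hypothetical). Starting from i = 1, the first gives 1 and the second gives 2, which are exactly the two outputs the question wonders about.

```c
/* Ordering 1: tmp = i; i = i + 1; i = tmp;  the increment is lost. */
int order_one(int i)
{
    int tmp = i;
    i = i + 1;
    i = tmp;
    return i;
}

/* Ordering 2: tmp = i; i = tmp; i = i + 1;  the increment survives. */
int order_two(int i)
{
    int tmp = i;
    i = tmp;
    i = i + 1;
    return i;
}
```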

    So sequence points mark the beginning and the end of each box that makes up the program [the boxes are the atomic units of ordering in C], and inside a box the order of the operations is, in general, not defined. Changing that order can sometimes change the result.
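Boxes are not only delimited by semicolons. C also places a sequence point after the left operand of `&&`, `||` and the comma operator, and after the first operand of `?:`, so increments on either side of these operators are well-defined. A small sketch with hypothetical function names:

```c
/* Comma operator: sequence point after each left operand,
   so this returns the original i + 2. */
int after_comma(int i)
{
    return (i++, i++, i);
}

/* Conditional operator: sequence point after the first operand, so the
   second operand sees the incremented value (i + 1 for nonzero i). */
int after_conditional(int i)
{
    return i++ ? i : -1;
}

/* Logical AND: sequence point after the left operand. For i == 1 the
   left side is true and the right side then sees i == 2. */
int after_and(int i)
{
    return i++ && i == 2;
}
```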

    EDIT:

    Other good sources explaining such ambiguities are the entries from the c-faq site (also published as a book), namely here and here and here.

    路弥 2025-01-19 04:04:33

    The reason is that the program has undefined behavior. The problem lies in the evaluation order: the C++98 standard requires no sequence points between these operations (in C++11 terminology, none of the operations is sequenced before or after another).

    However, if you stick to one compiler, you will often find the behavior consistent, as long as you don't add function calls or pointers, which make the behavior messier; a different compiler, though, may give different results.

    Using Nuwen MinGW 15 GCC 7.1 you will get:

     #include<stdio.h>
     int main(int argc, char ** argv)
     {
        int i = 0;
        i = i++ + ++i;
        printf("%d\n", i); // 2
    
        i = 1;
        i = (i++);
        printf("%d\n", i); //1
    
        volatile int u = 0;
        u = u++ + ++u;
        printf("%d\n", u); // 2
    
        u = 1;
        u = (u++);
        printf("%d\n", u); //1
    
        register int v = 0;
        v = v++ + ++v;
        printf("%d\n", v); //2
     }
    

    How does GCC arrive at these results? In these examples it evaluates the subexpressions of the right-hand side (RHS) from left to right, then assigns the value to the left-hand side (LHS). This is exactly how Java and C# behave, and their language specifications require it (yes, the equivalent code in Java and C# has defined behavior). It evaluates each subexpression of the RHS one by one, from left to right; for each subexpression, a pre-increment ++c is applied first, then the value of c is used in the operation, then a post-increment c++ is applied.

    According to GCC C++: Operators:

    In GCC C++, the precedence of the operators controls the order in which the individual operators are evaluated.

    The equivalent well-defined C++ code, as GCC interprets it:

    #include<stdio.h>
    int main(int argc, char ** argv)
    {
        int i = 0;
        //i = i++ + ++i;
        int r;
        r=i;
        i++;
        ++i;
        r+=i;
        i=r;
        printf("%d\n", i); // 2
    
        i = 1;
        //i = (i++);
        r=i;
        i++;
        i=r;
        printf("%d\n", i); // 1
    
        volatile int u = 0;
        //u = u++ + ++u;
        r=u;
        u++;
        ++u;
        r+=u;
        u=r;
        printf("%d\n", u); // 2
    
        u = 1;
        //u = (u++);
        r=u;
        u++;
        u=r;
        printf("%d\n", u); // 1
    
        register int v = 0;
        //v = v++ + ++v;
        r=v;
        v++;
        ++v;
        r+=v;
        v=r;
        printf("%d\n", v); //2
    }
    

    Then we go to Visual Studio. With Visual Studio 2015, you get:

    #include<stdio.h>
    int main(int argc, char ** argv)
    {
        int i = 0;
        i = i++ + ++i;
        printf("%d\n", i); // 3
    
        i = 1;
        i = (i++);
        printf("%d\n", i); // 2 
    
        volatile int u = 0;
        u = u++ + ++u;
        printf("%d\n", u); // 3
    
        u = 1;
        u = (u++);
        printf("%d\n", u); // 2 
    
        register int v = 0;
        v = v++ + ++v;
        printf("%d\n", v); // 3 
    }
    

    How does Visual Studio get these results? It takes another approach: it evaluates all pre-increment expressions in a first pass, uses the variables' values in the operations in a second pass, assigns from the RHS to the LHS in a third pass, and finally evaluates all the post-increment expressions in one last pass.

    So the equivalent well-defined C++ code, as Visual C++ interprets it:

    #include<stdio.h>
    int main(int argc, char ** argv)
    {
        int r;
        int i = 0;
        //i = i++ + ++i;
        ++i;
        r = i + i;
        i = r;
        i++;
        printf("%d\n", i); // 3
    
        i = 1;
        //i = (i++);
        r = i;
        i = r;
        i++;
        printf("%d\n", i); // 2 
    
        volatile int u = 0;
        //u = u++ + ++u;
        ++u;
        r = u + u;
        u = r;
        u++;
        printf("%d\n", u); // 3
    
        u = 1;
        //u = (u++);
        r = u;
        u = r;
        u++;
        printf("%d\n", u); // 2 
    
        register int v = 0;
        //v = v++ + ++v;
        ++v;
        r = v + v;
        v = r;
        v++;
        printf("%d\n", v); // 3 
    }
    

    As the Visual Studio documentation states in Precedence and Order of Evaluation:

    Where several operators appear together, they have equal precedence and are evaluated according to their associativity. The operators in the table are described in the sections beginning with Postfix Operators.

    静待花开 2025-01-19 04:04:33

    The key to understanding this is that the value of the expression i++ is i, and its effect is to add 1 to i (i.e. store the value i+1 in the variable i), but that does not mean that the store will take place when the value is determined.

    In an expression like i++ + ++i the value of the left-hand-side of the addition is i and right-hand-side is i+1.

    But it's undefined when the effect of either side takes place, so the value of the whole expression (i++ + ++i) is undefined. Will the first reference to i use the value of i from before the current statement, or the value after the effect of the right-hand side? No order of execution is guaranteed. Likewise, will the second reference to i use the value after the effect of the first one? The C Standard specifically states that this is undefined; defining it would constrain the optimiser to a specific order of execution, which is unhelpful.

    It's perfectly reasonable (and possibly efficient) for a compiler to notice that the net effect is to increment i by 2, and to evaluate what amounts to i+i+1 and later store i+2 in i, or not to do that.

    What you should not do is try and work out what your compiler does and play to it.

    Changes to the compiler optimisation settings, apparently (to you!) unrelated changes to the surrounding code, or new releases of the compiler could all change the behaviour.

    You lay yourself open to one of the most time consuming kinds of bug that suddenly arise in apparently unchanged code.

    Write the code you need (e.g. 2*i+1; i+=2;) and realise that all modern commercial compilers will (when optimisation is on) translate that into the most efficient code for your platform and that it has an obvious and guaranteed meaning to all human readers.
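A sketch of that advice, assuming the intent is the reading described above (the left operand contributes i and the right operand i + 1, with i advanced by 2 overall). `sum_then_advance` is a hypothetical name.

```c
/* Hypothetical helper: the well-defined code for what the answer says
   i++ + ++i "amounts to". Returns i + (i + 1) and advances *i by 2. */
int sum_then_advance(int *i)
{
    int value = 2 * *i + 1;   /* i + (i + 1), computed from one read */
    *i += 2;                  /* both increments, applied once        */
    return value;
}
```

For example, with `i == 3` the call returns 7 and leaves `i == 5`.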

    I even suggest never using ++ in any other expression than standalone and then only because it's easy to read. Don't imagine it's somehow more efficient than i=i+1 because all modern commercial compilers will emit the same code for both. They ain't daft.
