Why are these constructs using pre- and post-increment undefined behavior?
#include <stdio.h>
int main(void)
{
    int i = 0;
    i = i++ + ++i;
    printf("%d\n", i); // 3
    i = 1;
    i = (i++);
    printf("%d\n", i); // 2 Should be 1, no ?
    volatile int u = 0;
    u = u++ + ++u;
    printf("%d\n", u); // 1
    u = 1;
    u = (u++);
    printf("%d\n", u); // 2 Should also be one, no ?
    register int v = 0;
    v = v++ + ++v;
    printf("%d\n", v); // 3 (Should be the same as u ?)
    int w = 0;
    printf("%d %d\n", ++w, w); // shouldn't this print 1 1
    int x[2] = { 5, 8 }, y = 0;
    x[y] = y ++;
    printf("%d %d\n", x[0], x[1]); // shouldn't this print 0 8? or 5 0?
}
C has the concept of undefined behavior, i.e. some language constructs are syntactically valid but you can't predict the behavior when the code is run.
As far as I know, the standard doesn't explicitly say why the concept of undefined behavior exists. In my mind, it's simply because the language designers wanted there to be some leeway in the semantics. Instead of, for example, requiring that all implementations handle integer overflow in exactly the same way, which would very likely impose serious performance costs, they just left the behavior undefined, so that if you write code that causes integer overflow, anything can happen.
So, with that in mind, why are these "issues"? The language clearly says that certain things lead to undefined behavior. There is no problem, and there is no "should" involved. If the undefined behavior changes when one of the involved variables is declared volatile, that doesn't prove or change anything. It is undefined; you cannot reason about the behavior.
Your most interesting-looking example, the one with u = (u++);, is a textbook example of undefined behavior (see Wikipedia's entry on sequence points).
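Most of the answers here quote the C standard to emphasize that the behaviour of these constructs is undefined. To understand why the behaviour of these constructs is undefined, let's first understand these terms in the light of the C11 standard:
Sequenced (5.1.2.3): given any two evaluations A and B, if A is sequenced before B, then the execution of A precedes the execution of B.
Unsequenced: if A is not sequenced before or after B, then A and B are unsequenced.
Evaluations can be one of two things: value computations, which work out the result of an expression, and side effects, such as modifications of objects.
Sequence point: the presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B.
Now coming to the question. For expressions like
int i = 1;
i = i++;
the standard says (6.5 Expressions): if a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
Therefore, the above expression invokes UB because the two side effects on the same object i are unsequenced relative to each other. That means it is not specified whether the side effect of the assignment to i will be done before or after the side effect of ++. Depending on whether the assignment occurs before or after the increment, different results will be produced, and that is one of the cases of undefined behaviour.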
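Let's rename the i at the left of the assignment to il, and the i at the right of the assignment (in the expression i++) to ir. Then the expression becomes
il = ir++
where il and ir both refer to the same object i. An important point regarding the postfix ++ operator is that the increment does not have to happen at the moment the old value is read: the value computation of the result is sequenced before the side effect of updating the operand, but that update may land at any point before the next sequence point. It means the expression il = ir++ could be evaluated either as "read ir, increment ir, then store the old value into il" or as "read ir, store the old value into il, then increment ir". These two orders leave i with two different values, 1 and 2, depending on the order of the side effects of the assignment and ++, and hence the expression invokes undefined behaviour.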
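I think the relevant parts of the C99 standard are 6.5 Expressions, §2, which says that between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression, and that the prior value shall be read only to determine the value to be stored; and 6.5.16 Assignment operators, §4, which says that the order of evaluation of the operands is unspecified.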
Just compile and disassemble your line of code if you want to know exactly how you get the result you are getting.
This is what I get on my machine, together with what I think is going on:
(I... suppose that the 0x00000014 instruction was some kind of compiler optimization?)
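The behavior can't really be explained because it invokes both unspecified behavior and undefined behavior, so we cannot make any general predictions about this code. If you read Olve Maudal's work, such as Deep C and Unspecified and Undefined, you can sometimes make good guesses in very specific cases with a specific compiler and environment, but please don't do that anywhere near production.
So, moving on to unspecified behavior: in the draft C99 standard, section 6.5 paragraph 3 says that, except as specified later (for the function-call (), &&, ||, ?:, and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified. So when we have a line like this:
i = i++ + ++i;
we do not know whether i++ or ++i will be evaluated first. This is mainly to give the compiler better options for optimization.
We also have undefined behavior here, since the program is modifying variables (i, u, etc.) more than once between sequence points. Draft standard section 6.5 paragraph 2 says that between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression, and that the prior value shall be read only to determine the value to be stored. It cites the following code examples as being undefined:
i = ++i + 1;
a[i++] = i;
In the question's examples, such as i = i++ + ++i;, the code attempts to modify an object more than once between the same pair of sequence points, the next of which is the ; at the end of each statement.
Unspecified behavior is defined in the draft C99 standard in section 3.4.4 as the use of an unspecified value, or other behavior where the Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance. Undefined behavior is defined in section 3.4.3 as behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which the Standard imposes no requirements. The accompanying note says that possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment, to terminating a translation or execution.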
Another way of answering this, rather than getting bogged down in arcane details of sequence points and undefined behavior, is simply to ask: what are they supposed to mean? What was the programmer trying to do?
The first fragment asked about, i = i++ + ++i, is pretty clearly insane in my book. No one would ever write it in a real program. It's not obvious what it does, and there's no conceivable algorithm someone could have been trying to code that would have resulted in this particular contrived sequence of operations. And since it's not obvious to you and me what it's supposed to do, it's fine in my book if the compiler can't figure out what it's supposed to do, either.
The second fragment, i = i++, is a little easier to understand. It looks like someone is trying to increment i and assign the result back to i. But there are a couple of ways of doing this in C. The most basic way to take i's value, add 1, and assign the result back to i is the same in almost any programming language:
i = i + 1
C, of course, has a handy shortcut:
i++
This also means "take i's value, add 1, and assign the result back to i". So if we construct a hodgepodge of the two, by writing
i = i++
what we're really saying is "take i's value, add 1, assign the result back to i, and assign the result back to i". We're confused, so it doesn't bother me too much if the compiler gets confused, too.
Realistically, the only time these crazy expressions get written is when people are using them as artificial examples of how ++ is supposed to work. And of course it is important to understand how ++ works. But one practical rule for using ++ is, "If it's not obvious what an expression using ++ means, don't write it."
We used to spend countless hours on comp.lang.c discussing expressions like these and why they're undefined. Two of my longer answers, which try to really explain why, are archived on the web.
See also question 3.8 and the rest of the questions in section 3 of the C FAQ list.
Often this question is linked as a duplicate of questions related to code like
printf("%d %d\n", ++i, i++);
or similar variants.
While this is also undefined behaviour, as stated already, there are subtle differences when printf() is involved compared to a plain expression statement. In the following statement:
printf("%d %d\n", ++i, i++);
the order of evaluation of the arguments of printf() is unspecified. That means the expressions i++ and ++i could be evaluated in any order. The C11 standard has some relevant descriptions of this in Annex J (unspecified behaviours), which lists the order in which the function designator, the arguments, and subexpressions within the arguments are evaluated in a function call, and in 3.4.4 (unspecified behavior).
Unspecified behaviour by itself is NOT an issue. Consider this example:
printf("%d %d\n", ++x, y++);
This too has unspecified behaviour, because the order of evaluation of ++x and y++ is unspecified. But it's a perfectly legal and valid statement. There's no undefined behaviour in this statement, because the modifications (++x and y++) are done to distinct objects.
What makes the statement
printf("%d %d\n", ++i, i++);
undefined behaviour is the fact that these two expressions modify the same object i without an intervening sequence point.
Another detail is that the comma involved in the printf() call is a separator, not the comma operator. This is an important distinction, because the comma operator does introduce a sequence point between the evaluation of its operands, which makes the following legal:
int i = 5;
int j;
j = (++i, i++);  // i becomes 7 and j becomes 6
The comma operator evaluates its operands left to right and yields only the value of the last operand. So in j = (++i, i++);, ++i increments i to 6, and i++ yields the old value of i (6), which is assigned to j. Then i becomes 7 due to the post-increment.
So if the comma in the function call were a comma operator, then
printf("%d %d\n", ++i, i++);
would not be a problem. But it invokes undefined behaviour because the comma here is a separator.
Those who are new to undefined behaviour would benefit from reading What Every C Programmer Should Know About Undefined Behavior to understand the concept and many other variants of undefined behaviour in C.
This post is also relevant: Undefined, unspecified and implementation-defined behavior.
While it is unlikely that any compilers and processors would actually do so, it would be legal, under the C standard, for the compiler to implement "i++" with a sequence that, in a single operation, reads i and locks it against other accesses, and then, in a second single operation, writes the incremented value back to i and unlocks it.
While I don't think any processors support the hardware to allow such a thing to be done efficiently, one can easily imagine situations where such behavior would make multi-threaded code easier (e.g. it would guarantee that if two threads try to perform the above sequence simultaneously, i would get incremented by two), and it's not totally inconceivable that some future processor might provide a feature something like that.
If the compiler were to write i++ as indicated above (legal under the standard) and were to intersperse the above instructions throughout the evaluation of the overall expression (also legal), and if it didn't happen to notice that one of the other instructions happened to access i, it would be possible (and legal) for the compiler to generate a sequence of instructions that would deadlock. To be sure, a compiler would almost certainly detect the problem in the case where the same variable i is used in both places. But if a routine accepts two pointers p and q, and uses (*p) and (*q) in the above expression (rather than using i twice), the compiler would not be required to recognize or avoid the deadlock that would occur if the same object's address were passed for both p and q.
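While the syntax of expressions like a = a++ or a++ + a++ is legal, the behaviour of these constructs is undefined because a "shall" in the C standard is not obeyed. C99 6.5p2:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
Footnote 73 further clarifies that this paragraph renders undefined statement expressions such as i = ++i + 1; and a[i++] = i; while allowing i = i + 1; and a[i] = i;.
The various sequence points are listed in Annex C of C11 (and C99). They include: between the evaluations of the function designator and actual arguments in a function call and the actual call; between the evaluations of the first and second operands of the &&, || and , (comma) operators; between the evaluation of the first operand of the conditional ?: operator and whichever of the second and third operands is evaluated; the end of a full declarator; and between the evaluation of a full expression and the next full expression to be evaluated.
The wording of the same paragraph in C11 is:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
You can detect such errors in a program by, for example, using a recent version of GCC with -Wall and -Werror; GCC will then outright refuse to compile your program. The following is the output of gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005:
The important thing is to know what a sequence point is, and what is a sequence point and what isn't. For example, the comma operator is a sequence point, so
j = (i++, ++i);
is well-defined. It will increment i by one, yielding the old value, and discard that value; then, at the comma operator, the side effects are settled; then it increments i by one, and the resulting value becomes the value of the expression. In other words, this is just a contrived way to write j = (i += 2), which is yet again a "clever" way to write
i += 2;
j = i;
However, the , in function argument lists is not a comma operator, and there is no sequence point between evaluations of distinct arguments; instead, their evaluations are unsequenced with regard to each other. So the function call
printf("%d %d\n", i++, ++i);
has undefined behaviour, because there is no sequence point between the evaluations of i++ and ++i in the function arguments. The value of i is therefore modified twice, by both i++ and ++i, between the previous and the next sequence point.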
Your question was probably not, "Why are these constructs undefined behavior in C?". Your question was probably, "Why did this code (using ++) not give me the value I expected?", and someone marked your question as a duplicate and sent you here.
This answer tries to answer that question: why did your code not give you the answer you expected, and how can you learn to recognize (and avoid) expressions that will not work as expected?
I assume you've heard the basic definition of C's ++ and -- operators by now, and how the prefix form ++x differs from the postfix form x++. But these operators are hard to think about, so to make sure you understood, perhaps you wrote a tiny little test program involving something like
printf("%d %d %d\n", x, ++x, x++);
But, to your surprise, this program did not help you understand. It printed some strange, inexplicable output, suggesting that maybe ++ does something completely different, not at all what you thought it did.
Or perhaps you're looking at a hard-to-understand expression like
x = x++ + ++x;
Perhaps someone gave you that code as a puzzle. This code also makes no sense, especially if you run it, and if you compile and run it under two different compilers, you're likely to get two different answers! What's up with that? Which answer is correct? (And the answer is that both of them are, or neither of them are.)
As you've heard by now, these expressions are undefined, which means that the C language makes no guarantee about what they'll do. This is a strange and unsettling result, because you probably thought that any program you could write, as long as it compiled and ran, would generate a unique, well-defined output. But in the case of undefined behavior, that's not so.
What makes an expression undefined? Are expressions involving ++ and -- always undefined? Of course not: these are useful operators, and if you use them properly, they're perfectly well-defined.
For the expressions we're talking about, what makes them undefined is when there's too much going on at once, when we can't tell what order things will happen in, but when the order matters to the result we'll get.
Let's go back to the two examples I've used in this answer. When I wrote
printf("%d %d %d\n", x, ++x, x++);
the question is, before actually calling printf, does the compiler compute the value of x first, or x++, or maybe ++x? It turns out we don't know. There's no rule in C that says the arguments to a function get evaluated left-to-right, or right-to-left, or in some other order. So we can't say whether the compiler will do x first, then ++x, then x++, or x++, then ++x, then x, or some other order. But the order clearly matters, because depending on which order the compiler uses, we'll get a different series of numbers printed out.
What about this crazy expression?
x = x++ + ++x;
The problem with this expression is that it contains three different attempts to modify the value of x: (1) the x++ part tries to take x's value, add 1, store the new value in x, and return the old value; (2) the ++x part tries to take x's value, add 1, store the new value in x, and return the new value; and (3) the x = part tries to assign the sum of the other two back to x. Which of those three attempted assignments will "win"? Which of the three values will actually determine the final value of x? Again, and perhaps surprisingly, there's no rule in C to tell us.
You might imagine that precedence or associativity or left-to-right evaluation tells you what order things happen in, but they do not. You may not believe me, but please take my word for it, and I'll say it again: precedence and associativity do not determine every aspect of the evaluation order of an expression in C. In particular, if within one expression there are multiple different spots where we try to assign a new value to something like x, precedence and associativity do not tell us which of those attempts happens first, or last, or anything.
So with all that background and introduction out of the way, if you want to make sure that all your programs are well-defined, which expressions can you write, and which ones can you not write?
These expressions are all fine:
These expressions are all undefined:
And the last question is, how can you tell which expressions are well-defined, and which expressions are undefined?
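As I said earlier, the undefined expressions are the ones where there's too much going on at once, where you can't be sure what order things happen in, and where the order matters:
1. If one variable is getting modified (assigned to) in two or more different places, how do you know which modification happens first?
2. If a variable is getting modified in one place and having its value used in another place, how do you know whether the use sees the old value or the new value?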
As an example of #1, in the expression
x = x++ + ++x;
there are three attempts to modify x.
As an example of #2, in the expression
y = x + x++;
we both use the value of x and modify it.
So that's the answer: make sure that in any expression you write, each variable is modified at most once, and if a variable is modified, you don't also attempt to use the value of that variable somewhere else.
One more thing. You might be wondering how to "fix" the undefined expressions I started this answer by presenting.
In the case of printf("%d %d %d\n", x, ++x, x++);, it's easy: just write it as three separate printf calls, one for each value. Now the behavior is perfectly well defined, and you'll get sensible results.
In the case of x = x++ + ++x, on the other hand, there's no way to fix it. There's no way to write it so that it has guaranteed behavior matching your expectations, but that's okay, because you would never write an expression like x = x++ + ++x in a real program anyway.
The C standard says that a variable should be assigned at most once between two sequence points. A semicolon, for instance, is a sequence point.
So every statement of the form:
i = i++;
i = i++ + ++i;
and so on violates that rule. The standard also says that the behavior is undefined, not unspecified. Some compilers do detect these and still produce some result, but that result is not mandated by the standard.
However, two different variables can be incremented between two sequence points:
*dst++ = *src++;
The above is a common coding practice when copying or analysing strings.
In https://stackoverflow.com/questions/29505280/incrementing-array-index-in-c someone asked about a statement that applies ++i three times while indexing an array k[]. It printed 7, while the OP expected it to print 6.
The ++i increments aren't guaranteed to all complete before the rest of the calculations. In fact, different compilers will get different results here. In the example you provided, the first two ++i were executed, then the values of k[] were read, then the last ++i was executed, then its k[] element was read.
Modern compilers will optimize code written as separate, well-defined statements very well; in fact, possibly better than the code you originally wrote (assuming it had worked the way you had hoped).
A good explanation of what happens in this kind of computation is provided in document N1188 from the ISO WG14 site.
I'll summarize the ideas here.
The main rule from the standard ISO 9899 that applies in this situation is 6.5p2.
The sequence points in an expression like i=i++ are before i= and after i++.
The paper quoted above explains that you can think of the program as being formed by small boxes, each box containing the instructions between two consecutive sequence points. The sequence points are defined in Annex C of the standard. In the case of i=i++, there are two sequence points that delimit a full-expression. Such an expression is syntactically equivalent to an entry of expression-statement in the Backus-Naur form of the grammar (a grammar is provided in Annex A of the Standard). So the instructions inside a box have no fixed order.
For example, i=i++ can be interpreted as "read i into a temporary, increment i, then assign the temporary to i", or as "read i into a temporary, assign the temporary to i, then increment i". Because both of these ways of interpreting the code i=i++ are valid, and because they generate different answers, the behavior is undefined.
So a sequence point can be seen as the beginning and the end of each box that composes the program (the boxes are atomic units in C), and inside a box the order of instructions is not defined in all cases. Changing that order can sometimes change the result.
EDIT:
Other good sources for explaining such ambiguities are the entries from the C FAQ site (also published as a book), namely here and here and here.
The reason is that the program has undefined behavior. The problem lies in the evaluation order, because no sequence points are required according to the C++98 standard (in C++11 terminology, no operation is sequenced before or after another).
However, if you stick to one compiler, you will find the behavior consistent, as long as you don't add function calls or pointers, which would make the behavior messier.
Using Nuwen MinGW 15 GCC 7.1 you will get:
How does GCC work? It evaluates subexpressions in left-to-right order for the right-hand side (RHS), then assigns the value to the left-hand side (LHS). This is exactly how Java and C# behave and define their standards. (Yes, the equivalent code in Java and C# has defined behavior.) It evaluates each subexpression of the RHS one by one, in left-to-right order. For each subexpression, the ++c (pre-increment) is evaluated first, then the value of c is used for the operation, then the post-increment c++ is applied.
According to GCC C++: Operators:
The equivalent code in defined-behavior C++, as GCC understands it:
Then we go to Visual Studio. With Visual Studio 2015, you get:
How does Visual Studio work? It takes another approach. It evaluates all pre-increment expressions in a first pass, then uses the variable values in the operations in a second pass, assigns from RHS to LHS in a third pass, and finally evaluates all the post-increment expressions in a last pass.
So the equivalent in defined-behavior C++, as Visual C++ understands it:
As the Visual Studio documentation states at Precedence and Order of Evaluation:
The key to understanding this is that the value of the expression i++ is i, and its effect is to add 1 to i (i.e. store the value i+1 in the variable i), but that does not mean that the store will take place when the value is determined.
In an expression like i++ + ++i, the value of the left-hand side of the addition is i and the value of the right-hand side is i+1.
But it's undefined when the effect of either side takes place, so the value of the whole expression (i++ + ++i) is undefined. Will the first reference to i use the value of i from before the current statement, or the value after the effect of the right-hand side (no order of execution is guaranteed)? Or, vice versa, will the second reference to i use the value after the effect of the first one? The C Standard specifically states that this is undefined; defining it would constrain the optimiser to a specific order of execution, which is unhelpful.
It's perfectly reasonable (and possibly efficient) for a compiler to notice that the net effect is to increment i by 2 and evaluate (what amounts to) i+i+1 and later store i+2 in i, or not to do that.
What you should not do is try to work out what your compiler does and play to it.
Changes to the compiler optimisation settings, apparently (to you!) unrelated changes to the surrounding code, or new releases of the compiler could all change the behaviour.
You lay yourself open to one of the most time-consuming kinds of bug: the kind that suddenly arises in apparently unchanged code.
Write the code you need (e.g. 2*i+1; i+=2;) and realise that all modern commercial compilers will (when optimisation is on) translate that into the most efficient code for your platform, and that it has an obvious and guaranteed meaning to all human readers.
I even suggest never using ++ in any expression other than a standalone one, and then only because it's easy to read. Don't imagine it's somehow more efficient than i=i+1, because all modern commercial compilers will emit the same code for both. They ain't daft.