The terms 'operator precedence' and 'order of evaluation' are very commonly used in programming and are extremely important for a programmer to understand. As far as I understand them, the two concepts are tightly bound; one cannot do without the other when talking about expressions.
Let us take a simple example:
int a=1; // Line 1
a = a++ + ++a; // Line 2
printf("%d",a); // Line 3
Now, it is evident that Line 2 leads to Undefined Behavior, since sequence points in C and C++ include:
- Between evaluation of the left and right operands of the && (logical AND), || (logical OR), and comma operators. For example, in the expression *p++ != 0 && *q++ != 0, all side effects of the sub-expression *p++ != 0 are completed before any attempt to access q.
- Between the evaluation of the first operand of the ternary "question-mark" operator and the second or third operand. For example, in the expression a = (*p++) ? (*p++) : 0 there is a sequence point after the first *p++, meaning it has already been incremented by the time the second instance is executed.
- At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.
- Before a function is entered in a function call. The order in which the arguments are evaluated is not specified, but this sequence point means that all of their side effects are complete before the function is entered. In the expression f(i++) + g(j++) + h(k++), f is called with a parameter of the original value of i, but i is incremented before entering the body of f. Similarly, j and k are updated before entering g and h respectively. However, it is not specified in which order f(), g(), h() are executed, nor in which order i, j, k are incremented. The values of j and k in the body of f are therefore undefined. Note that a function call f(a,b,c) is not a use of the comma operator and the order of evaluation for a, b, and c is unspecified.
- At a function return, after the return value is copied into the calling context. (This sequence point is only specified in the C++ standard; it is present only implicitly in C.)
- At the end of an initializer; for example, after the evaluation of 5 in the declaration int a = 5;.
Thus, going by Point # 3:
At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.
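Line 2 clearly leads to Undefined Behavior, because a is modified more than once before the sequence point at the end of the full expression is reached. This shows how Undefined Behavior is tightly coupled with sequence points.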
Now let us take another example:
int x=10,y=1,z=2; // Line 4
int result = x<y<z; // Line 5
Now it's evident that Line 5 will make the variable result store 1.
The expression x<y<z in Line 5 could in principle be grouped as either x<(y<z) or (x<y)<z. In the first case the value of result would be 0, and in the second case it would be 1. But we know that when operator precedence is equal, associativity comes into play; hence the expression is evaluated as (x<y)<z.
This is what is said in this MSDN Article:
The precedence and associativity of C operators affect the grouping and evaluation of operands in expressions. An operator's precedence is meaningful only if other operators with higher or lower precedence are present. Expressions with higher-precedence operators are evaluated first. Precedence can also be described by the word "binding." Operators with a higher precedence are said to have tighter binding.
Now, about the above article; it mentions:
Expressions with higher-precedence operators are evaluated first.
It may sound incorrect. But I think the article is not saying anything wrong if we consider that () is also an operator: x<y<z is the same as (x<y)<z. My reasoning is that if associativity did not come into play, the evaluation of the complete expression would become ambiguous, since < is not a sequence point.
Also, another link I found says this on Operator Precedence and Associativity:
This page lists C operators in order of precedence (highest to lowest). Their associativity indicates in what order operators of equal precedence in an expression are applied.
So, taking the second example, int result=x<y<z, we can see that there are three expressions in all, x, y and z, since the simplest form of an expression consists of a single literal constant or object. Hence the results of the expressions x, y, z would be their rvalues, i.e., 10, 1 and 2 respectively. Hence, we may now interpret x<y<z as 10<1<2.
Now, doesn't associativity come into play, since we now have two expressions to be evaluated, either 10<1 or 1<2, and since the precedence of the operators is the same, they are evaluated from left to right?
Taking this last example as my argument:
int myval = ( printf("Operator\n"), printf("Precedence\n"), printf("vs\n"),
printf("Order of Evaluation\n") );
Now in the above example, since the comma operator has the same precedence, the expressions are evaluated left-to-right and the return value of the last printf() is stored in myval.
In ISO/IEC 9899:201x, under J.1 Unspecified behavior, it mentions:
The order in which subexpressions are evaluated and the order in which side effects take place, except as specified for the function-call (), &&, ||, ?:, and comma operators (6.5).
Now I would like to know, would it be wrong to say:
Order of Evaluation depends on the precedence of operators, leaving cases of Unspecified Behavior.
I would like to be corrected if any mistakes were made in something I said in my question.
The reason I posted this question is because of the confusion created in my mind by the MSDN Article. Is it in Error or not?
Yes, the MSDN article is in error, at least with respect to standard C and C++ [1].
Having said that, let me start with a note about terminology: in the C++ standard, they (mostly; there are a few slip-ups) use "evaluation" to refer to evaluating an operand, and "value computation" to refer to carrying out an operation. So, when (for example) you do a + b, each of a and b is evaluated, then the value computation is carried out to determine the result.
It's clear that the order of value computations is (mostly) controlled by precedence and associativity; controlling value computations is basically the definition of what precedence and associativity are. The remainder of this answer uses "evaluation" to refer to evaluation of operands, not to value computations.
Now, as to evaluation order being determined by precedence, no it's not! It's as simple as that. Just for example, let's consider your example of x<y<z. According to the associativity rules, this parses as (x<y)<z. Now, consider evaluating this expression on a stack machine. It's perfectly allowable for it to push z first, then y, then x, and only then carry out the two comparisons.
This evaluates z before x or y, but still evaluates (x<y), then compares the result of that comparison to z, just as it's supposed to.
Summary: Order of evaluation is independent of associativity.
Precedence is the same way. We can change the expression to x*y+z, and still evaluate z before x or y; only the multiplication itself has to be carried out before the addition.
Summary: Order of evaluation is independent of precedence.
When/if we add in side effects, this remains the same. I think it's educational to think of side effects as being carried out by a separate thread of execution, with a join at the next sequence point (e.g., the end of the expression). So something like a=b++ + ++c; could be executed with a evaluated first, then b and c, with the two increments handed to that side-effects thread and joined only at the end of the expression.
This also shows why an apparent dependency doesn't necessarily affect order of evaluation either. Even though a is the target of the assignment, this still evaluates a before evaluating either b or c. Also note that although I've written it as "thread" above, this could also just as well be a pool of threads, all executing in parallel, so you don't get any guarantee about the order of one increment versus another either.
Unless the hardware had direct (and cheap) support for thread-safe queuing, this probably wouldn't be used in a real implementation (and even then it's not very likely). Putting something into a thread-safe queue will normally have quite a bit more overhead than doing a single increment, so it's hard to imagine anybody ever doing this in reality. Conceptually, however, the idea fits the requirements of the standard: when you use a pre/post increment/decrement operation, you're specifying an operation that will happen sometime after that part of the expression is evaluated, and will be complete at the next sequence point.
Edit: though it's not exactly threading, some architectures do allow such parallel execution. For a couple of examples, the Intel Itanium and VLIW processors (such as some DSPs) allow a compiler to designate a number of instructions to be executed in parallel. Most VLIW machines have a specific instruction "packet" size that limits the number of instructions executed in parallel. The Itanium also uses packets of instructions, but designates a bit in an instruction packet to say that the instructions in the current packet can be executed in parallel with those in the next packet. Using mechanisms like this, you get instructions executing in parallel, just as if you used multiple threads on architectures with which most of us are more familiar.
Summary: Order of evaluation is independent of apparent dependencies
Any attempt at using the value before the next sequence point gives undefined behavior -- in particular, the "other thread" is (potentially) modifying that data during that time, and you have no way of synchronizing access with the other thread. Any attempt at using it leads to undefined behavior.
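Just for an (admittedly, now rather far-fetched) example, think of your code running on a 64-bit virtual machine, but the real hardware is an 8-bit processor. When you increment a 64-bit variable, it executes a sequence of byte-sized steps: the lowest byte is incremented first, and the carry is then propagated through each of the remaining seven bytes in turn.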
If you read the value somewhere in the middle of that sequence, you could get something with only some of the bytes modified, so what you get is neither the old value nor the new one.
This exact example may be pretty far-fetched, but a less extreme version (e.g., a 64-bit variable on a 32-bit machine) is actually fairly common.
Conclusion
Order of evaluation does not depend on precedence, associativity, or (necessarily) on apparent dependencies. Attempting to use a variable to which a pre/post increment/decrement has been applied in any other part of an expression really does give completely undefined behavior. While an actual crash is unlikely, you're definitely not guaranteed to get either the old value or the new one -- you could get something else entirely.
[1] I haven't checked this particular article, but quite a few MSDN articles talk about Microsoft's Managed C++ and/or C++/CLI (or are specific to their implementation of C++) but do little or nothing to point out that they don't apply to standard C or C++. This can give the false appearance that they're claiming the rules they have decided to apply to their own languages actually apply to the standard languages. In these cases, the articles aren't technically false; they just don't have anything to do with standard C or C++. If you attempt to apply those statements to standard C or C++, the result is false.
The only way precedence influences order of evaluation is that it
creates dependencies; otherwise the two are orthogonal. You've
carefully chosen trivial examples where the dependencies created by
precedence do end up fully defining order of evaluation, but this isn't
generally true. And don't forget, either, that many expressions have
two effects: they result in a value, and they have side effects. These
two are not required to occur together, so even when dependencies
force a specific order of evaluation, this is only the order of
evaluation of the values; it has no effect on side effects.
A good way to look at this is to take the expression tree.
If you have an expression, let's say x+y*z, you can rewrite that into an expression tree. Applying the priority and associativity rules gives x + ( y * z ).
After applying the priority and associativity rules, you can safely forget about them.
In tree form, + is the root, with x as one child and * as the other; * in turn has y and z as its children.
Now the leaves of this expression are x, y and z. What this means is that you can evaluate x, y and z in any order you want, and it also means that you can evaluate the result of * and x in any order.
Now since these expressions don't have side effects you don't really care. But if they do, the ordering can change the result, and since the ordering can be anything the compiler decides, you have a problem.
Now, sequence points bring a bit of order into this chaos. They effectively cut the tree into sections.
x + y * z, z = 10, x + y * z
after priority and associativity
x + ( y * z ) , z = 10, x + ( y * z)
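The tree: the two comma operators split it into three sections, one for each of x + ( y * z ), z = 10 and x + ( y * z ).
The top part of the tree will be evaluated before the middle, and the middle before the bottom.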
I am just going to repeat what I said here. As far as standard C and C++ are concerned that article is flawed. Precedence only affects which tokens are considered to be the operands of each operator, but it does not affect in any way the order of evaluation.
So, the link only explains how Microsoft implemented things, not how the language itself works.
Precedence has nothing to do with order of evaluation and vice-versa.
Precedence rules describe how an underparenthesized expression should be parenthesized when the expression mixes different kinds of operators. For example, multiplication is of higher precedence than addition, so 2 + 3 x 4 is equivalent to 2 + (3 x 4), not (2 + 3) x 4.
Order of evaluation rules describe the order in which each operand in an expression is evaluated.
Take an example: an assignment whose right-hand side is ++x || --y.
By the operator precedence rules, the right-hand side is parenthesized as (++x) || (--y), because ++/-- has higher precedence than ||, which has higher precedence than =.
The order of evaluation of logical OR || is specified in C11 6.5.14: the operator guarantees left-to-right evaluation, and if the first operand compares unequal to 0, the second operand is not evaluated.
This means that the left operand, i.e. the sub-expression (++x), will be evaluated first. Due to short-circuiting behavior, if it compares unequal to 0 the right operand --y will not be evaluated at all, even though precedence parenthesizes --y before (++x) || (--y).
I think it's only the
expression problematic, because
fits first in 3. but then in the 6. rule: complete evaluation before assignment.
So,
gets for a=1 fully evaluated to:
The result is the same = 4.
An
would have undefined results. Isn't it?