If you skip to approximately 13 minutes into this video by Eric Lippert, he describes a change made to the C# compiler that renders the following code invalid. (Apparently, up to and including .NET 2, this code would have compiled.)
int y;
int x = 10;
if (x * 0 == 0)
    y = 123;
Console.Write(y);
Now I understand that clearly any execution of the above code actually evaluates to
int y;
int x = 10;
y = 123;
Console.Write(y);
But what I don't understand is why it is considered "desirable" to make the original code uncompilable. That is, what are the risks of allowing such inferences to run their course?
I'm still finding this question a bit confusing but let me see if I can rephrase the question into a form that I can answer. First, let me re-state the background of the question:
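In C# 2.0, the code in the question was treated as though its condition were the constant true, that is, as though you'd written "if (true) y = 123;", which in turn was treated as the unconditional assignment "y = 123;".
That is a legal program.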
But in C# 3.0 we took the breaking change to prevent this. The compiler no longer treats the condition as being "always true" despite the fact that you and I both know that it is always true. We now make this an illegal program, because the compiler reasons that it does not know that the body of the "if" is always executed, and therefore does not know that the local variable y is always assigned before it is used.
It is correct because the specification states that:
a constant expression must contain only constants.
x * 0 == 0 is not a constant expression because it contains a non-constant term, x.
the consequence of an "if" is only known to be always reachable if the condition is a constant expression equal to true.
Therefore, the code given should not classify the consequence of the conditional statement to be always reachable, and therefore should not classify the local y as being definitely assigned.
We want the C# language to be clearly understandable by its users, and correctly implementable by compiler writers. Requiring that the compiler make all possible logical deductions about the values of expressions works against those goals. It should be simple to determine whether a given expression is a constant, and if so, what its value is. Put simply, the constant evaluation code should have to know how to perform arithmetic, but should not need to know facts about arithmetical manipulations. The constant evaluator knows how to multiply 2 * 1, but it does not need to know the fact that "1 is the multiplicative identity on integers".
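To make the distinction concrete, here is a minimal sketch (not part of the original answer) of a variant in which x is declared const. Under that assumption, x * 0 == 0 contains only constants, so the constant evaluator can simply do the arithmetic and treat the condition as the constant true. It is written as top-level statements, which assumes C# 9 or later.

```csharp
using System;

// Assumption for illustration: x is a compile-time constant here,
// unlike in the question, where it is an ordinary local variable.
const int x = 10;
int y;

// x * 0 == 0 is now a constant expression (10 * 0 == 0, which is true),
// so the consequence of the "if" is known to be always reachable
// and y is definitely assigned afterwards.
if (x * 0 == 0)
    y = 123;

Console.Write(y);
```

Dropping the const modifier should bring back the situation from the question: in C# 3.0 and later the compiler rejects the read of y as a use of an unassigned local variable (error CS0165).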
Now, it is possible that a compiler writer might decide that there are areas in which they can be clever, and thereby generate more optimal code. Compiler writers are permitted to do so, but not in a way that changes whether code is legal or illegal. They are only allowed to make optimizations that make the output of the compiler better when given legal code.
What happened was the compiler was written to run the arithmetic optimizer too early. The optimizer is the bit that is supposed to be clever, and it should have run after the program was determined to be legal. It was running before the program was determined to be legal, and was therefore influencing the result.
The motivation was the LINQ features, and specifically expression trees. If you wrote a lambda such as x => x * 0 == 0 and converted that to an expression tree, would you expect it to generate the expression tree for a lambda that simply returns true? Probably not! You probably expected it to produce the expression tree for "multiply x by zero and compare the result to zero". Expression trees should preserve the logical structure of the expression in the body.
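As a rough check of the point about preserving structure, the following sketch builds an expression tree from a lambda of the form x => x * 0 == 0 and inspects its shape. The variable names are illustrative, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Linq.Expressions;

// A lambda of the kind discussed above, converted to an expression tree.
Expression<Func<int, bool>> tree = x => x * 0 == 0;

// The body is still an equality comparison whose left operand is a
// multiplication; it has not been reduced to a constant.
var comparison = (BinaryExpression)tree.Body;
Console.WriteLine(comparison.NodeType);       // Equal
Console.WriteLine(comparison.Left.NodeType);  // Multiply
```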
When I wrote the expression tree code, it was not yet clear whether the design committee would decide that a lambda such as () => 2 + 3 should generate the expression tree for "add two to three" or the expression tree for "five". We decided on the latter: constants are folded before expression trees are generated, but arithmetic should not be run through the optimizer before expression trees are generated.
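The folding decision can be observed directly. This sketch assumes a lambda of the form () => 2 + 3 and checks what the compiler put in the tree; the names are illustrative, and the top-level statements assume C# 9 or later.

```csharp
using System;
using System.Linq.Expressions;

// 2 + 3 contains only constants, so it is folded before the tree is built.
Expression<Func<int>> tree = () => 2 + 3;

// The body is a single constant node rather than an addition node.
Console.WriteLine(tree.Body.NodeType);  // Constant
var constant = (ConstantExpression)tree.Body;
Console.WriteLine(constant.Value);      // 5
```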
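So, let's consider now the dependencies that we've just stated: the arithmetic optimizer must run only after the program has been determined to be legal, constants must be folded before expression trees are generated, and expression trees must be generated before the arithmetic optimizer runs.
We've got to find an order to do all this work in that honours all those dependencies. The compiler in C# 2.0 did not: as described above, it ran the arithmetic optimizer before flow analysis.
Where can expression tree rewriting go in there? Nowhere! And clearly this is buggy, because flow analysis is now taking into account facts deduced by the arithmetic optimizer. We decided to rework the compiler so that constant folding runs first and the arithmetic optimizer runs only after flow analysis and expression tree rewriting,
which obviously necessitates the breaking change.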
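The ordering argument can be sketched as a small dependency check. Everything here is an illustrative assumption: the phase names are hypothetical labels, the (before, after) pairs are one reading of the constraints described in the prose, and the two candidate orders are examples rather than the compiler's actual phase lists.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical phase names; each pair says "Before must run before After".
var mustPrecede = new List<(string Before, string After)>
{
    ("constant folding", "flow analysis"),
    ("constant folding", "expression tree rewriting"),
    ("flow analysis", "arithmetic optimization"),
    ("expression tree rewriting", "arithmetic optimization"),
    ("arithmetic optimization", "code generation"),
};

// Returns every dependency that a proposed phase order violates.
List<(string Before, string After)> Violations(IList<string> order)
{
    var broken = new List<(string Before, string After)>();
    foreach (var dep in mustPrecede)
    {
        if (order.IndexOf(dep.Before) > order.IndexOf(dep.After))
            broken.Add(dep);
    }
    return broken;
}

// An order in which the optimizer runs before the legality check.
var optimizerTooEarly = new List<string>
{
    "constant folding", "arithmetic optimization", "flow analysis",
    "expression tree rewriting", "code generation",
};

// An order in which the optimizer waits until legality is settled.
var optimizerLate = new List<string>
{
    "constant folding", "flow analysis", "expression tree rewriting",
    "arithmetic optimization", "code generation",
};

Console.WriteLine(Violations(optimizerTooEarly).Count); // 2
Console.WriteLine(Violations(optimizerLate).Count);     // 0
```

In the first order, both flow analysis and expression tree rewriting see expressions the optimizer has already rewritten, which is the kind of interference the answer describes.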
Now, I did consider preserving the existing broken behaviour by having each optimized arithmetic expression contain a pointer back to its unoptimized form. We decided that this was too much complexity in order to preserve a bug. We decided that it would be better to instead fix the bug, take the breaking change, and make the compiler architecture more easily understood.
The specification states that the definite assignment of something that is only assigned inside an "if" block is undetermined. The spec says nothing about compiler magic that removes the unnecessary "if" block. In particular, such magic makes for a very confusing error message: you change the "if" condition and suddenly get an error about y not being assigned ("Huh? I haven't changed when y is assigned!").
The compiler is free to perform any obvious code removal it wants to, but first it needs to follow the rules in the specification.
Specifically, see section 5.3.3.5 (MS 4.0 spec).
Technically, an execution path exists where the "if" condition is false; if y were also assigned in the "else", then fine, but the specification explicitly makes no demand of spotting that the "if" condition is always true.
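A minimal sketch of the "also assigned in the else" case, using the code from the question with an added else branch. The fallback value 0 is a hypothetical choice made only for illustration, and the top-level statements assume C# 9 or later.

```csharp
using System;

int x = 10;
int y;

// Both branches assign y, so y is definitely assigned after the "if"
// regardless of what the compiler can deduce about the condition.
if (x * 0 == 0)
    y = 123;
else
    y = 0; // hypothetical fallback value, chosen only for illustration

Console.Write(y); // 123
```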