
C# compiler-construction compiler-optimization inference

Constants and compile time evaluation - Why change this behaviour

Posted 2024-12-28 20:03:09 · 522 words · 6 views · 0 comments

If you skip to approximately 13 minutes into this video by Eric Lippert, he describes a change that was made to the C# compiler that renders the following code invalid. (Apparently, up to and including .NET 2, this code would have compiled.)

int y;
int x = 10;
if (x * 0 == 0)
    y = 123;

Console.Write(y);

Now I understand that clearly any execution of the above code actually evaluates to

int y;
int x = 10;
y = 123;
Console.Write(y);

But what I don't understand is why it is considered "desirable" to make the above code uncompilable. That is, what are the risks of allowing such inferences to run their course?

回忆凄美了谁 2025-01-04 20:03:09

I'm still finding this question a bit confusing but let me see if I can rephrase the question into a form that I can answer. First, let me re-state the background of the question:

In C# 2.0, this code:

int x = 123;
int y;
if (x * 0 == 0) 
    y = 345;
Console.WriteLine(y);

was treated as though you'd written

int x = 123;
int y;
if (true) 
    y = 345;
Console.WriteLine(y);

which in turn is treated as:

int x = 123;
int y;
y = 345;
Console.WriteLine(y);

Which is a legal program.

But in C# 3.0 we took the breaking change to prevent this. The compiler no longer treats the condition as being "always true" despite the fact that you and I both know that it is always true. We now make this an illegal program, because the compiler reasons that it does not know that the body of the "if" is always executed, and therefore does not know that the local variable y is always assigned before it is used.

Why is the C# 3.0 behaviour correct?

It is correct because the specification states that:

- A constant expression must contain only constants. x * 0 == 0 is not a constant expression because it contains a non-constant term, x.
- The consequence of an if is only known to be always reachable if the condition is a constant expression equal to true.

Therefore, the code given should not classify the consequence of the conditional statement to be always reachable, and therefore should not classify the local y as being definitely assigned.
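
As a rough illustration of those two rules, here is a minimal sketch (the class and method names are made up for this example). If x is declared const, then x * 0 == 0 contains only constants, so it is a constant expression equal to true, and the assignment inside the if counts toward definite assignment. Under the rules described above this version is expected to compile, whereas the version with a plain local x is rejected.

```csharp
using System;

public static class ConstantConditionDemo
{
    // Hypothetical helper, named for this example only.
    public static int AssignedUnderConstantCondition()
    {
        const int x = 123;   // x is a compile-time constant here
        int y;

        // x * 0 == 0 is built only from constants, so it is a constant
        // expression with value true; the consequence is known to be
        // reached and y is treated as definitely assigned afterwards.
        if (x * 0 == 0)
            y = 345;

        return y;
    }

    public static void Main()
    {
        Console.WriteLine(AssignedUnderConstantCondition());
    }
}
```

Removing the const modifier turns x back into a non-constant term, which is exactly the case the question is about.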

Why is it desirable that a constant expression contain only constants?

We want the C# language to be clearly understandable by its users, and correctly implementable by compiler writers. Requiring that the compiler make all possible logical deductions about the values of expressions works against those goals. It should be simple to determine whether a given expression is a constant, and if so, what its value is. Put simply, the constant evaluation code should have to know how to perform arithmetic, but should not need to know facts about arithmetical manipulations. The constant evaluator knows how to multiply 2 * 1, but it does not need to know the fact that "1 is the multiplicative identity on integers".
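
To make the distinction concrete, here is a toy constant evaluator. It is only a sketch: the Expr, Const, Var and Mul types and the TryFold method are invented for this illustration and are not how the real compiler is written. It can multiply 2 * 1, but it reports x * 0 as "not a constant" because it knows nothing about the algebraic fact that anything times zero is zero.

```csharp
using System;

// Toy expression model; every type here is hypothetical.
public abstract record Expr;
public sealed record Const(int Value) : Expr;
public sealed record Var(string Name) : Expr;
public sealed record Mul(Expr Left, Expr Right) : Expr;

public static class ConstantFolder
{
    // Returns the value if the expression is built only from constants,
    // otherwise null. It performs arithmetic but applies no algebraic facts.
    public static int? TryFold(Expr e)
    {
        switch (e)
        {
            case Const c:
                return c.Value;
            case Mul m:
                int? left = TryFold(m.Left);
                int? right = TryFold(m.Right);
                if (left.HasValue && right.HasValue)
                    return left.Value * right.Value;
                return null;   // some operand is not a constant
            default:
                return null;   // variables are never constants
        }
    }

    public static void Main()
    {
        int? folded = TryFold(new Mul(new Const(2), new Const(1)));
        int? notFolded = TryFold(new Mul(new Var("x"), new Const(0)));
        Console.WriteLine(folded?.ToString() ?? "not a constant");
        Console.WriteLine(notFolded?.ToString() ?? "not a constant");
    }
}
```

An arithmetic optimizer would be the place to add a rule such as "x * 0 is 0", and per the argument above it should run only after legality has been decided.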

Now, it is possible that a compiler writer might decide that there are areas in which they can be clever, and thereby generate more optimal code. Compiler writers are permitted to do so, but not in a way that changes whether code is legal or illegal. They are only allowed to make optimizations that make the output of the compiler better when given legal code.

How did the bug happen in C# 2.0?

What happened was the compiler was written to run the arithmetic optimizer too early. The optimizer is the bit that is supposed to be clever, and it should have run after the program was determined to be legal. It was running before the program was determined to be legal, and was therefore influencing the result.

This was a potential breaking change: though it brought the compiler into line with the specification, it also potentially turned working code into error code. What motivated the change?

LINQ features, and specifically expression trees. If you said something like:

(int x)=>x * 0 == 0

and converted that to an expression tree, would you expect that to generate the expression tree for

(int x)=>true

? Probably not! You probably expected it to produce the expression tree for "multiply x by zero and compare the result to zero". Expression trees should preserve the logical structure of the expression in the body.

When I wrote the expression tree code, it was not yet clear whether the design committee would decide that

()=>2 + 3

was going to generate the expression tree for "add two to three" or the expression tree for "five". We decided on the latter -- constants are folded before expression trees are generated, but arithmetic should not be run through the optimizer before expression trees are generated.
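
The behaviour described here can be observed with System.Linq.Expressions. The following is a small sketch (the class and method names are made up for this example) that inspects the shape of the two trees: the first should keep its multiply-and-compare structure, while the second should already be folded to a single constant.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Body of (int x) => x * 0 == 0 as an expression tree.
    public static Expression TimesZeroBody()
    {
        Expression<Func<int, bool>> e = (int x) => x * 0 == 0;
        return e.Body;
    }

    // Body of () => 2 + 3 as an expression tree.
    public static Expression TwoPlusThreeBody()
    {
        Expression<Func<int>> e = () => 2 + 3;
        return e.Body;
    }

    public static void Main()
    {
        // The logical structure is preserved: a comparison whose left
        // operand is a multiplication, not the constant true.
        var comparison = (BinaryExpression)TimesZeroBody();
        Console.WriteLine(comparison.NodeType);
        Console.WriteLine(comparison.Left.NodeType);

        // Constants are folded before the tree is built.
        Console.WriteLine(TwoPlusThreeBody().NodeType);
    }
}
```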

So, let's consider now the dependencies that we've just stated:

- Arithmetic optimization has to happen before codegen.
- Expression tree rewriting has to happen before arithmetic optimization.
- Constant folding has to happen before expression tree rewriting.
- Constant folding has to happen before flow analysis.
- Flow analysis has to happen before expression tree rewriting (because we need to know if an expression tree uses an uninitialized local).

We've got to find an order to do all this work in that honours all those dependencies. The compiler in C# 2.0 did them in this order:

1. constant folding and arithmetic optimization at the same time
2. flow analysis
3. codegen

Where can expression tree rewriting go in there? Nowhere! And clearly this is buggy, because flow analysis is now taking into account facts deduced by the arithmetic optimizer. We decided to rework the compiler so that it did things in the order:

1. constant folding
2. flow analysis
3. expression tree rewriting
4. arithmetic optimization
5. codegen

Which obviously necessitates the breaking change.
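
The ordering argument can be checked mechanically. The sketch below is an illustration only (PhaseOrderCheck and its members are invented names, not part of any compiler): it encodes the five dependencies listed earlier as "before/after" pairs and tests whether a proposed phase order honours all of them.

```csharp
using System;
using System.Collections.Generic;

public static class PhaseOrderCheck
{
    // Each pair means: Before must run earlier than After.
    public static readonly (string Before, string After)[] Constraints =
    {
        ("arithmetic optimization", "codegen"),
        ("expression tree rewriting", "arithmetic optimization"),
        ("constant folding", "expression tree rewriting"),
        ("constant folding", "flow analysis"),
        ("flow analysis", "expression tree rewriting"),
    };

    // True only if every phase named in a constraint is present
    // and appears in the required relative order.
    public static bool Satisfies(IReadOnlyList<string> order)
    {
        var position = new Dictionary<string, int>();
        for (int i = 0; i < order.Count; i++)
            position[order[i]] = i;

        foreach (var (before, after) in Constraints)
        {
            if (!position.TryGetValue(before, out int b) ||
                !position.TryGetValue(after, out int a) ||
                b >= a)
                return false;
        }
        return true;
    }

    public static void Main()
    {
        string[] reworked =
        {
            "constant folding", "flow analysis", "expression tree rewriting",
            "arithmetic optimization", "codegen",
        };
        string[] optimizerTooEarly =
        {
            "constant folding", "arithmetic optimization", "flow analysis",
            "expression tree rewriting", "codegen",
        };
        Console.WriteLine(Satisfies(reworked));
        Console.WriteLine(Satisfies(optimizerTooEarly));
    }
}
```

Any order that runs the arithmetic optimizer before flow analysis and expression tree rewriting fails the check, which is the situation the C# 2.0 compiler was in.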

Now, I did consider preserving the existing broken behaviour, by doing this:

1. constant folding
2. arithmetic optimization
3. flow analysis
4. arithmetic de-optimization
5. expression tree rewriting
6. arithmetic optimization again
7. codegen

Where the optimized arithmetic expression would contain a pointer back to its unoptimized form. We decided that this was too much complexity in order to preserve a bug. We decided that it would be better to instead fix the bug, take the breaking change, and make the compiler architecture more easily understood.

娇纵 2025-01-04 20:03:09

The specification states that the definite assignment of something that is only assigned inside an if block is undetermined. The spec says nothing about compiler magic that removes the unnecessary if block. In particular, such magic would make for a very confusing error message: you change the if condition and suddenly get an error about y not being assigned ("Huh? I haven't changed when y is assigned!").

The compiler is free to perform any obvious code removal it wants to, but first it needs to follow the rules of the specification.

Specifically, section 5.3.3.5 (MS 4.0 spec):

5.3.3.5 If statements
For an if statement stmt of the form:

if ( expr ) then-stmt else else-stmt

v has the same definite assignment state at the beginning of expr as at the beginning of stmt.

If v is definitely assigned at the end of expr, then it is definitely assigned on the control flow transfer to then-stmt and to either else-stmt or to the end-point of stmt if there is no else clause.

If v has the state “definitely assigned after true expression” at the end of expr, then it is definitely assigned on the control flow transfer to then-stmt, and not definitely assigned on the control flow transfer to either else-stmt or to the end-point of stmt if there is no else clause.

If v has the state “definitely assigned after false expression” at the end of expr, then it is definitely assigned on the control flow transfer to else-stmt, and not definitely assigned on the control flow transfer to then-stmt. It is definitely assigned at the end-point of stmt if and only if it is definitely assigned at the end-point of then-stmt.

Otherwise, v is considered not definitely assigned on the control flow transfer to either the then-stmt or else-stmt, or to the end-point of stmt if there is no else clause.

For an initially unassigned variable to be considered definitely assigned at a certain location, an assignment to the variable must occur in every possible execution path leading to that location.

Technically, an execution path exists where the if condition is false. If y were also assigned in the else branch, that would be fine, but the specification explicitly makes no demand that the compiler spot that the if condition is always true.
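
For comparison, here is a minimal sketch of the "also assigned in the else" case (the class and method names are made up for this example). Because y is assigned on both the then-path and the else-path, it is definitely assigned at the end of the if statement no matter what the compiler can or cannot deduce about the condition.

```csharp
using System;

public static class DefiniteAssignmentDemo
{
    // Hypothetical helper, named for this example only.
    public static int AssignOnEveryPath(int x)
    {
        int y;

        // The condition is not a constant expression, so both branches
        // are considered reachable; assigning y in each one satisfies
        // the "every possible execution path" requirement.
        if (x * 0 == 0)
            y = 123;
        else
            y = -1;

        return y;
    }

    public static void Main()
    {
        Console.WriteLine(AssignOnEveryPath(10));
    }
}
```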

About the author: 梦途
