决定错误的程序是否可以有正确的延续
(以下问题涉及 OCaml 语言,并在 OCaml 中有示例,但问题非常笼统,可能任何其他计算机语言的正确答案也能解决我的问题。因此,只需用您最喜欢的语言假设这个问题即可。)
我想编写一个函数,将 OCaml 中的任意程序作为字符串,并确定该程序是否正确或不正确,以及在后一种情况下,我是否可以通过在末尾连接适当的字符来使其成为正确的程序。
我假设某处有该语言的编译器,我可以应用它并得到答复说“编译”或“不编译 - 第 X 行,字符 Y 出现错误”(大多数情况都是如此)无论如何语言)。总之,我想要一个函数,它接受一个程序并返回:
- 正确 - 如果字符串包含正确的程序;
- 错误——如果字符串包含不正确的程序,无论您如何将字符连接到它,都永远不会变成正确的;
- 不完整——如果字符串包含不正确的程序,但不是错误的。
例如,OCaml 程序 let x = f
是不正确的,因为 f
在使用时尚未定义。而且它不能继续下去,因为无论你在 f 之后写什么,都将始终是一些之前没有定义过的标识符。程序 let x =
也是不正确的;但如果我们扩展到 let x = 5
那么我们就有了一个完全有效的程序。因此,我的函数应该在第一种情况下返回 Erroneous,在第二种情况下返回 Incomplete。
如果我们有这个程序,事情可能会变得很棘手,
let ans = 5
let x = a
因为我的函数必须看到,如果我用 ns
继续该程序,那么该程序就会变得正确。
我的问题是:你认为可以编写这样的函数/算法吗?如果是这样,总体思路是什么?如果不是,请尝试让我相信事实并非如此。
(我会对任何见解或部分答案感到满意,例如暗示不完整的东西。例如,我相信如果语言编译器说第 3 行有错误并且程序有 100 行,那么就不可能继续的程序。)
(The following question concerns the OCaml language and has examples in OCaml, but the question is very general and probably a correct answer for any other computer language will solve my problem too. So, just assume this question in your favourite language.)
I want to write a function which takes an arbitrary program in OCaml as a string and decides if the program is correct or incorrect and, in the latter case, whether or not I can make it into a correct one by concatenating appropriate characters at the end.
I'm assuming there is a compiler of the language somewhere and that I can apply it and get a reply saying either "Compiles" or "Doesn't compile -- error at line X, character Y" (as is the case with most languages anyway). In summary, I'd like to have a function which takes a program and returns:
- Correct -- if the string contains a correct program;
- Erroneous -- if the string contains an incorrect program which, no matter how you concatenate characters to it, will never become Correct;
- Incomplete -- if the string contains an incorrect program which is not Erroneous.
For example, the OCaml program let x = f
is incorrect because f
has not been defined by the time it is used. And it can't be continued, because whatever you write after f will always be some identifier which hasn't been defined before. The program let x =
is also incorrect; but if we extend to let x = 5
then we have a perfectly valid program. So, my function should return Erroneous in the first case and Incomplete in the second.
Things may become tricky if we have the program
let ans = 5
let x = a
because my function would have to see that if I continue the program with ns
then the program becomes correct.
My question is: do you think it is possible to write such function/algorithm? If so, what's the general idea? If not, try to convince me that it's not.
(I'll be happy with any insights or partial answers, e.g. something which implies Incomplete. For example, I believe that if the language compiler says there is an error at line 3 and the program has 100 lines, then there is no possible continuation of the program.)
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
在您的第一个示例中,
let x = f
,如果我添加un y ->; y ?
我认为你想要的是可能的,但目前的工具是不可能的。如果您只对语法正确性感兴趣,基本思想是运行解析器/词法分析器,如果引发错误则返回“错误”,如果没有返回完整的 AST,则返回“不完整”,但没有错误(所以它仍在等待更多输入)。
注意:仍然存在一个小的不匹配,因为词法分析器将在 EOF 之前返回一个标记,这本来可以继续。您不需要将该标记视为完整的标记,并在此时进行一些更精细的推理。更一般地说,您输入的极值将需要专门的推理,我在此不进行介绍。
使词法分析/解析阶段变得容易的属性是词法分析器是由解析器需求驱动的——它只读取解析器推理令牌流所需的内容——并且解析器是“严格的” ,或者提前失败,而不是在失败站点询问更多信息。
程序正确性的后一个主要阶段是标识符解析(这个变量名指的是什么?)和类型系统——还有其他标准,例如检查构造函数和类型名称的数量,但它们并不是很重要。有趣的说法。你的问题。这些通常不是以需求驱动的风格编写的,或者更一般地不尝试推理部分信息。应该可以通过这种方式重新设计它们,但这可能需要很多努力。
一个好的方向是“增量解析器”。许多人已经认识到与编辑器相关的程序工具(程序是增量编写的)需要增量。他们解决了在源代码发生具体更改后更新抽象信息的更普遍的问题;不仅是“最后添加”的更改,而且是更一般的更改。他们的工具可能也可以解决您的问题。
编辑:啊,我终于找到了我要找的东西。您应该看看 Oleg 的区分解析器。
In your first example,
let x = f
, what if I addun y -> y
?I think what you want is possible, but no with the current tools. If you were only interested in syntactic correctness, the basic idea would be to run the parser/lexer, return "erroneous" if it raises an error, and "incomplete" if it hasn't returned a complete AST, but no error (so it is still waiting for more input).
Note : there is still a small mismatch as the lexer will return a token just before EOF, which could have been continued. You would need not to consider that token as a complete token and do some finer reasoning at that point. More generally, the extremum of your input will require specialized reasoning that I don't cover here.
The properties that makes it easy at the lexing/parsing stage is that the lexer is demand-driven by the parser -- it only reads as far as necessary for the parser to reason about the token stream -- and the parser is "strict", or fails early, instead of asking for more information at a failure site.
The latter major phase of a program correctness are the identifier resolution (what does this variable name refer to ?) and the type system -- there are other criterions, such as checking the arity of constructors and type names, but they're not very interesting wrt. you problem. Those are generally not written in a demand-driven style, or do not more generally attempt to reason about partial information. It should be possible to redesign them in this way, but this would probably require a lot of effort.
A good direction to go would be "incremental parsers". A lot of people have recognized the need for incrementality in program tools related to an editor (where programs are incrementally written). They tackle the more general problem of updating the abstract information after a concrete change in the source code; not only a change "added at the end", but more general kind of changes. Their tools could probably solve your problem too.
Edit: Ah, I finally found what I was looking for. You should have a look at Oleg's differentiating parsers.