解析的部分评估

发布于 07-12 13:12 字数 583 浏览 10 评论 0原文

我正在研究Python的宏系统(如此处讨论)以及我的其中一件事一直在考虑的是计量单位。 尽管测量单位可以在没有宏的情况下或通过静态宏来实现(例如提前定义所有单位),但我正在考虑允许在运行时动态扩展语法的想法。

为此,我正在考虑在编译时对代码使用某种部分评估。 如果给定表达式的解析失败,由于其语法的宏不可用,编译器将停止对函数/块的评估,并生成它已经具有的代码,其中包含未知表达式所在的存根。 当在运行时命中此存根时,该函数将根据当前宏集重新编译。 如果编译失败,则会抛出解析错误,因为执行无法继续。 如果编译成功,新函数将替换旧函数并继续执行。

我看到的最大问题是,在受影响的代码运行之前,您无法找到解析错误。 然而,这不会影响很多情况,例如像 []、{}、() 和 `` 这样的组运算符仍然需要配对(我的分词器/列表解析器的要求),以及像类和函数这样的顶级语法不会受到影响,因为它们的“运行时”实际上是加载时间,其中评估语法并生成它们的对象。

除了我上面描述的实现难度和问题之外,这个想法还存在哪些问题呢?

I'm working on a macro system for Python (as discussed here) and one of the things I've been considering are units of measure. Although units of measure could be implemented without macros or via static macros (e.g. defining all your units ahead of time), I'm toying around with the idea of allowing syntax to be extended dynamically at runtime.

To do this, I'm considering using a sort of partial evaluation on the code at compile-time. If parsing fails for a given expression, due to a macro for its syntax not being available, the compiler halts evaluation of the function/block and generates the code it already has with a stub where the unknown expression is. When this stub is hit at runtime, the function is recompiled against the current macro set. If this compilation fails, a parse error would be thrown because execution can't continue. If the compilation succeeds, the new function replaces the old one and execution continues.

The biggest issue I see is that you can't find parse errors until the affected code is run. However, this wouldn't affect many cases, e.g. group operators like [], {}, (), and `` still need to be paired (requirement of my tokenizer/list parser), and top-level syntax like classes and functions wouldn't be affected since their "runtime" is really load time, where the syntax is evaluated and their objects are generated.

Aside from the implementation difficulty and the problem I described above, what problems are there with this idea?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

荭秂2024-07-19 13:12:24

以下是一些可能出现的问题:

  • 您可能会发现在出现问题时很难向用户提供有用的错误消息。 这似乎是可能的,因为任何编译时语法错误都可能只是语法扩展。
  • 性能受到打击。

我试图找到一些关于 Perl 6 中动态解析的优点、缺点和/或实现的讨论,但我找不到任何合适的内容。 然而,您可能会发现 Nicklaus Wirth(Pascal 和其他语言的设计者)的这句话很有趣:

计算机科学家的幻想
20 世纪 60 年代,人们的生活没有界限。 被拒绝
通过自动语法的成功
分析和解析器生成,一些
提出了灵活的想法,或者
至少是可扩展的语言。 这
的想法是一个程序将是
前面是句法规则
然后将指导通用解析器
在解析后续程序的同时。
更进一步:语法规则将
不仅在程序之前,而且它们
可以散布在任何地方
贯穿全文。 例如,如果
有人希望使用一个特别的
for 语句的奇特私有形式,
他可以优雅地做到这一点,甚至
指定不同的变体
同一概念在不同部分
同一个程序。 这个概念是
语言是用来沟通的
人类已经完全融合
出来,显然现在每个人都可以
即时定义他自己的语言。
然而,寄予厚望的希望很快就实现了
因遇到的困难而沮丧
当试图指定这些是什么时
私人建筑应该意味着。 作为
结果,有趣的想法
可扩展语言反而逐渐消失
快点。

编辑:这是 Perl 6 的 概要 6 :子例程,不幸的是采用标记形式,因为我找不到更新的格式化版本; 在其中搜索“宏”。 不幸的是,它不太有趣,但您可能会发现一些相关的东西,例如 Perl 6 的一次性解析规则,或其抽象语法树的语法。 Perl 6 采用的方法是,宏是一个函数,在解析其参数后立即执行并返回 AST 或字符串; Perl 6 继续解析,就好像源实际上包含返回值一样。 有人提到了错误消息的生成,但它们看起来好像如果宏返回 AST,你就可以做得到。

Here are a few possible problems:

  • You may find it difficult to provide the user with helpful error messages in case of a problem. This seems likely, as any compilation-time syntax error could be just a syntax extension.
  • Performance hit.

I was trying to find some discussion of the pluses, minuses, and/or implementation of dynamic parsing in Perl 6, but I couldn't find anything appropriate. However, you may find this quote from Nicklaus Wirth (designer of Pascal and other languages) interesting:

The phantasies of computer scientists
in the 1960s knew no bounds. Spurned
by the success of automatic syntax
analysis and parser generation, some
proposed the idea of the flexible, or
at least extensible language. The
notion was that a program would be
preceded by syntactic rules which
would then guide the general parser
while parsing the subsequent program.
A step further: The syntax rules would
not only precede the program, but they
could be interspersed anywhere
throughout the text. For example, if
someone wished to use a particularly
fancy private form of for statement,
he could do so elegantly, even
specifying different variants for the
same concept in different sections of
the same program. The concept that
languages serve to communicate between
humans had been completely blended
out, as apparently everyone could now
define his own language on the fly.
The high hopes, however, were soon
damped by the difficulties encountered
when trying to specify, what these
private constructions should mean. As
a consequence, the intreaguing idea of
extensible languages faded away rather
quickly.

Edit: Here's Perl 6's Synopsis 6: Subroutines, unfortunately in markup form because I couldn't find an updated, formatted version; search within for "macro". Unfortunately, it's not too interesting, but you may find some things relevant, like Perl 6's one-pass parsing rule, or its syntax for abstract syntax trees. The approach Perl 6 takes is that a macro is a function that executes immediately after its arguments are parsed and returns either an AST or a string; Perl 6 continues parsing as if the source actually contained the return value. There is mention of generation of error messages, but they make it seem like if macros return ASTs, you can do alright.

迟月2024-07-19 13:12:24

更进一步,您可以进行“惰性”解析,并且始终只解析足以评估下一条语句的内容。 就像某种即时解析器。 然后语法错误可能会变成正常的运行时错误,只会引发可以由周围代码处理的正常异常:

def fun():
   not implemented yet

try:
  fun()
except:
  pass

这将是一个有趣的效果,但它是否有用或理想是一个不同的问题。 一般来说,即使您现在不调用代码,了解错误也是有好处的。

在控制权到达宏之前不会对宏进行求值,并且解析器自然会知道所有先前的定义。 此外,宏定义甚至可以使用程序迄今为止计算出的变量和数据(例如为先前计算的列表中的所有元素添加一些语法)。 但是,开始为通常可以直接用语言完成的事情编写自修改程序可能是一个坏主意。 这可能会让人感到困惑......

在任何情况下,您都应该确保只解析代码一次,如果第二次执行它,则使用已经解析的表达式,这样就不会导致性能问题。

Pushing this one step further, you could do "lazy" parsing and always only parse enough to evaluate the next statement. Like some kind of just-in-time parser. Then syntax errors could become normal runtime errors that just raise a normal Exception that could be handled by surrounding code:

def fun():
   not implemented yet

try:
  fun()
except:
  pass

That would be an interesting effect, but if it's useful or desirable is a different question. Generally it's good to know about errors even if you don't call the code at the moment.

Macros would not be evaluated until control reaches them and naturally the parser would already know all previous definitions. Also the macro definition could maybe even use variables and data that the program has calculated so far (like adding some syntax for all elements in a previously calculated list). But this is probably a bad idea to start writing self-modifying programs for things that could usually be done as well directly in the language. This could get confusing...

In any case you should make sure to parse code only once, and if it is executed a second time use the already parsed expression, so that it doesn't lead to performance problems.

悲欢浪云2024-07-19 13:12:24

以下是我硕士论文中的一些想法,可能有帮助,也可能没有帮助。
这篇论文是关于自然语言的稳健解析。
主要思想:给定一种语言的上下文无关语法,尝试解析给定的
文本(或者,在你的情况下,是一个Python程序)。 如果解析失败,您将获得部分生成的解析树。 使用树结构建议新的语法规则,以更好地覆盖已解析的文本。
我可以把我的论文寄给你,但除非你读希伯来语,否则这可能没什么用。

简而言之:
我使用了 自下而上 图表解析器。 这种类型的解析器根据语法生成产生式的边。 每条边都标有树中被消耗的部分。 每条边根据其与完全覆盖的接近程度获得分数,例如:

S -> NP . VP

得分为二分之一(我们成功覆盖了 NP,但没有覆盖 VP)。
得分最高的边表明新规则(例如X→NP)。
一般来说,图表解析器的效率低于常见的 LALR 或 LL 解析器(通常用于编程语言的类型) - O(n^3) 而不是 O(n) 复杂度,但话又说回来,您正在尝试比只是解析现有的语言。
如果您可以根据这个想法做一些事情,我可以向您发送更多详细信息。
我相信查看自然语言解析器可能会给您一些其他想法。

Here are some ideas from my master's thesis, which may or may not be helpful.
The thesis was about robust parsing of natural language.
The main idea: given a context-free grammar for a language, try to parse a given
text (or, in your case, a python program). If parsing failed, you will have a partially generated parse tree. Use the tree structure to suggest new grammar rules that will better cover the parsed text.
I could send you my thesis, but unless you read Hebrew this will probably not be useful.

In a nutshell:
I used a bottom-up chart parser. This type of parser generates edges for productions from the grammar. Each edge is marked with the part of the tree that was consumed. Each edge gets a score according to how close it was to full coverage, for example:

S -> NP . VP

Has a score of one half (We succeeded in covering the NP but not the VP).
The highest-scored edges suggest a new rule (such as X->NP).
In general, a chart parser is less efficient than a common LALR or LL parser (the types usually used for programming languages) - O(n^3) instead of O(n) complexity, but then again you are trying something more complicated than just parsing an existing language.
If you can do something with the idea, I can send you further details.
I believe looking at natural language parsers may give you some other ideas.

机场等船2024-07-19 13:12:24

我考虑的另一件事是使其成为全面的默认行为,但允许语言(意味着一组宏来解析给定语言)在编译时抛出解析错误。 例如,我的系统中的 Python 2.5 就可以做到这一点。

不用存根的想法,只需重新编译那些在编译时执行时无法完全处理的函数即可。 这也将使自修改代码变得更容易,因为您可以修改代码并在运行时重新编译它。

Another thing I've considered is making this the default behavior across the board, but allow languages (meaning a set of macros to parse a given language) to throw a parse error at compile-time. Python 2.5 in my system, for example, would do this.

Instead of the stub idea, simply recompile functions that couldn't be handled completely at compile-time when they're executed. This will also make self-modifying code easier, as you can modify the code and recompile it at runtime.

捎一片雪花2024-07-19 13:12:24

您可能需要使用未知语法来分隔输入文本的位,以便可以解析语法树的其余部分,除了稍后将扩展的一些字符序列节点之外。 根据您的顶级语法,这可能没问题。

您可能会发现解析算法和词法分析器以及它们之间的接口都需要更新,这可能会排除大多数编译器创建工具。

(更常见的方法是使用字符串常量来实现此目的,可以在运行时将其解析为小型解释器)。

You'll probably need to delimit the bits of input text with unknown syntax, so that the rest of the syntax tree can be resolved, apart from some character sequences nodes which will be expanded later. Depending on your top level syntax, that may be fine.

You may find that the parsing algorithm and the lexer and the interface between them all need updating, which might rule out most compiler creation tools.

(The more usual approach is to use string constants for this purpose, which can be parsed to a little interpreter at run time).

枉心2024-07-19 13:12:24

我认为你的方法效果不会很好。 让我们举一个用伪代码编写的简单示例:

define some syntax M1 with definition D1
if _whatever_:
    define M1 to do D2
else:
    define M1 to do D3
code that uses M1

因此,有一个示例,如果您允许在运行时重新定义语法,则会出现问题(因为按照您的方法,使用 M1 的代码将由定义 D1 进行编译)。 请注意,验证是否发生语法重定义是不可判定的。 过度近似可以通过某种类型的类型系统或某种其他类型的静态分析来计算,但 Python 在这方面并不广为人知:D。

另一件让我困扰的事情是你的解决方案“感觉”不对。 我发现仅仅因为您可能能够在运行时解析它而存储您无法解析的源代码是邪恶的。

我想到的另一个例子是这样的:

...function definition fun1 that calls fun2...
define M1 (at runtime)
use M1
...function definition for fun2

从技术上讲,当你使用 M1 时,你无法解析它,因此你需要将程序的其余部分(包括 fun2 的函数定义)保留在源代码中。 当您运行整个程序时,您将看到对 fun2 的调用,即使它已定义,您也无法调用它。

I don't think your approach would work very well. Let's take a simple example written in pseudo-code:

define some syntax M1 with definition D1
if _whatever_:
    define M1 to do D2
else:
    define M1 to do D3
code that uses M1

So there is one example where, if you allow syntax redefinition at runtime, you have a problem (since by your approach the code that uses M1 would be compiled by definition D1). Note that verifying if syntax redefinition occurs is undecidable. An over-approximation could be computed by some kind of typing system or some other kind of static analysis, but Python is not well known for this :D.

Another thing that bothers me is that your solution does not 'feel' right. I find it evil to store source code you can't parse just because you may be able to parse it at runtime.

Another example that jumps to mind is this:

...function definition fun1 that calls fun2...
define M1 (at runtime)
use M1
...function definition for fun2

Technically, when you use M1, you cannot parse it, so you need to keep the rest of the program (including the function definition of fun2) in source code. When you run the entire program, you'll see a call to fun2 that you cannot call, even if it's defined.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文