How can I handle macro replacement with a grammar-based parser?

Posted on 2024-12-04 13:51:00

I need a parser for an exotic programming language. I wrote a grammar for it and used a parser generator (PEGjs) to generate the parser. That works perfectly... except for one thing: macros (that replace a placeholder with predefined text). I don't know how to integrate this into a grammar. Let me illustrate the problem:

An example program to be parsed typically looks like this:

instructionA parameter1, parameter2
instructionB parameter1
instructionC parameter1, parameter2, parameter3

No problem so far. But the language also supports macros:

Define MacroX { foo, bar }
instructionD parameter1, MacroX, parameter4

Define MacroY(macroParameter1, macroParameter2) {
  instructionE parameter1, macroParameter1
  instructionF macroParameter2, MacroX
}

instructionG parameter1, MacroX
MacroY

Of course I could define a grammar to identify Macros and references to Macros. But in that case I don't know how I would parse the contents of a Macro, because it's not clear what the macro contains. It could be just one parameter (that's easiest), but it could also be several parameters in one macro (like MacroX in my example, which represents two parameters) or a whole block of instructions (like MacroY). And Macros can even contain other Macros. How do I put this into a grammar if it's not clear what the macro is semantically?

The easiest approach seems to be to run a preprocessor first to replace all the macros and only then run the parser. But in that case the line numbers get messed up. I want the parser to generate error messages containing the line number if there is a parse error. And if I preprocess the input, the line numbers do not correspond anymore.

Help very much appreciated.

Comments (3)

酷到爆炸 2024-12-11 13:51:00

Macro processors tend not to respect the boundaries of language elements; in essence, they can (and often do) make arbitrary changes to the apparent input string.

If this is the case, you have little choice: you'll need to build a macro processor that preserves the line numbers.
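A minimal sketch of what such a line-preserving macro processor could look like, in JavaScript since the question uses PEGjs. Everything here is an assumption made for illustration: the function name preprocess, the lineMap array, and the restriction to parameterless macros written either as "Define Name { body }" on one line or as "Define Name {" ... "}" over several lines. Each output line remembers the source line of the statement that produced it.

```javascript
// Sketch of a line-preserving macro preprocessor (hypothetical helper names).
// Assumptions: parameterless macros only; "Define Name { body }" on one line,
// or "Define Name {" ... "}" spanning several lines.
function preprocess(source) {
  const macros = new Map(); // macro name -> array of body lines
  const out = [];           // expanded output lines
  const lineMap = [];       // lineMap[i] = 1-based source line that produced out[i]

  // Emit one line, expanding macros; every result is attributed to srcLine.
  function emit(text, srcLine, depth) {
    if (depth > 16) throw new Error(`line ${srcLine}: macro expansion too deep`);
    const block = macros.get(text.trim());
    if (block) {
      // The whole line is a macro name: emit each body line in its place.
      for (const bodyLine of block) emit(bodyLine, srcLine, depth + 1);
      return;
    }
    // Otherwise substitute macro names that occur inside the line.
    const replaced = text.replace(/\w+/g, (w) =>
      macros.has(w) ? macros.get(w).join(" ") : w
    );
    if (replaced !== text) {
      emit(replaced, srcLine, depth + 1); // the body may mention other macros
    } else {
      out.push(text);
      lineMap.push(srcLine);
    }
  }

  const lines = source.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const oneLine = /^\s*Define\s+(\w+)\s*\{(.*)\}\s*$/.exec(lines[i]);
    const opener = /^\s*Define\s+(\w+)\s*\{\s*$/.exec(lines[i]);
    if (oneLine) {
      macros.set(oneLine[1], [oneLine[2].trim()]);
    } else if (opener) {
      const start = i + 1; // 1-based line of the "Define" keyword
      const body = [];
      i++;
      while (i < lines.length && lines[i].trim() !== "}") body.push(lines[i++]);
      if (i >= lines.length) {
        throw new Error(`line ${start}: unterminated macro definition`);
      }
      macros.set(opener[1], body);
    } else {
      emit(lines[i], i + 1, 0);
    }
  }
  return { text: out.join("\n"), lineMap };
}
```

The generated parser then runs on the returned text, and an error it reports at output line n can be rewritten to refer to lineMap[n - 1] before it is shown to the user.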

If the macros always contain well-structured language elements, and they always occur in structured places in the code, then you can add the notions of a macro definition and a macro call to your grammar. This may make your parses ambiguous; foo(x) in C code might be a macro call, or it might be a function call. You'll have to resolve that ambiguity somehow. C parsers used to solve such ambiguity problems by collecting symbol table information as they parsed; if you collect is-foo-a-macro as you parse, then you can determine whether foo(x) is a macro call or not.
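As a rough illustration of collecting is-foo-a-macro while parsing, here is a small hand-written JavaScript parser rather than a PEGjs grammar. It is a sketch under stated assumptions: parseProgram and the shape of its result are made up for this example, and only parameter-list macros of the form "Define Name { a, b }" are handled. A name counts as a macro reference only if a definition has already been seen, and every expanded parameter keeps the line on which the macro was used.

```javascript
// Sketch: collect macro names while parsing and consult them to classify
// identifiers (hypothetical helper, not part of PEGjs).
// Assumption: only parameter-list macros of the form "Define Name { a, b }".
function parseProgram(source) {
  const macros = new Map(); // symbol table: macro name -> flat list of parameter names
  const program = [];

  // Parse "a, b, c"; a name already in the symbol table is a macro reference
  // and is replaced by the parameters the macro stands for.
  function parseParams(text, line) {
    const params = [];
    for (const raw of text.split(",")) {
      const name = raw.trim();
      if (!/^\w+$/.test(name)) {
        throw new Error(`line ${line}: invalid parameter "${name}"`);
      }
      if (macros.has(name)) {
        for (const value of macros.get(name)) params.push({ value, line, macro: name });
      } else {
        params.push({ value: name, line, macro: null });
      }
    }
    return params;
  }

  source.split("\n").forEach((text, index) => {
    const line = index + 1;
    if (text.trim() === "") return;
    const def = /^\s*Define\s+(\w+)\s*\{(.*)\}\s*$/.exec(text);
    if (def) {
      // Macros used inside the body are expanded here, so stored lists are flat.
      macros.set(def[1], parseParams(def[2], line).map((p) => p.value));
      return;
    }
    const instr = /^\s*(\w+)\s+(.+)$/.exec(text);
    if (!instr) throw new Error(`line ${line}: expected an instruction`);
    program.push({ name: instr[1], line, params: parseParams(instr[2], line) });
  });
  return program;
}
```

Because the symbol table is filled in source order, a name that appears before its Define is treated as an ordinary parameter; whether that is the desired rule depends on the language being parsed.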

梦中楼上月下 2024-12-11 13:51:00

With PEG you have to define by hand the places where macros may appear. You can add each macro to a hash and check for it in the PEG rule(s) that allow macros (infix expr, postfix expr, unop, binop, function call, ...). It's not as easy as in Lisp, but much easier than with YACC and its operator precedence hacks :)

Other known PEG frameworks that allow macros, such as parrot, perl6, katahdin or PFront, use the trick of executing the parse at run time, which costs performance.
Or you can do both and allow both pre-compiled and interpreted PEG parsing. Several projects have considered that, but you need a fast VM, such as luajit, java, clr or friends.

I use special syntax block keywords to load external shared libraries containing an external pre-compiled PEG parser, e.g. to parse SQL or FFI declarations into the AST.
But you can also require a C compiler and compile the parser at run time for all macros.

牛↙奶布丁 2024-12-11 13:51:00

With PEG it is significantly easier than with anything else. First of all, Packrat-based parsers and the like are extensible: your macro definition can modify the syntax, so the next time it is used it will be parsed naturally. See here and here for some extreme examples of this approach.

Another possibility is to chain parsers, which is also trivial with PEG-based approaches.
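One way to read "chaining parsers" is as a pipeline of three small stages: a tokenizer, a macro expander that works on tokens, and a parser for the expanded token stream. The sketch below is written by hand in plain JavaScript rather than with PEGjs, and the names tokenize, expandMacros, parseInstructions and parseChained are all hypothetical. It assumes parameterless macros only. Because the expander copies tokens instead of editing text, it does not need to know whether a macro stands for parameters or for a block of instructions, and every copied token takes the line number of the place where the macro was used.

```javascript
// Stage 1: split the source into tokens that remember their line number.
function tokenize(source) {
  const tokens = [];
  const re = /[ \t\r]+|(\n)|(\w+)|([{},])/y; // sticky: must match at lastIndex
  let line = 1;
  while (re.lastIndex < source.length) {
    const at = re.lastIndex;
    const m = re.exec(source);
    if (!m) throw new Error(`line ${line}: unexpected character "${source[at]}"`);
    if (m[1]) tokens.push({ type: "newline", value: "\n", line: line++ });
    else if (m[2]) tokens.push({ type: "word", value: m[2], line });
    else if (m[3]) tokens.push({ type: "punct", value: m[3], line });
  }
  return tokens;
}

// Stage 2: drop "Define Name { ... }" and replace later uses of Name by
// copies of the body tokens, stamped with the line of the use site.
function expandMacros(tokens) {
  const macros = new Map(); // name -> already expanded body tokens
  const expand = (list) =>
    list.flatMap((t) =>
      t.type === "word" && macros.has(t.value)
        ? macros.get(t.value).map((b) => ({ ...b, line: t.line }))
        : [t]
    );
  const out = [];
  for (let i = 0; i < tokens.length; i++) {
    const t = tokens[i];
    if (t.type !== "word" || t.value !== "Define") {
      out.push(...expand([t]));
      continue;
    }
    const name = tokens[i + 1];
    const open = tokens[i + 2];
    if (!name || name.type !== "word" || !open || open.value !== "{") {
      throw new Error(`line ${t.line}: malformed macro definition`);
    }
    const body = [];
    i += 3;
    while (i < tokens.length && tokens[i].value !== "}") body.push(tokens[i++]);
    if (i >= tokens.length) throw new Error(`line ${t.line}: unterminated macro definition`);
    macros.set(name.value, expand(body));
  }
  return out;
}

// Stage 3: parse "name param (, param)*" statements separated by newlines.
function parseInstructions(tokens) {
  const program = [];
  let current = [];
  const flush = () => {
    if (current.length === 0) return;
    const [head, ...rest] = current;
    if (head.type !== "word") throw new Error(`line ${head.line}: expected an instruction name`);
    const params = [];
    rest.forEach((t, k) => {
      const wantWord = k % 2 === 0; // words and commas must alternate
      if (wantWord ? t.type !== "word" : t.value !== ",") {
        throw new Error(`line ${t.line}: unexpected "${t.value}"`);
      }
      if (wantWord) params.push(t.value);
    });
    if (rest.length > 0 && rest.length % 2 === 0) {
      throw new Error(`line ${head.line}: trailing comma`);
    }
    program.push({ name: head.value, params, line: head.line });
    current = [];
  };
  for (const t of tokens) {
    if (t.type === "newline") flush();
    else current.push(t);
  }
  flush();
  return program;
}

// The chain: each stage consumes the output of the previous one.
const parseChained = (source) => parseInstructions(expandMacros(tokenize(source)));
```

With a generated parser the last stage would have to accept a token array instead of a string, so this layout fits a hand-written final stage more naturally; treat it as one possible arrangement, not as what the answer's author had in mind.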
