Combining a lexer with multiple parsers
I know the typical configuration of a lexer and a parser, where the lexer reads the source code and generates tokens, which are then passed to the parser, and the parser uses them as terminal symbols in its grammar productions. In a typical recursive-descent parser, you start by calling some top-level function representing the starting nonterminal, and this function calls others and reads the tokens one by one from the lexer.
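For concreteness, here is a minimal Python sketch of the setup I mean. The toy grammar `expr -> NUM ('+' NUM)*` and the names `lex`, `PullParser` and `parse_expr` are made up for illustration only. Note how the parser pulls tokens from the lexer itself and keeps control until the input is exhausted.

```python
import re


def lex(source):
    """Yield (kind, text) tokens, then a final EOF token."""
    for text in re.findall(r"\d+|\S", source):
        yield ("NUM" if text.isdigit() else "OP", text)
    yield ("EOF", "")


class PullParser:
    """Recursive-descent parser for the toy grammar: expr -> NUM ('+' NUM)*"""

    def __init__(self, tokens):
        self.tokens = tokens
        self.current = next(tokens)  # one token of lookahead

    def advance(self):
        # The parser asks the lexer for the next token (pull model).
        self.current = next(self.tokens, ("EOF", ""))

    def expect_num(self):
        kind, text = self.current
        if kind != "NUM":
            raise SyntaxError(f"expected a number, got {text!r}")
        self.advance()
        return int(text)

    def parse_expr(self):
        """Top-level function: runs until the whole input is parsed."""
        total = self.expect_num()
        while self.current == ("OP", "+"):
            self.advance()
            total += self.expect_num()
        if self.current[0] != "EOF":
            raise SyntaxError(f"unexpected {self.current[1]!r}")
        return total


result = PullParser(lex("1 + 2 + 39")).parse_expr()
print(result)
```

Once `parse_expr` is called, it does not return until it has consumed every token, which is exactly why a second parser cannot share the same token stream.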
But what if I need two different parsers on top of the same lexer?
I mean, both of them reading from the same place, because I don't want to read the same source multiple times. That is, no multiple passes are allowed, to avoid unnecessarily duplicating work in the lexer. I just want both parsers to consume the next token in the sequence as soon as it has been generated.
But I can call only one top-level function in one of these parsers; can't call both at the same time :/
Is there some way to run these parsers in some kind of a step mode?
That is, when I've got a new token, I want to pass it to both parsers one after another, but only to advance them by that one token, update their internal states and data structures as far as they can, and return immediately to wait for another token.
I have never seen a configuration of this kind before. Is it possible at all to build a parser that way? Is there any material on how this kind of parser could be structured in code? Does it have a name?
EDIT 1:
I don't want to use any parser generator tool, but write the code myself, because I want to learn how this kind of stuff works internally.
Comments (1)
You described the typical flow of a pull parser. It is called once and keeps control until all of its input has been parsed; the parser itself calls the lexer to get the next token. A push parser, on the other hand, is called each time a new token becomes available, so you can call several parsers for every new token. Classic Bison can be used in push mode (details are there). The Lemon parser generator generates push parsers.
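Since you want to write the code by hand, here is a rough Python sketch of the push idea; it is not how Bison or Lemon implement it internally. All names (`lex`, `SumParser`, `operand_list_parser`, `CoroutineParser`, `run`) and the toy grammar `expr -> NUM ('+' NUM)* EOF` are assumptions made up for this example. It shows two ways to get a `push(token)` interface: an explicit state machine, and a generator that keeps the recursive-descent style and suspends at every `yield` until the next token arrives.

```python
import re


def lex(source):
    """Yield (kind, text) tokens, then a final EOF token."""
    for text in re.findall(r"\d+|\S", source):
        yield ("NUM" if text.isdigit() else "OP", text)
    yield ("EOF", "")


class SumParser:
    """Push parser written as an explicit state machine; sums the operands."""

    def __init__(self):
        self.state = "want_num"
        self.total = 0

    def push(self, token):
        kind, text = token
        if self.state == "want_num" and kind == "NUM":
            self.total += int(text)
            self.state = "want_op"
        elif self.state == "want_op" and token == ("OP", "+"):
            self.state = "want_num"
        elif self.state == "want_op" and kind == "EOF":
            self.state = "done"
        else:
            raise SyntaxError(f"unexpected {text!r} in state {self.state}")


def operand_list_parser(result):
    """Same grammar in recursive-descent style; each `yield` waits for a token."""
    token = yield
    while True:
        kind, text = token
        if kind != "NUM":
            raise SyntaxError(f"expected a number, got {text!r}")
        result.append(int(text))
        token = yield
        if token[0] == "EOF":
            return
        if token != ("OP", "+"):
            raise SyntaxError(f"expected '+', got {token[1]!r}")
        token = yield


class CoroutineParser:
    """Adapter giving a generator-based parser the same push() interface."""

    def __init__(self, gen_func):
        self.result = []
        self.done = False
        self._gen = gen_func(self.result)
        next(self._gen)  # run up to the first `yield`

    def push(self, token):
        try:
            self._gen.send(token)  # resume the parser with one token
        except StopIteration:
            self.done = True  # the generator returned: parse finished


def run(source, parsers):
    """Lex the source exactly once and feed every token to every parser."""
    for token in lex(source):
        for parser in parsers:
            parser.push(token)


summer = SumParser()
collector = CoroutineParser(operand_list_parser)
run("1 + 2 + 39", [summer, collector])
print(summer.total, collector.result)
```

The state-machine version makes the saved state explicit, which is roughly what a table-driven push parser does with its stack. The generator version lets the language runtime save the parser's position for you, so the code still reads like ordinary recursive descent; nested nonterminals can be delegated with `yield from` in the same way.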