递归下降解析和抽象语法树
我正在硬编码一个递归体面的解析器,主要是为了学习目的,但我遇到了一些麻烦。
我将使用CSS3语法的此简短摘录为例:
simple_selector = type_selector | universal;
type_selector = [ namespace_prefix ]? element_name;
namespace_prefix = [ IDENT | '*' ]? '|';
element_name = IDENT;
universal = [ namespace_prefix ]? '*';
首先,我没有意识到namespace_prefix
是type> type_selector
和的可选部分。通用的。这导致了
type_selector
在FED输入(如*|*
)时总是失败,因为对于与namespace_prefix
生产的任何输入相盲目考虑。
递归体面足够简单,但我对它的理解是,在确定产生式之前,我需要做很多(由于缺乏更好的词)探索性递归。因此,我更改了作品的签名以返回布尔值。这样,我可以轻松判断特定的生产是否成功。
我使用链表数据结构来支持任意前瞻,并且可以轻松地对该列表进行切片以尝试生成,然后在生成不成功时返回到我的起点。但是,在尝试生产时,我正在沿着可变状态,试图构建文档对象模型。这并不是真正的工作,因为我无法知道生产是否成功。如果制作不成功,我需要以某种方式撤消所做的任何更改。
我的问题是这样的。我应该使用抽象语法树作为中间表示,然后从那里开始吗?这是您通常会做的事情来解决这个问题吗?因为问题似乎主要在于文档对象模型不是适合递归的树数据结构。
I'm hard-coding a recursive decent parser, mostly for learning purposes and, I've run into some trouble.
I'll use this short excerpt from the CSS3 grammar as an example:
simple_selector = type_selector | universal;
type_selector = [ namespace_prefix ]? element_name;
namespace_prefix = [ IDENT | '*' ]? '|';
element_name = IDENT;
universal = [ namespace_prefix ]? '*';
First, I didn't realize that namespace_prefix
was an optional part within both the type_selector
and universal
. That led to the type_selector
always failing when fed input like *|*
because it was blindly being considered for any input that matched the namespace_prefix
production.
Recursive decent is straightforward enough but my understanding of it is that I need to do a lot of (for lack of better word) exploratory recursion before settling on a production. So I changed the signature of my productions to return Boolean values. This way I could easily tell whether a specific production resulted in success or not.
I use a linked list data structure to support arbitrary look-ahead, and can easily slice this list to attempt a production and then return to my starting point if the production doesn't succeed. However, while trying out a production, I'm passing along mutable state, trying to construct a document object model. This isn't really working out because I have no way of knowing whether the production will be successful or not. And if the production isn't successful, I need to somehow undo any changes made.
My question is this. Should I use an abstract syntax tree as an intermediate representation and then go from there? Is this something you would commonly do to work around this problem? Because the issue seems to be primarily with the document object model not being a suitable tree data structure for recursion.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我对 CSS 不太熟悉,但一般来说,您要做的就是重构语法以尽可能消除歧义。在您的例子中,可以位于 type_selector 和 universal 开头的 namespace_prefix 产生式可以作为单独的可选产生式在前面拉出:
但是,并非所有语法都可以像这样简单的前瞻进行简化,并且对于那些您可以使用更复杂的移位归约解析器,或者 - 正如您所建议的 - 回溯。对于回溯,您通常只是尝试解析产生式并记录语法的路径。一旦您有了与输入匹配的产生式,您就可以使用记录的路径来实际执行该产生式的语义操作。
I'm not intimately familiar with CSS, but in general what you would do is refactor the grammar to eliminate ambiguities as much as you can. In your case here, the namespace_prefix production that can be at the beginning of both type_selector and universal can be pulled out in front as a separate optional production:
Not all grammars can be simplified for simple look-ahead like this, though, and for those you can use more complicated shift-reduce parsers, or - as you suggest - backtracking. For backtracking, you usually just attempt to parse productions and record the path through the grammar. Once you have a production that matches the input you use the recorded path to actually perform the semantic action for that production.