为什么自下而上的解析比自上而下的解析更常见?

发布于 2024-10-05 23:39:05 字数 179 浏览 0 评论 0原文

看起来递归下降解析器不仅解释起来最简单,而且设计和维护起来也是最简单的。它们并不局限于 LALR(1) 语法,代码本身也可以被普通人理解。相比之下,自下而上的解析器对其能够识别的语法有限制,并且需要通过特殊工具生成(因为驱动它们的表几乎不可能手动生成)。

那么为什么自下而上(即移位归约)解析比自上而下(即递归下降)解析更常见?

It seems that recursive-descent parsers are not only the simplest to explain, but also the simplest to design and maintain. They aren't limited to LALR(1) grammars, and the code itself can be understood by mere mortals. In contrast, bottom up parsers have limits on the grammars they are able to recognize, and need to be generated by special tools (because the tables that drive them are next-to-impossible to generate by hand).

Why then, is bottom-up (i.e. shift-reduce) parsing more common than top-down (i.e. recursive descent) parsing?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

简单气质女生网名 2024-10-12 23:39:05

如果您选择功能强大的解析器生成器,则可以编写语法而不必担心特殊属性。 (LA)LR 意味着你不必担心左递归,少了一个头痛。 GLR 意味着您不必担心局部歧义或前瞻。

自下而上的解析器往往非常高效。因此,一旦您付出了一些复杂机器的代价,编写语法就会更容易,而且解析器也会表现良好。

您应该期望在任何经常出现某种编程构造的地方看到这种选择:如果它更容易指定,并且执行得很好,即使机器很复杂,复杂的机器也会获胜。再举一个例子,尽管您可以自己手动构建索引文件,但数据库世界已经转向关系工具。编写数据模式更容易,指定索引更容易,并且后面有足够复杂的机制(您不必查看齿轮,您只需使用它们),它们可以非常快,几乎不费力。同样的原因。

If you choose a powerful parser generator, you can code your grammar without worrying about peculiar properties. (LA)LR means you don't have to worry about left recursion, one less headache. GLR means you don't have to worry about local ambiguity or lookahead.

And the bottom-up parsers tend to be pretty efficient. So, once you've paid the price of a bit of complicated machinery, it is easier to write grammars and the parsers perform well.

You should expect to see this kind of choice wherever there is some programming construct that commonly occurs: if it is easier to specify, and it performs pretty well, even if the machinery is complicated, complex machinery will win. As another example, the database world has gone to relational tools, in spite of the fact that you can hand-build an indexed file yourself. It's easier to write the data schemas, it's easier to specify the indexes, and with complicated enough machinery behind (you don't have to look at the gears, you just use them), they can be pretty fast with almost no effort. Same reasons.

何以畏孤独 2024-10-12 23:39:05

它源于几个不同的事情。

BNF(以及语法等理论)来自计算语言学:研究自然语言解析的人们。 BNF 是一种非常有吸引力的描述语法的方式,因此很自然地想要使用这些符号来生成解析器。

不幸的是,自上而下的解析技术在应用于此类符号时往往会失败,因为它们无法处理许多常见情况(例如,左递归)。这给你留下了 LR 系列,它性能良好并且可以处理语法,并且由于它们是由机器生成的,谁关心代码是什么样的?

不过,你是对的:自上而下的解析器工作得更“直观”,因此它们更容易调试和维护,一旦你进行了一些练习,它们就和工具生成的解析器一样容易编写。 (特别是当您进入移位/减少冲突地狱时。)许多答案都谈到解析性能,但实际上,自上而下的解析器通常可以优化为与机器生成的解析器一样快。

这就是为什么许多生产编译器使用手写的词法分析器和解析器。

It stems from a couple different things.

BNF (and the theory of grammars and such) comes from computational linguistics: folks researching natural language parsing. BNF is a very attractive way of describing a grammar, and so it's natural to want to consume these notation to produce a parser.

Unfortunately, top-down parsing techniques tend to fall over when applied to such notations, because they cannot handle many common cases (e.g., left recursion). This leaves you with the LR family, which performs well and can handle the grammars, and since they're being produced by a machine, who cares what the code looks like?

You're right, though: top-down parsers work more "intuitively," so they're easier to debug and maintain, and once you have a little practice they're just as easy to write as those generated by tools. (Especially when you get into shift/reduce conflict hell.) Many of the answers talk about parsing performance, but in practice top-down parsers can often be optimized to be as fast as machine-generated parsers.

That's why many production compilers use hand-written lexers and parsers.

长不大的小祸害 2024-10-12 23:39:05

递归下降解析器尝试假设输入字符串的一般结构,这意味着在到达字符串末尾之前会发生大量的试错。这使得它们比自下而上的解析器效率低,因为自下而上的解析器不需要这样的推理引擎。

随着语法复杂性的增加,性能差异变得更大。

Recursive-descent parsers try to hypothesize the general structure of the input string, which means a lot of trial-and-error occurs before the end of the string is reached. This makes them less efficient than bottom-up parsers, which need no such inference engines.

The performance difference becomes magnified as the complexity of the grammar increases.

半﹌身腐败 2024-10-12 23:39:05

为了补充其他答案,重要的是要认识到,除了效率之外,自下而上的解析器可以比递归下降解析器接受明显更多的语法。自上而下的解析器(无论是否是预测性的)只能有 1 个先行标记,并且如果可以使用两种不同的规则导出当前标记和紧随该标记的任何内容,则会失败。当然,您可以实现解析器以具有更多的前瞻功能(例如 LL(3)),但是在它变得像自下而上的解析器一样复杂之前,您愿意将其推进到什么程度?另一方面,自下而上的解析器(特别是 LALR)维护一个 firstfollows 的列表,并且可以处理自上而下的解析器无法处理的情况。

当然,计算机科学是关于权衡的。如果语法足够简单,那么编写一个自上而下的解析器是有意义的。如果它很复杂(例如大多数编程语言的语法),那么您可能必须使用自下而上的解析器才能成功接受输入。

To add to other answers, it is important to realize that besides efficiency, bottom-up parsers can accept significantly more grammars than recursive-descent parsers do. Top-down parsers -whether predictive or not- can only have 1 lookahead token and fail if current token and whatever immediately follows the token can be derived using two different rules. Of course, you could implement the parser to have more lookaheads (e.g. LL(3)), but how far are you willing to push it before it becomes as complex as a bottom-up parser? Bottom-up parsers (LALR specifically) on the other hand, maintain a list of firsts and follows and can handle cases where top-down parsers can't.

Of course, computer science is about trade-offs. If you grammar is simple enough, it makes sense to write a top-down parser. If it's complex (such as grammars of most programming languages), then you might have to use a bottom-up parser to successfully accept the input.

灼痛 2024-10-12 23:39:05

我有两个猜测,尽管我怀疑是否能完全解释它:

  1. 自上而下的解析可能很慢。递归下降解析器可能需要指数时间才能完成其工作。这将严重限制使用自顶向下解析器的编译器的可扩展性。

  2. 更好的工具。如果您可以用 EBNF 的某种变体来表达该语言,那么您很可能可以通过 Lex/Yacc 来跳过大量繁琐的代码。似乎没有那么多工具可以帮助自动化组装自顶向下解析器的任务。让我们面对现实吧,打磨解析器代码并不是玩弄语言的乐趣所在。

I have two guesses, though I doubt that either fully explains it:

  1. Top-down parsing can be slow. Recursive-descent parsers may require exponential time to complete their job. That would put severe limitations on the scalability of a compiler which uses a top-down parser.

  2. Better tools. If you can express the language in some variant of EBNF, then chances are you can Lex/Yacc your way past a whole lot of tedious code. There don't seem to be as many tools to help automate the task of putting together a top-down parser. And let's face it, grinding out parser code just isn't the fun part of toying with languages.

抽个烟儿 2024-10-12 23:39:05

我从未见过自上而下和移位归约解析器之间的真正比较:

只有两个小程序同时并行运行,一个使用自上而下的方法,另一个使用自下而上的方法,每个程序大约 200 行代码,

能够解析任何类型的自定义二元运算符和数学表达式,两者共享相同的语法声明格式,然后可能添加变量声明和效果以显示如何“破解”(非上下文无关)可以实施。

那么,如何诚实地谈论我们从未做过的事情:严格比较两种方法?

I never saw a real comparison between top-down and shift-reduce parser :

just 2 small programs ran in the same time side-by-side, one using the top-down approach and the over one using the bottom-up approach, each of about ~200 lines of code,

able to parse any kind of custom binary operator and mathematical expression, the two sharing the same grammar declaration format, and then possibly adding variables declarations and affectations to show how 'hacks' (non context-free) can be implemented.

so, how to speak honestly of something we never did : comparing rigorously the two approach ?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文