LL 解析器比 LR 解析器有什么优势?
LL 解析器相对于 LR 解析器有哪些优势,以保证它们在当今的解析器生成器工具中相对受欢迎?
根据维基百科,LR 解析似乎比 LL 具有优势:
LR解析比LL解析可以处理更大范围的语言,并且在错误报告方面也更好,即当输入不符合语法时它会尽快检测到语法错误。这与 LL(k)(或更糟糕的是 LL(*) 解析器)形成对比,LL(k) 可能会由于回溯而将错误检测推迟到语法的不同分支,这通常会使错误更难在具有长公共前缀的析取中定位.
注意:这不是家庭作业。当我发现 Antlr 是一个 LL 解析器生成器时,我感到很惊讶(尽管它的名字中有“LR”!)。
What advantages do LL parsers have over LR parsers to warrant their relative popularity in today's parser generator tools?
According to Wikipedia, LR parsing appears to have advantages over LL:
LR parsing can handle a larger range of languages than LL parsing, and is also better at error reporting, i.e. it detects syntactic errors when the input does not conform to the grammar as soon as possible. This is in contrast to an LL(k) (or even worse, an LL(*) parser) which may defer error detection to a different branch of the grammar due to backtracking, often making errors harder to localize across disjunctions with long common prefixes.
Note: This is not homework. I was just surprised when I found out that Antlr is an LL parser generator (despite having "LR" in its name!).
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(7)
如果你想要一个解析树/森林并且不介意黑匣子,GLR 就很棒。它可以让你输入任何你想要的 CFG通过详尽的测试在解析时检查歧义的成本,而不是静态解决 LR/LALR 冲突。有人说这是一个很好的权衡。 Ira Baxter 的 DMS 工具或 Elkhound 具有免费的 C++ 语法,对于解决此类问题非常有用。 ANTLR 对于一大类语言应用程序也很有用,但使用自上而下的方法,生成递归下降解析器称为允许语义谓词的 LL(*)。我将在这里无需证明地声明谓词允许您解析 CFG 之外的上下文相关语言。程序员喜欢在语法中插入操作,喜欢良好的错误处理,喜欢单步调试。 LL在这三方面都擅长。 LL 是我们手工完成的,因此更容易理解。不要相信维基百科关于 LR 更擅长处理错误的废话。也就是说,如果您使用 ANTLR 进行大量回溯,则 LL(*) 的错误确实会更严重(PEG 有这个问题)。
重新回溯。 GLR 也进行推测(即回溯),就像 PEG、ANTLR 和任何其他非确定性策略一样。在任何非确定性 LR 状态下,GLR“分叉”子解析器来尝试任何可行的路径。不管怎样,LL 有很好的错误处理上下文。 LR 知道它与表达式匹配,LL 知道它是赋值或 IF 条件中的表达式; LR 知道它可能属于其中任何一个,但不确定 - 而这种不确定性正是它发挥作用的地方。
GLR 是
O(n^3)
最坏情况。 packrat/PEG 是O(n)
最坏情况。由于循环前瞻 DFA,ANTLR 的复杂度为O(n^2)
,但实际上却为O(n)
。真的没关系。 GLR 足够快。ANTLR是AN其他T工具,用于L和R识别,而不是反LR ,但我也喜欢那个;)
坦白说,像很多 80 年代的年轻程序员一样,我不理解 LALR,也不喜欢黑匣子(现在我挖掘 GLR 引擎的美妙之处,但仍然更喜欢 LL)。我构建了一个基于 LL(k) 的商业编译器,并决定构建一个工具来生成我手动构建的内容。 ANTLR 并不适合所有人,像 C++ 这样的边缘情况可能用 GLR 可以更好地处理,但很多人发现 ANTLR 适合他们的舒适区。自 2008 年 1 月以来,ANTLRWorks 中 ANTLR 的二进制 jar 和源 zip 的下载总数已达 134,000 次(根据 Google Analytics)。请参阅我们关于 LL(*) 的论文,其中包含大量经验数据。
GLR is great if you want a parse tree/forest and don't mind black boxes. It lets you type in whatever CFG you want at the cost of checking for ambiguities at parse time via exhaustive testing, instead of resolving LR/LALR conflicts statically. Some say that's a good trade-off. Ira Baxter's DMS tool or Elkhound, which has a free C++ grammar, are useful for this class of problem. ANTLR is useful for a large class of language applications too, but uses a top-down approach, generating recursive descent parsers called LL(*) that allow semantic predicates. I will state without proof here that predicates allow you to parse context-sensitive languages beyond CFGs. Programmers like to insert actions into grammars, like good error handling, and like to single-step debug. LL is good at all three. LL is what we do by hand so it's easier to understand. Don't believe the wikipedia nonsense about LR being better at handling errors. That said, if you backtrack a lot with ANTLR, errors are indeed worse with LL(*) (PEGs have this problem).
Re backtracking. GLR speculates (i.e. backtracks) too, just like PEGs, ANTLR, and any other non-deterministic strategy. At any non-deterministic LR state, GLR "forks" sub-parsers to try out any viable path. Anyway, LL has good context for error handling. Where LR knows it's matching an expression, LL knows it's an expression in an assignment or
IF
-conditional; LR knows it could be in either but isn't sure - and that uncertainty is where it gets its power.GLR is
O(n^3)
worst case. packrat/PEG isO(n)
worst case. ANTLR's areO(n^2)
due to cyclic lookahead DFA butO(n)
in practice. Doesn't matter really. GLR is fast enough.ANTLR is ANother Tool for Lang Recognition not anti-LR, but I like that one too ;)
Frankly, like a lot of young coders in 80s, I didn't understand LALR and didn't like black boxes (now I dig the beauty of the GLR engine but still prefer LL). I built a commercial LL(k) based compiler and decided to build a tool to generate what I had built by hand. ANTLR isn't for everyone and edge cases like C++ might be better handled with GLR but a lot of people find ANTLR fits into their comfort zone. Since Jan 2008, there have been 134,000 downloads of ANTLR's binary jar, within ANTLRWorks, and source zips total (according to Google Analytics). See our paper on LL(*) with lots of empirical data.
如果您必须手动编写代码,那么递归下降 (LL) 是您可以实际执行的操作;人们实际上无法手动构建 L(AL)R 解析器。
鉴于现代解析器生成器将为您处理所有解析器构造,并且空间不是什么大问题,我更喜欢 LR 解析器,因为您不必与语法作斗争以使它们对您的特定解析器生成器有效(没有“删除所有左递归”的愚蠢行为)。
事实上,我更喜欢 GLR 解析器,它几乎可以使用上下文无关语法解析任何内容。无需担心左递归。没有转移/减少冲突担忧。没有前瞻限制。
如果您想了解一个 GLR 解析引擎可以处理的语言范围(包括著名的难以使用 LL/LALR 解析的语言 C++),您可以查看 此处。
If you have to hand code one, recursive descent (LL) is something you can do realistically; people cannot hand-build L(AL)R parsers practically by hand.
Given that modern parser generators will handle all the parser construction for you, and that space is not much of an issue, I prefer LR parsers because you don't have to fight with the grammars as much to make them valid for your particular parser generator (no "remove all the left recursion" silliness).
In fact, I prefer GLR parsers, which will pretty much parse anything with a context free grammar. No left-recursion worries. No shift/reduce conflict worries. No lookahead limits.
If you want to see the range of languages that one GLR parsing engine can handle (including the famously hard-to-parse-using-LL/LALR language, C++), you can look here.
从我个人的经验来看(我在各种情况下都使用过这两种方法),最实际的区别是,使用 LL(k),您可以以更简单的方式定义语法(因为它是自上而下的),而无需关心许多可能的归约- LR 解析器经常发生减少或移位减少冲突。您唯一需要关心的是左递归,它必须转换为右递归。
另一件事是,自上而下的方法通常意味着更高的复杂性(关于空间或时间),因为它必须在解析时存储整个树,并且它可以增长很多,直到解决歧义为止。
From my personal experience (I used both for various situations), the most practical difference is that, with a LL(k), you can define the grammar in an easier way (since it's top-down) without caring about many possible reduce-reduce or shift-reduce conflicts which often occur with LR parsers. The only thing you have to care about is left-recursion which must be transformed into right one.
Another thing is that the top-down approach usually implies a higher complexity (regarding either space or time), because it has to store the whole tree while parsing and it can grow a lot until ambiguities are solved.
我所熟悉的唯一优点是您可以轻松地手动编写 LL 解析器代码。 LR 解析器更难手动编码(您通常使用解析器生成器)。
The only advantage I've ever been familiar with is that you can easily code LL parsers by hand. LR parsers are MUCH harder to code by hand (you usually use a parser generator).
LL 解析的最坏情况复杂度为 O(n^4),而 LR 解析的最坏情况复杂度更好,为 O(n^3)。
(但没有人会编写 O(n^4) 语法。)
https://en。 wikipedia.org/wiki/Top-down_parsing
LL Parsings worst case complexity is O(n^4), whereas LR parsing's worst case complexity is better, O(n^3).
(But no one would ever write O(n^4) grammar.)
https://en.wikipedia.org/wiki/Top-down_parsing
根据 Laurence Tratt 的说法,LL 解析器有一个小但重要的利基市场,那就是如果您需要:
并手动编写一个递归下降解析器来实现这一点。
然而:
因此他的结论是 LR 解析能够很好地处理左递归,是正确的选择。
为了更彻底地回顾这一点,我推荐劳伦斯·特拉特(Laurence Tratt)的优秀论文哪种解析方法?
According to Laurence Tratt, LL parsers have a small but important niche, which is if you need:
And hand code a recursive descent parser to accomplish that.
However:
and thus his conclusion is that LR parsing, which handles left recursion just fine, is the way to go.
For a more thorough review of this I recommend Laurence Tratt's excellent essay Which Parsing Approach?
我想到的一个原因是,在 LL 范式中开发一种需要任意回溯的语言(咳 C++)要容易得多。
One reason that comes to mind is, it's much easier to do a language that needs arbitrary backtracking (cough C++) in an LL paradigm.