What advantages do LL parsers have over LR parsers?

Posted on 2024-09-30 21:02:24

What advantages do LL parsers have over LR parsers to warrant their relative popularity in today's parser generator tools?

According to Wikipedia, LR parsing appears to have advantages over LL:

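LR parsing can handle a larger range of languages than LL parsing, and is also better at error reporting, i.e. it detects syntactic errors as soon as possible when the input does not conform to the grammar. This is in contrast to an LL(k) parser (or, even worse, an LL(*) parser), which may defer error detection to a different branch of the grammar due to backtracking, often making errors harder to localize across disjunctions with long common prefixes.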

Note: This is not homework. I was just surprised when I found out that Antlr is an LL parser generator (despite having "LR" in its name!).
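The Wikipedia point about backtracking deferring error detection can be illustrated with a toy example. This is a minimal sketch, assuming a hypothetical grammar `s : 'a'* 'x' | 'a'* 'y'` whose two alternatives share a long common prefix; the names `parse_backtracking`, `parse_deterministic`, `error_position`, and `ParseError` are invented for illustration. The naive backtracking parser throws away each alternative's failure point, so it can only blame the position where the decision started, while a left-factored, non-backtracking version stops at the offending character.

```python
class ParseError(Exception):
    def __init__(self, message, pos):
        super().__init__(f"{message} at position {pos}")
        self.pos = pos


# Hypothetical toy grammar whose alternatives share a long common prefix:
#   s : 'a'* 'x'
#     | 'a'* 'y'

def match_alt(text, pos, final):
    """Match one alternative ('a'* followed by `final`) starting at `pos`.
    Returns the end position or raises ParseError at the failing character."""
    while pos < len(text) and text[pos] == "a":
        pos += 1
    if pos < len(text) and text[pos] == final:
        return pos + 1
    raise ParseError(f"expected {final!r}", pos)


def parse_backtracking(text):
    """Try each alternative from the same start; on failure, rewind and try the next."""
    start = 0
    for final in ("x", "y"):
        try:
            if match_alt(text, start, final) == len(text):
                return True
        except ParseError:
            pass  # the failure position inside this alternative is thrown away
    # Every alternative failed: all we can report is where the decision began.
    raise ParseError("no viable alternative for s", start)


def parse_deterministic(text):
    """Left-factored version, s : 'a'* ('x' | 'y'), decided with one character
    of lookahead and no backtracking, so it stops at the first bad character."""
    pos = 0
    while pos < len(text) and text[pos] == "a":
        pos += 1
    if pos < len(text) and text[pos] in ("x", "y"):
        pos += 1
    else:
        raise ParseError("expected 'x' or 'y'", pos)
    if pos != len(text):
        raise ParseError("expected end of input", pos)
    return True


def error_position(parse, text):
    """Return the position a parser blames for a syntax error, or None if it accepts."""
    try:
        parse(text)
    except ParseError as err:
        return err.pos
    return None


if __name__ == "__main__":
    for parse in (parse_backtracking, parse_deterministic):
        print(parse.__name__, error_position(parse, "aaaaz"))
    # parse_backtracking 0
    # parse_deterministic 4
```

For the input `aaaaz`, the backtracking parser blames position 0 while the deterministic one points at position 4, the `z`. Practical backtracking parsers may add heuristics to narrow this gap; the sketch deliberately has none.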

江城子 2024-10-07 21:02:24

GLR is great if you want a parse tree/forest and don't mind black boxes. It lets you type in whatever CFG you want, at the cost of checking for ambiguities at parse time via exhaustive testing instead of resolving LR/LALR conflicts statically. Some say that's a good trade-off. Ira Baxter's DMS tool and Elkhound, which has a free C++ grammar, are useful for this class of problem. ANTLR is useful for a large class of language applications too, but it uses a top-down approach, generating recursive-descent parsers based on a strategy called LL(*) that allows semantic predicates. I will state here without proof that predicates allow you to parse context-sensitive languages beyond CFGs. Programmers like to insert actions into grammars, they like good error handling, and they like to single-step debug. LL is good at all three. LL is what we do by hand, so it's easier to understand. Don't believe the Wikipedia nonsense about LR being better at handling errors. That said, if you backtrack a lot with ANTLR, errors are indeed worse with LL(*) (PEGs have this problem).
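The claims that LL is what we write by hand, that actions are easy to embed, and that semantic predicates add power can be sketched with a small hand-written recursive-descent parser. The grammar is a hypothetical toy modeled on the well-known C `typedef` ambiguity, where `T * p;` is a pointer declaration if `T` names a type and a multiplication otherwise. The predicate is a plain Python condition standing in for an ANTLR-style `{...}?` predicate; this is not code ANTLR would generate, and it shows a context-dependent parsing decision rather than a proof of the claim about context-sensitive languages. `Parser`, `ParseError`, and the result strings are invented for illustration.

```python
import re


class ParseError(Exception):
    pass


class Parser:
    """Hand-written recursive-descent parser for a hypothetical toy language:

        program : stmt* EOF
        stmt    : 'typedef' ID ';'                     (action: record ID as a type name)
                | {is_type(next token)}? ID '*' ID ';' (pointer declaration)
                | ID '*' ID ';'                        (multiplication expression)

    The last two alternatives are token-for-token identical; only the semantic
    predicate, which consults a table filled in by earlier actions, separates them.
    """

    def __init__(self, src):
        self.tokens = re.findall(r"\w+|\S", src)  # words, or single non-space symbols
        self.pos = 0
        self.types = set()  # type names recorded by the 'typedef' action
        self.result = []    # one description per parsed statement

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "<EOF>"

    def match(self, expected):
        tok = self.peek()
        if tok != expected:
            raise ParseError(f"expected {expected!r}, found {tok!r}")
        self.pos += 1

    def ident(self):
        tok = self.peek()
        if tok == "typedef" or not tok.isidentifier():
            raise ParseError(f"expected an identifier, found {tok!r}")
        self.pos += 1
        return tok

    # One method per grammar rule, which is the shape we write by hand.
    def program(self):
        while self.peek() != "<EOF>":
            self.stmt()
        return self.result

    def stmt(self):
        if self.peek() == "typedef":
            self.match("typedef")
            name = self.ident()
            self.match(";")
            self.types.add(name)  # embedded action, runs as soon as this rule matches
            self.result.append(f"typedef {name}")
        elif self.peek() in self.types:  # semantic predicate: is the next token a type name?
            type_name = self.ident()
            self.match("*")
            var = self.ident()
            self.match(";")
            self.result.append(f"declare {var} as pointer to {type_name}")
        else:
            left = self.ident()
            self.match("*")
            right = self.ident()
            self.match(";")
            self.result.append(f"multiply {left} by {right}")


if __name__ == "__main__":
    # The same tokens "T * p ;" parse differently before and after the typedef.
    print(Parser("T * p; typedef T; T * p;").program())
    # ['multiply T by p', 'typedef T', 'declare p as pointer to T']
```

Because each rule is an ordinary method, you can set a breakpoint in `stmt` and single-step through a parse decision, which is the debugging property mentioned above.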

Re backtracking: GLR speculates (i.e. backtracks) too, just like PEGs, ANTLR, and any other non-deterministic strategy. At any non-deterministic LR state, GLR "forks" sub-parsers to try out every viable path. Anyway, LL has good context for error handling. Where LR knows it's matching an expression, LL knows it's an expression in an assignment or an IF-conditional; LR knows it could be in either but isn't sure, and that uncertainty is where it gets its power.
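The point that LL parsers know the enclosing construct when an error occurs can be sketched as follows, assuming a hypothetical toy grammar (`stmt : 'if' '(' expr ')' stmt | ID '=' expr ';'` and `expr : ID | NUM`). Because a recursive-descent parser's call stack mirrors the rules it is inside, it can keep an explicit stack of rule names and put it in the error message. The `rule` context manager, the context labels, `check`, and the message format are illustrative inventions, not any tool's API.

```python
import re
from contextlib import contextmanager


class ParseError(Exception):
    pass


class Parser:
    """Recursive-descent parser for a hypothetical toy grammar:

        stmt : 'if' '(' expr ')' stmt
             | ID '=' expr ';'
        expr : ID | NUM
    """

    def __init__(self, src):
        self.tokens = re.findall(r"\w+|\S", src)
        self.pos = 0
        self.context = []  # names of the rules we are currently inside

    @contextmanager
    def rule(self, name):
        self.context.append(name)
        try:
            yield
        finally:
            self.context.pop()

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else "<EOF>"

    def error(self, expected):
        # The rule stack is read before it unwinds, so the message names the
        # construct being parsed (e.g. an expression inside an if-condition).
        where = " > ".join(self.context) or "top level"
        raise ParseError(f"expected {expected}, found {self.peek()!r} (in {where})")

    def match(self, expected):
        if self.peek() != expected:
            self.error(repr(expected))
        self.pos += 1

    def ident(self):
        if self.peek() == "if" or not self.peek().isidentifier():
            self.error("an identifier")
        self.pos += 1

    def parse(self):
        self.stmt()
        if self.peek() != "<EOF>":
            self.error("end of input")

    def stmt(self):
        if self.peek() == "if":
            with self.rule("if-statement"):
                self.match("if")
                self.match("(")
                with self.rule("if-condition"):
                    self.expr()
                self.match(")")
                self.stmt()
        else:
            with self.rule("assignment"):
                self.ident()
                self.match("=")
                self.expr()
                self.match(";")

    def expr(self):
        with self.rule("expression"):
            tok = self.peek()
            if tok != "if" and (tok.isidentifier() or tok.isdigit()):
                self.pos += 1
            else:
                self.error("an identifier or number")


def check(src):
    """Return None if `src` parses, otherwise the error message."""
    try:
        Parser(src).parse()
    except ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(check("if () x = 1;"))
    # expected an identifier or number, found ')' (in if-statement > if-condition > expression)
    print(check("x = ;"))
    # expected an identifier or number, found ';' (in assignment > expression)
```

Entering and leaving rules with `with` keeps the stack consistent even while an error propagates, so the same two error tokens produce messages that name different enclosing constructs.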

GLR is O(n^3) in the worst case. Packrat/PEG parsing is O(n) in the worst case. ANTLR's parsers are O(n^2) in the worst case due to cyclic lookahead DFA, but O(n) in practice. It doesn't really matter; GLR is fast enough.
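The O(n) worst case for packrat/PEG parsing comes from memoizing each (rule, position) result, so every rule body runs at most once per input position. The sketch below is a minimal recognizer, assuming a hypothetical two-rule PEG (`E <- T '+' E / T '-' E / T`, `T <- '(' E ')' / 'n'`); the `Packrat` class and its counters are invented for illustration. It counts rule-body evaluations with and without the memo table: without it, each nesting level re-parses `T` three times and the count grows exponentially; with it, the count is bounded by the number of rules times the number of positions. It does not handle left recursion, and it spends memory proportional to rules times input length on the memo table.

```python
class Packrat:
    """Tiny PEG recognizer for the hypothetical grammar

        E <- T '+' E / T '-' E / T
        T <- '(' E ')' / 'n'

    Rule methods take a start position and return the end position of the
    match, or None on failure (PEG ordered choice: first success wins).
    """

    def __init__(self, text, memoize=True):
        self.text = text
        self.memoize = memoize
        self.memo = {}        # (rule name, position) -> end position or None
        self.evaluations = 0  # number of times a rule body actually ran

    def apply(self, rule, pos):
        key = (rule.__name__, pos)
        if self.memoize and key in self.memo:
            return self.memo[key]
        self.evaluations += 1
        result = rule(pos)
        if self.memoize:
            self.memo[key] = result
        return result

    def char(self, pos, c):
        return pos + 1 if pos < len(self.text) and self.text[pos] == c else None

    def E(self, pos):
        # Alternatives 1 and 2: T '+' E, then T '-' E (T is re-applied each time).
        for op in ("+", "-"):
            end = self.apply(self.T, pos)
            if end is not None:
                after_op = self.char(end, op)
                if after_op is not None:
                    rest = self.apply(self.E, after_op)
                    if rest is not None:
                        return rest
        # Alternative 3: T alone.
        return self.apply(self.T, pos)

    def T(self, pos):
        # Alternative 1: '(' E ')'
        if self.char(pos, "(") is not None:
            inner = self.apply(self.E, pos + 1)
            if inner is not None and self.char(inner, ")") is not None:
                return inner + 1
        # Alternative 2: 'n'
        return self.char(pos, "n")

    def parse(self):
        return self.apply(self.E, 0) == len(self.text)


if __name__ == "__main__":
    text = "(" * 8 + "n" + ")" * 8
    for memoize in (False, True):
        p = Packrat(text, memoize)
        print(f"memoize={memoize}: accepted={p.parse()}, evaluations={p.evaluations}")
    # memoize=False: accepted=True, evaluations=39364
    # memoize=True: accepted=True, evaluations=18
```

On this input, nested to depth 8, the naive version runs 39,364 rule bodies while the memoized one runs 18: one per rule per position reached.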

ANTLR is ANother Tool for Language Recognition, not anti-LR, but I like that one too ;)

Frankly, like a lot of young coders in the 80s, I didn't understand LALR and didn't like black boxes (now I dig the beauty of the GLR engine but still prefer LL). I built a commercial LL(k)-based compiler and decided to build a tool to generate what I had built by hand. ANTLR isn't for everyone, and edge cases like C++ might be better handled with GLR, but a lot of people find that ANTLR fits into their comfort zone. Since Jan 2008, there have been 134,000 downloads in total of ANTLR's binary jar (standalone and within ANTLRWorks) and source zips (according to Google Analytics). See our paper on LL(*), which has lots of empirical data.
