当前位置：文江博客话题详情

是否有 LL(0) 解析器之类的东西？

发布于 2024-10-21 01:06:26 字数 94 浏览 6 评论 0原文

我在某处看到一个问题询问 LL(0) 和 LR(0) 解析器之间的区别。是否有 LL(0) 解析器之类的东西？如果是这样，他们如何在不查看任何标记的情况下进行解析？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

趁微风不噪 2024-10-28 01:06:26

LL(0) 解析器确实会查看标记，但它们不会决定对它们应用哪些产生式。他们只是确定序列是否属于该语言。这意味着每个非终结符必须有一个右侧，并且可能没有递归。

G == ID name lastname
name == STRING
lastname == STRING

# lexer rules
# -----------
ID == [0-9]+
STRING == <unicode>+

请注意，正如@280Z28所提到的，需要一个单独的词法分析器来处理可变长度部分（ID和STRING），否则语法将不是LL(0)。

用于解析具有该语法的输入的产生式序列需要零前瞻。

当解析部分给定输入序列后可以应用多个产生式时，确定性需要先行。

从理论上讲，语法生成一种语言，因此，歧义（有多种方法来导出给定短语）是可以的。在解析中，只有一种方式是语义（意义）方式，这就是我们想要的。

在编程语言的解析中，前瞻是了解接下来要使用哪种语法产生式所需的信息。

在 LL(0) 语言中，没有选择，因此输入序列要么被接受并解析，要么被拒绝。

LL(0) parsers do look at the tokens, but they don't decide which productions to apply upon them. They just determine if the sequence belongs to the language or not. This means that every non-terminal symbol must have a single right-hand side and that there may be no recursion.

G == ID name lastname
name == STRING
lastname == STRING

# lexer rules
# -----------
ID == [0-9]+
STRING == <unicode>+

Note that, as mentioned by @280Z28, a separate lexer is needed to deal with the variable length parts (ID and STRING), or the grammar will not be LL(0).

The sequence of productions to apply to parse an input with that grammar requires zero lookahead.

A lookahead is required for determinism when there's more than one production that could be applied after part of the given input sequence has been parsed.

In theory, a grammar generates a language, and, in that, ambiguity (having more than one way to derive a given phrase) is fine. In parsing, having one-and-only-one way is in the way of semantics (meaning), and it is what we want.

In parsing of programming languages, a lookahead is the information required to know which grammar production to use next.

In LL(0) languages, there are no choices, so the input sequence is either accepted and parsed, or rejected.

回复收藏 0 原文

狼性发作 2024-10-28 01:06:26

LR(k) 中的 k 指的是先行标记的数量。您始终至少使用一个令牌来确定要执行的操作。维基百科页面页面提供了有关此内容的更多信息。

直观上，额外的前瞻符号让您可以使用更多信息做出简化选择，因此它们允许表达更大的语法类而不会发生冲突。

回复收藏 0 原文

还如梦归 2024-10-28 01:06:26

当我学习编译器时，我们从未谈论过它们，尽管我们确实谈论了 LL(1)。维基百科上没有提及它们。

LL(0) 解析器意味着解析器可以在不知道流中的下一个标记的情况下做出决定。我预计，如果存在具有这种特性的语言，那么它们也是非常罕见的。

回复收藏 0 原文

病女 2024-10-28 01:06:26

您通常不会听说 LL(0) 解析，因为其他答案中给出的原因：非平凡的解析需要查看一些输入。但是，LL(1) 解析器的某些部分确实可以作为 LL(0) 解析器运行。

例如，下面是一个简单的 BNF 语法，只需要在一个产生式中进行前瞻：

S -> A
A -> B
B -> 'a' | 'b'

B 产生式有两个右侧，对应于输入中的两个单独的字符串“a”和“b”。因此解析器必须查看输入才能选择正确的 RHS。

然而，S 和 A 生产都没有任何选择。因此，虽然它们实际上具有关联的 FIRST 集（包含“a”和“b”），但不需要它们的 FIRST 集来做出解析决策，这意味着 S -> 。 A和A-> B 产品是 LL(0) 子语法。因此，优化是忽略这两个非终结符的 FIRST 集。

为了清楚起见，假设输入是字符串“b”。然后解析器可以自动生成自上而下的推导S -> 。 A a 和A-> B 在查看输入之前（这称为自动机）。然后，对于最内层的推导，对于 B，它必须查看输入来决定使用哪个 B 产生式来完成解析树。

这种优化的优点是可以在检查输入时进行错误检测（没有找到输入，或者找不到“a”或“b”以外的输入），而不是在解析其他产生式时进行。如果需要，错误消息不仅可以引用 B 产生式，还可以引用 A 和 S 产生式，因为它们已经生成了。如果在没有优化的情况下检查 S 的第一组，则当已知的全部是 S → S 时，必须报告错误。 A 正在尝试，这对于用户来说上下文信息要少得多。

You don't usually hear about LL(0) parsing, for the reason given in the other answers: nontrivial parsing requires seeing some input. However, parts of an LL(1) parser can indeed run as an LL(0) parser.

For example, here is a simple BNF grammar that only requires lookahead in one production:

S -> A
A -> B
B -> 'a' | 'b'

The B production has two right-hand-sides, corresponding to two separate strings in the input, 'a' and 'b'. So the parser has to see the input to choose the proper RHS.

However, neither the S nor the A production have any choices to be made. So, while they do in fact have associated FIRST sets (containing both 'a' and 'b'), their FIRST sets are not needed to make a parsing decision, meaning that the S -> A and A -> B products are an LL(0) subgrammar. So an optimization is to ignore FIRST sets for these two nonterminals.

To make this clear, suppose the input is the string 'b'. Then the parser can automatically generate the top-down derivations S -> A a and A -> B before even looking at the input (this is called an automaton). Then, for the innermost derivation, for B, it must look at the input to decide which B production to use to complete the parse tree.

A plus for this kind of optimization is that error detection (finding no input, or an input other than 'a' or 'b') can be done right at the point at which the input is examined, rather than when parsing some other production. If desired, then, the error message can reference not only the B production, but the A and S productions as well, since they have already been generated. If the FIRST set of S were examined without the optimization, then the error would have to be reported when all that is known is that S -> A is being attempted, which is much less context information for the user.

回复收藏 0 原文

~没有更多了~