当前位置：文江博客话题详情

前瞻集的精确定义是什么？

发布于 2024-09-19 11:08:39 字数 96 浏览 4 评论 0原文

我正在尝试编写编译器并学习语法分析背后的理论。我发现，尽管它是理解识别算法的关键概念，但网上有关它的信息却相当匮乏。看来 StackOverflow 处于解决这个问题的独特位置。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

夏末的微笑 2024-09-26 11:08:39

语法的先行集是根据其每个非终结符的先行集来定义的，而这些非终结符又依赖于每个产生式的先行集。确定先行集可以帮助我们确定语法是否为 LL(1)，以及是否为，我们需要哪些信息来为其构建递归下降解析器。

定义： LOOKAHEAD(X -> α) 和 LOOKAHEAD(X)：

LOOKAHEAD(X -> α) = FIRST(α) U FOLLOW(X), if NULLABLE(α)
LOOKAHEAD(X -> α) = FIRST(α), if not NULLABLE(α)
LOOKAHEAD(X) = LOOKAHEAD(X -> α) U LOOKAHEAD(X -> β) U LOOKAHEAD(X -> γ)

其中 FIRST(α) 为α 可以开始的终结符集合，FOLLOW(X) 是可以在语法中任何位置的 X 之后的终结符集合，NULLABLE( α)是α是否可以推导出一个空的终端序列（记为ε）。以下定义取自 Torben Mogensen 的免费书籍编译器设计基础。 请参阅下面的示例。

定义： NULLABLE(X)：

NULLABLE(ε) = true
NULLABLE(x) = false, if x is a terminal
NULLABLE(αβ) = NULLABLE(α) and NULLABLE(β)
NULLABLE(P) = NULLABLE(α_1) or NULLABLE(α_2) or ... or NULLABLE(α_n),
               if P is a non-terminal and the right-hand-sides
               of all its productions are α_1, α_2, ..., α_n.

定义： FIRST(X) ：

FIRST(ε) = Ø
FIRST(x) = {x}, assuming x is a terminal
FIRST(αβ) = FIRST(α) U FIRST(β), if NULLABLE(α)
          = FIRST(α), if not NULLABLE(α)
FIRST(P) = FIRST(α_1) U FIRST(α_2) U ... U FIRST(α_n),
               if P is a non-terminal and the right-hand-sides
               of all its productions are α_1, α_2, ..., α_n.

定义：FOLLOW(X)：

终结符 a 位于 FOLLOW(X) 当且仅当存在从文法的起始符号 S 导出的情况，使得 S ⇒ αX aβ，其中 α 和 β 是（可能是空）语法符号序列。

直觉： FOLLOW(X)：

查看语法中 X 出现的位置。所有可以跟随它（直接或通过任何级别的递归）的终端都在FOLLOW(X)中。此外，如果X出现在产生式的末尾（例如A -> foo X），或者后面跟着可以减少到ε的其他内容（例如A -> foo X B 和 B -> ε)，那么无论 A 后面可以跟什么，X 都可以也可以跟随（即 FOLLOW(A) ⊆ FOLLOW(X)）。

请参阅 Torben 书中确定 FOLLOW(X) 的方法以及下面的演示。

一个例子：

E -> n A
A -> E B
A -> ε
B -> + A
B -> * A

首先，NULLABLE和FIRST被确定：

NULLABLE(E) = NULLABLE(n A) = NULLABLE(n) ∧ NULLABLE(A) = false
NULLABLE(A) = NULLABLE(E B) ∨ NULLABLE(ε) = true
NULLABLE(B) = NULLABLE(+ A) ∨ NULLABLE(* A) = false

FIRST(E) = FIRST(n A) = {n}
FIRST(A) = FIRST(E B) U FIRST(ε) = FIRST(E) U Ø = {n} (because E is not NULLABLE)
FIRST(B) = FIRST(+ A) U FIRST(* A) = FIRST(+) U FIRST(*) = {+, *}

在确定FOLLOW之前，产生式E'->添加 E $，其中 $ 被视为“文件结束”非终结符。然后确定FOLLOW：

FOLLOW(E): Let β = $, so add the constraint that FIRST($) = {$} ⊆ FOLLOW(E)
           Let β = B, so add the constraint that FIRST(B) = {+, *} ⊆ FOLLOW(E)
FOLLOW(A): Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(E) ⊆ FOLLOW(A).
           Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(B) ⊆ FOLLOW(A).
           Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(B) ⊆ FOLLOW(A).
FOLLOW(B): Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(B).
           Because NULLABLE(ε), add the constraint that FOLLOW(A) ⊆ FOLLOW(B).

解决这些约束（也可以通过定点迭代来实现），

    {+, *, $} ⊆ FOLLOW(E)
    FOLLOW(E) ⊆ FOLLOW(A)
    FOLLOW(A) = FOLLOW(B)

    FOLLOW(E) = FOLLOW(A) = FOLLOW(B) = {+, *, $}.

现在可以确定每个产生式的LOOKAHEAD：

LOOKAHEAD(E -> n A) = FIRST(n A) = {n}     because ¬NULLABLE(n A)
LOOKAHEAD(A -> E B) = FIRST(E B)           because ¬NULLABLE(E B)
                    = FIRST(E) = {n}       because ¬NULLABLE(E)
LOOKAHEAD(A -> ε)   = FIRST(ε) U FOLLOW(A) because NULLABLE(ε)
                    = Ø U {+, *, $} = {+, *, $}
LOOKAHEAD(B -> + A) = FIRST(+ A)           because ¬NULLABLE(+ A)
                    = FIRST(+) = {+}       because ¬NULLABLE(+)
LOOKAHEAD(B -> * A) = {*}                  for the same reason

最后，LOOKAHEAD< /em> 对于每个非终结符可以确定：

LOOKAHEAD(E) = LOOKAHEAD(E -> n A) = {n}
LOOKAHEAD(A) = LOOKAHEAD(A -> E B) U LOOKAHEAD(A -> ε)   = {n} U {+, *, $}
LOOKAHEAD(B) = LOOKAHEAD(B -> + A) U LOOKAHEAD(B -> * A) = {+, *}

通过这些知识，我们可以确定该文法不是 LL(1)，因为它的非终结符具有重叠的先行集。（即，我们无法创建一个一次读取一个符号并明确决定使用哪个产生式的程序。）

The lookahead sets for a grammar is defined in terms of the lookahead sets for each of its non-terminals, which in turn rely on the lookahead set for each production. Determining lookahead sets can help us determine if a grammar is LL(1), and if it is, what information we need to construct a recursive-descent parser for it.

Definition: LOOKAHEAD(X -> α) and LOOKAHEAD(X):

LOOKAHEAD(X -> α) = FIRST(α) U FOLLOW(X), if NULLABLE(α)
LOOKAHEAD(X -> α) = FIRST(α), if not NULLABLE(α)
LOOKAHEAD(X) = LOOKAHEAD(X -> α) U LOOKAHEAD(X -> β) U LOOKAHEAD(X -> γ)

where FIRST(α) is the set of terminals that α can begin with, FOLLOW(X) is the set of terminals that can come after an X anywhere in the grammar, and NULLABLE(α) is whether α can derive an empty sequence of terminals (denoted ε). The following definitions are taken from Torben Mogensen's free book, Basics of Compiler Design. See below for an example.

Definition: NULLABLE(X):

NULLABLE(ε) = true
NULLABLE(x) = false, if x is a terminal
NULLABLE(αβ) = NULLABLE(α) and NULLABLE(β)
NULLABLE(P) = NULLABLE(α_1) or NULLABLE(α_2) or ... or NULLABLE(α_n),
               if P is a non-terminal and the right-hand-sides
               of all its productions are α_1, α_2, ..., α_n.

Definition: FIRST(X):

FIRST(ε) = Ø
FIRST(x) = {x}, assuming x is a terminal
FIRST(αβ) = FIRST(α) U FIRST(β), if NULLABLE(α)
          = FIRST(α), if not NULLABLE(α)
FIRST(P) = FIRST(α_1) U FIRST(α_2) U ... U FIRST(α_n),
               if P is a non-terminal and the right-hand-sides
               of all its productions are α_1, α_2, ..., α_n.

Definition: FOLLOW(X):

A terminal symbol a is in FOLLOW(X) if and only if there is a derivation from the start symbol S of the grammar such that S ⇒ αX aβ, where α and β are (possibly empty) sequences of grammar symbols.

Intuition: FOLLOW(X):

Look at where X occurs in the grammar. All terminals that could follow it (directly or by any level of recursion) is in FOLLOW(X). Additionally, if X occurs at the end of a production (e.g. A -> foo X), or is followed by something else that can reduce to ε (e.g. A -> foo X B and B -> ε), then whatever A can be followed by, X can also be followed by (i.e. FOLLOW(A) ⊆ FOLLOW(X)).

See the method for determining FOLLOW(X) in Torben's book and a demonstration of it just below.

An example:

E -> n A
A -> E B
A -> ε
B -> + A
B -> * A

First, NULLABLE and FIRST and are determined:

NULLABLE(E) = NULLABLE(n A) = NULLABLE(n) ∧ NULLABLE(A) = false
NULLABLE(A) = NULLABLE(E B) ∨ NULLABLE(ε) = true
NULLABLE(B) = NULLABLE(+ A) ∨ NULLABLE(* A) = false

FIRST(E) = FIRST(n A) = {n}
FIRST(A) = FIRST(E B) U FIRST(ε) = FIRST(E) U Ø = {n} (because E is not NULLABLE)
FIRST(B) = FIRST(+ A) U FIRST(* A) = FIRST(+) U FIRST(*) = {+, *}

Before FOLLOW is determined, the production E' -> E $ is added, where $ is considered the "end-of-file" non-terminal. Then FOLLOW is determined:

FOLLOW(E): Let β = $, so add the constraint that FIRST($) = {$} ⊆ FOLLOW(E)
           Let β = B, so add the constraint that FIRST(B) = {+, *} ⊆ FOLLOW(E)
FOLLOW(A): Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(E) ⊆ FOLLOW(A).
           Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(B) ⊆ FOLLOW(A).
           Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(A).
           Because NULLABLE(ε), add the constraint that FOLLOW(B) ⊆ FOLLOW(A).
FOLLOW(B): Let β = ε, so add the constraint that FIRST(ε) = Ø ⊆ FOLLOW(B).
           Because NULLABLE(ε), add the constraint that FOLLOW(A) ⊆ FOLLOW(B).

Resolving these constraints (could also be achieved by fixed-point iteration),

    {+, *, $} ⊆ FOLLOW(E)
    FOLLOW(E) ⊆ FOLLOW(A)
    FOLLOW(A) = FOLLOW(B)

    FOLLOW(E) = FOLLOW(A) = FOLLOW(B) = {+, *, $}.

Now LOOKAHEAD for each production can be determined:

LOOKAHEAD(E -> n A) = FIRST(n A) = {n}     because ¬NULLABLE(n A)
LOOKAHEAD(A -> E B) = FIRST(E B)           because ¬NULLABLE(E B)
                    = FIRST(E) = {n}       because ¬NULLABLE(E)
LOOKAHEAD(A -> ε)   = FIRST(ε) U FOLLOW(A) because NULLABLE(ε)
                    = Ø U {+, *, $} = {+, *, $}
LOOKAHEAD(B -> + A) = FIRST(+ A)           because ¬NULLABLE(+ A)
                    = FIRST(+) = {+}       because ¬NULLABLE(+)
LOOKAHEAD(B -> * A) = {*}                  for the same reason

Finally, LOOKAHEAD for each non-terminal can be determined:

LOOKAHEAD(E) = LOOKAHEAD(E -> n A) = {n}
LOOKAHEAD(A) = LOOKAHEAD(A -> E B) U LOOKAHEAD(A -> ε)   = {n} U {+, *, $}
LOOKAHEAD(B) = LOOKAHEAD(B -> + A) U LOOKAHEAD(B -> * A) = {+, *}

By this knowledge we can determine that this grammar is not LL(1) because its non-terminals have overlapping lookahead sets. (I.e., we cannot create a program that reads one symbol at a time and decides unambiguously which production to use.)

回复收藏 0 原文

~没有更多了~