前瞻集的精确定义是什么?
我正在尝试编写编译器并学习语法分析背后的理论。我发现,尽管它是理解识别算法的关键概念,但网上有关它的信息却相当匮乏。看来 StackOverflow 处于解决这个问题的独特位置。
I'm toying around with writing compilers and learning about the theory behind syntax analysis. I've found that even though it's a key concept for understanding recognition algorithms, information about it on the net is fairly poor. It seems StackOverflow is in a unique position to fix this problem.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
语法的先行集是根据其每个非终结符的先行集来定义的,而这些非终结符又依赖于每个产生式的先行集。确定先行集可以帮助我们确定语法是否为 LL(1),以及是否为,我们需要哪些信息来为其构建递归下降解析器。
定义: LOOKAHEAD(X -> α) 和 LOOKAHEAD(X):
其中 FIRST(α) 为α 可以开始的终结符集合,FOLLOW(X) 是可以在语法中任何位置的 X 之后的终结符集合,NULLABLE( α)是α是否可以推导出一个空的终端序列(记为ε)。以下定义取自 Torben Mogensen 的免费书籍编译器设计基础。 请参阅下面的示例。
定义: NULLABLE(X):
定义: FIRST(X) :
定义:FOLLOW(X):
直觉: FOLLOW(X):
请参阅 Torben 书中确定 FOLLOW(X) 的方法以及下面的演示。
一个例子:
首先,NULLABLE和FIRST被确定:
在确定FOLLOW之前,产生式
E'->添加 E $
,其中$
被视为“文件结束”非终结符。然后确定FOLLOW:解决这些约束(也可以通过定点迭代来实现),
现在可以确定每个产生式的LOOKAHEAD:
最后,LOOKAHEAD< /em> 对于每个非终结符可以确定:
通过这些知识,我们可以确定该文法不是 LL(1),因为它的非终结符具有重叠的先行集。 (即,我们无法创建一个一次读取一个符号并明确决定使用哪个产生式的程序。)
The lookahead sets for a grammar is defined in terms of the lookahead sets for each of its non-terminals, which in turn rely on the lookahead set for each production. Determining lookahead sets can help us determine if a grammar is LL(1), and if it is, what information we need to construct a recursive-descent parser for it.
Definition: LOOKAHEAD(X -> α) and LOOKAHEAD(X):
where FIRST(α) is the set of terminals that α can begin with, FOLLOW(X) is the set of terminals that can come after an X anywhere in the grammar, and NULLABLE(α) is whether α can derive an empty sequence of terminals (denoted ε). The following definitions are taken from Torben Mogensen's free book, Basics of Compiler Design. See below for an example.
Definition: NULLABLE(X):
Definition: FIRST(X):
Definition: FOLLOW(X):
Intuition: FOLLOW(X):
See the method for determining FOLLOW(X) in Torben's book and a demonstration of it just below.
An example:
First, NULLABLE and FIRST and are determined:
Before FOLLOW is determined, the production
E' -> E $
is added, where$
is considered the "end-of-file" non-terminal. Then FOLLOW is determined:Resolving these constraints (could also be achieved by fixed-point iteration),
Now LOOKAHEAD for each production can be determined:
Finally, LOOKAHEAD for each non-terminal can be determined:
By this knowledge we can determine that this grammar is not LL(1) because its non-terminals have overlapping lookahead sets. (I.e., we cannot create a program that reads one symbol at a time and decides unambiguously which production to use.)