两种语法中非终结符的第一个和后续

发布于 2024-08-22 00:49:33 字数 237 浏览 1 评论 0原文

给定以下语法:

S -> L=L  
s -> L  
L -> *L  
L -> id  

非终结符的第一个和后续是什么?

如果语法改为:

S -> L=R  
S -> R  
L -> *R  
L -> id  
R -> L  

第一个和后面的是什么?

Given the following grammar:

S -> L=L  
s -> L  
L -> *L  
L -> id  

What are the first and follow for the non-terminals?

If the grammar is changed into:

S -> L=R  
S -> R  
L -> *R  
L -> id  
R -> L  

What will be the first and follow ?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

永言不败 2024-08-29 00:49:33

当我在大学学习编译器课程时,我根本不理解 FIRST 和 FOLLOWS。我实现了《龙》书中描述的算法,但我不知道发生了什么。我想我现在知道了。

我假设你有一些书给出了这两个集合的正式定义,而这本书是完全难以理解的。我将尝试对它们进行非正式的描述,希望这能帮助您理解书中的内容。

第一组是您可能会看到的一组终端,作为非终端扩展的第一部分。 FOLLOWS 集合是在非终结符扩展之后您可能看到的终结符集合。

在您的第一个语法中,只有三种终端:=*id。 (您也可以将输入结束符号 $ 视为终结符。)唯一的非终结符是 S (语句)和 L(左值——可以分配给的“东西”)。

将 FIRST(S) 视为可能启动语句的一组非终结符。直观上,您知道您不以 = 开始语句。所以你不会想到它会出现在 FIRST(S) 中。

那么声明如何开始呢?有两个产生式规则定义了 S 的外观,它们都以 L 开头。因此,要弄清楚 FIRST(S) 中的内容,您确实必须查看 FIRST(L) 中的内容。有两个产生式规则定义左值的样子:它以 * 开头或以 id 开头。所以 FIRST(S) = FIRST(L) = { *, id }。

FOLLOWS(S) 很容易。 S 后面没有任何内容,因为它是起始符号。所以 FOLLOWS(S) 中唯一的就是 $,即输入结束符号。

FOLLOWS(L) 有点棘手。您必须查看出现 L 的每条产生式规则,并查看其后面的内容。在第一条规则中,您会看到 = 可能位于 L 后面。所以 = 位于 FOLLOWS(L) 中。但您还会注意到,在该规则中,产生式规则的末尾还有另一个 L。因此,L 后面可以出现的另一件事是可以在该产生式之后出现的任何东西。我们已经发现,S 产生式中唯一可以遵循的就是输入结束。所以 FOLLOWS(L) = { =, $ }。 (如果您查看其他产生式规则,L 始终出现在它们的末尾,因此您只需从中获取 $。)

看看这个 简单解释,现在忽略所有关于 ϵ 的内容,因为您没有任何包含空字符串的作品。在“第一盘规则”下,规则#1、#3 和#4.1 应该有意义。在“跟随集合的规则”下,规则#1、#2 和#3 应该有意义。

当你的生产规则中有 ϵ 时,事情会变得更加复杂。假设你有这样的东西:

D -> S C T id = V  // Declaration is [Static] [Const] Type id = Value
S -> static | ϵ    // The 'static' keyword is optional
C -> const | ϵ     // The 'const' keyword is optional
T -> int | float   // The Type is mandatory and is either 'int' or 'float'
V -> ...           // The Value gets complicated, not important here.

现在如果你想计算 FIRST(D) 你不能只看 FIRST(S),因为 S 可能是“空”。您直观地知道 FIRST(D) 是 { static, const, int, float }。这种直觉被编入规则 #4.2 中。将此示例中的 SCT 视为“简单解释”规则中的 Y1Y2Y3

如果你想计算FOLLOWS(S),你不能只看FIRST(C),因为它可能是空的,所以你还必须看FIRST(T)。所以 FOLLOWS(S) = { const, int, float }。您可以通过应用“遵循集合的规则”#2 和#4(或多或少)来实现这一点。

我希望这对您有所帮助,并且您可以自己弄清楚第二个语法的“第一”和“后续”。

如果有帮助的话,R 代表一个右值——一个你不能赋值的“东西”,比如常量或文字。左值也可以充当右值(但反之则不然)。

a = 2;  // a is an lvalue, 2 is an rvalue
a = b;  // a is an lvalue, b is an lvalue, but in this context it's an rvalue
2 = a;  // invalid because 2 cannot be an lvalue
2 = 3;  // invalid, same reason.
*4 = b; // Valid!  You would almost never write code like this, but it is
        // grammatically correct: dereferencing an Rvalue gives you an Lvalue.

When I took a compiler course in college I didn't understand FIRST and FOLLOWS at all. I implemented the algorithms described in the Dragon book, but I had no clue what was going on. I think I do now.

I assume you have some book that gives a formal definition of these two sets, and the book is completely incomprehensible. I'll try to give an informal description of them, and hopefully that will help you make sense of what's in your book.

The FIRST set is the set of terminals you could possibly see as the first part of the expansion of a non-terminal. The FOLLOWS set is the set of terminals you could possibly see following the expansion of a non-terminal.

In your first grammar, there are only three kinds of terminals: =, *, and id. (You might also consider $, the end-of-input symbol, to be a terminal.) The only non-terminals are S (a statement) and L (an Lvalue -- a "thing" you can assign to).

Think of FIRST(S) as the set of non-terminals that could possibly start a statement. Intuitively, you know you do not start a statement with =. So you wouldn't expect that to show up in FIRST(S).

So how does a statement start? There are two production rules that define what an S looks like, and they both start with L. So to figure out what's in FIRST(S), you really have to look at what's in FIRST(L). There are two production rules that define what an Lvalue looks like: it either starts with a * or with an id. So FIRST(S) = FIRST(L) = { *, id }.

FOLLOWS(S) is easy. Nothing follows S because it is the start symbol. So the only thing in FOLLOWS(S) is $, the end-of-input symbol.

FOLLOWS(L) is a little trickier. You have to look at every production rule where L appears, and see what comes after it. In the first rule, you see that = may follow L. So = is in FOLLOWS(L). But you also notice in that rule that there is another L at the end of the production rule. So another thing that could follow L is anything that could follow that production. We already figured out that the only thing that can follow the S production is the end-of-input. So FOLLOWS(L) = { =, $ }. (If you look at the other production rules, L always appears at the end of them, so you just get $ from those.)

Take a look at this Easy Explanation, and for now ignore all the stuff about ϵ, because you don't have any productions which contain the empty-string. Under "Rules for First Sets", rules #1, #3, and #4.1 should make sense. Under "Rules for Follows Sets", rules #1, #2, and #3 should make sense.

Things get more complicated when you have ϵ in your production rules. Suppose you have something like this:

D -> S C T id = V  // Declaration is [Static] [Const] Type id = Value
S -> static | ϵ    // The 'static' keyword is optional
C -> const | ϵ     // The 'const' keyword is optional
T -> int | float   // The Type is mandatory and is either 'int' or 'float'
V -> ...           // The Value gets complicated, not important here.

Now if you want to compute FIRST(D) you can't just look at FIRST(S), because S may be "empty". You know intuitively that FIRST(D) is { static, const, int, float }. That intuition is codified in rule #4.2. Think of SCT in this example as Y1Y2Y3 in the "Easy Explanation" rules.

If you want to compute FOLLOWS(S), you can't just look at FIRST(C), because that may be empty, so you also have to look at FIRST(T). So FOLLOWS(S) = { const, int, float }. You get that by applying "Rules for follow sets" #2 and #4 (more or less).

I hope that helps and that you can figure out FIRST and FOLLOWS for the second grammar on your own.

If it helps, R represents an Rvalue -- a "thing" you can't assign to, such as a constant or a literal. An Lvalue can also act as an Rvalue (but not the other way around).

a = 2;  // a is an lvalue, 2 is an rvalue
a = b;  // a is an lvalue, b is an lvalue, but in this context it's an rvalue
2 = a;  // invalid because 2 cannot be an lvalue
2 = 3;  // invalid, same reason.
*4 = b; // Valid!  You would almost never write code like this, but it is
        // grammatically correct: dereferencing an Rvalue gives you an Lvalue.
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文