用Java编写递归下降解析来解析epsilon(ε)

发布于 2024-09-12 19:25:44 字数 386 浏览 7 评论 0原文

例如,

EBNF

A ::= B c;

B ::= T1 | T2 | ε

T1 ::= a

T2 ::= b

parseA()
{
switch(currentToken.kind){
case Token.a :
    parseT1();
case Token.b :
    parseT2();
break;

case <epsilon> :
    
break;
default:
    // report error
break;
}
}

如何用Java编写解析器来解析epsilon(一组空字符串)?

For example,

EBNF

A ::= B c;

B ::= T1 | T2 | ε

T1 ::= a

T2 ::= b

parseA()
{
switch(currentToken.kind){
case Token.a :
    parseT1();
case Token.b :
    parseT2();
break;

case <epsilon> :
    
break;
default:
    // report error
break;
}
}

How to write the parser parse the epsilon ( set of empty string ) in Java?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

陈甜 2024-09-19 19:25:45

无论是 Java 还是任何语言,关键是你有一个令牌流。您不会“获得”令牌并决定要做什么。相反,您“查看”下一个令牌,如果可以使用它,您就“接受”它。

看到区别了吗?

不要“得到”然后“决定”。
“看看并决定”,然后“接受”。
这是“前瞻”概念的核心。

因此,ParseA 调用 ParseB。

解析 B 查看流中的下一个标记。
如果它是“a”,它接受它并返回。
如果它是“b”,它接受它并返回。
否则它只是返回到 ParseA (不接受任何内容)。

然后 ParseA 查看下一个标记。
如果它是“c”,它接受它并返回。
否则就会失败。

有道理吗?

为此,您应该将哨兵令牌添加到流的末尾,该令牌永远不会被接受。最后,这应该是流中剩下的唯一令牌。如果不是,则会出现语法错误,其中包含末尾的额外垃圾。

Whether it's Java or any language, the point is you have a stream of tokens. You don't "get" a token and decide what to do. Rather you "look at" the next token, and if you can use it, you "accept" it.

See the difference?

Do not "get" and then "decide".
Do "look and decide" and then "accept".
This is the heart of the concept of "lookahead".

So, ParseA calls ParseB.

Parse B looks at the next token in the stream.
If it is "a" it accepts it and returns.
If it is "b" it accepts it and returns.
Otherwise it simply returns to ParseA (without accepting anything).

Then ParseA looks at the next token.
If it is "c" it accepts it and returns.
Otherwise it fails.

Make sense?

To do this, you should add a sentinel token to the end of the stream, that is never accepted. At the end, that should be the only token left in the stream. If not, you've got a syntax error consisting of extra garbage at the end.

寂寞花火° 2024-09-19 19:25:45

epsilon 只是“此处允许空字符串”的标记,因此您不需要解析任何内容;下降完成。不过,你似乎不太可能真正拥有一个代币;您需要检测是否没有可用的令牌,或者下一个令牌是否可以在另一个产品中使用。

例如,您可能会得到输入c。您需要意识到这与产生式 A ::= B c; 匹配,因为 B 可以简化为 epsilon —— 没有epsilon 标记,您只需要意识到 B 规则是可选的,在这种情况下需要跳过以将 c 简化为 A

epsilon is just a marker for "empty string allowed here", so you don't need to parse anything; the descent is finished. It seems unlikely that you would actually have a token for that though; you need to detect if there's no token available or if the next token can be used in another production.

For example, you might get the input c. You need to realize that this matches the production A ::= B c;, because B can be reduced to epsilon -- there is no epsilon token, you just need to realize that the B rule there is optional and in that case needs to be skipped over to reduce c to A

兮子 2024-09-19 19:25:45

就目前情况而言,这对于解析器来说有点过于简单化。整个事情可以写成一个简单的正则表达式:“[ab]?c”。如果您确实坚持将其编写为解析器,则代码可能类似于:

Boolean parseA() { 
    // an A is a B followed by a `c`:
    return parseB() && (get_token() == Token.c);
}

Boolean parseB  {
    // Get a token. 
    //     If it's an `a` or a `b`, consume it. 
    //     Otherwise, throw it back (match null string by consuming nothing).
    // Either way, return true (successful match).
    //
    token current_token = get_token();
    if (token != Token.a && token != Token.b)
        push_back(current_token);
    return true;
}

编辑(响应对另一个答案的评论):不,当您匹配 B 时,您不应该寻找一个令牌。就 B 而言,存在三种可能性:匹配“a”、匹配“b”或根本不匹配。然后由解析 A 的部分来检查它是否有 B 后跟 Token.c。

例如,如果您要将语法更改为:

A ::= BC

B ::= a |乙| ε

C ::= c | d

由于“B”仍然具有相同的定义,您不应仅仅因为某些其他定义发生更改而必须更改它。同样,您可以添加到语法中以允许(例如)任意字符串 B 后跟 C。

As it stands, this is a bit on the simplistic side to make much sense as a parser. The whole thing can be written as a simple regular expression: "[ab]?c". If you really insist on writing it as a parser, the code could be something like:

Boolean parseA() { 
    // an A is a B followed by a `c`:
    return parseB() && (get_token() == Token.c);
}

Boolean parseB  {
    // Get a token. 
    //     If it's an `a` or a `b`, consume it. 
    //     Otherwise, throw it back (match null string by consuming nothing).
    // Either way, return true (successful match).
    //
    token current_token = get_token();
    if (token != Token.a && token != Token.b)
        push_back(current_token);
    return true;
}

Edit (in response to comment on another answer): No, when you're matching B, you should not look for a Token.c. As far as B cares, there are three possibilities: match 'a', match 'b', or match nothing at all. It's then up to the part that parses A to check that it has a B followed by a Token.c.

For example, if you were to change the grammar to something like:

A ::= B C

B ::= a | b | ε

C ::= c | d

Since 'B' still has the same definition, you should not have to change it just because some other definition changed. Likewise, you might add to the grammar to allow (for example) an arbitrary string of B's followed by a C.

朱染 2024-09-19 19:25:45

当您看到当前非终端的 FOLLOW 集中的任何内容时,您就“解析”了 epsilon。因此,由于 FOLLOW(B) = { 'c' },您可以在 parseB 的 switch 中使用 case Token.c:

You 'parse' an epsilon when you see anything in the FOLLOW set of the current non-terminal. So since FOLLOW(B) = { 'c' }, you use case Token.c: in the switch in parseB for it

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文