如何将以下歧义语法转换为明确语法？

发布于 2024-12-05 08:55:00 字数 274 浏览 5 评论 0原文

我理解两者之间的区别，歧义意味着至少有一个字符串具有两个不同的解析树，而一棵明确的树中只有一个。但我似乎无法将其中之一转换为另一种。

我如何将以下不明确的语法转换为明确的语法？

S -> aSb
S -> abS
S -> lambda

编辑：好的，我对此的尝试会是一些

S -> aSb | lambda
b -> abS | lambda

想法吗？

原文

I understand how the difference between the two, how ambiguity means that there is at least one string with 2 distinct parse trees while there is only one in an unambiguous tree. But I can't seem to convert one to the other.

How would I convert the following ambiguous grammar to an unambiguous one?

S -> aSb
S -> abS
S -> lambda

Edit: Ok, my stab at this would be something like

S -> aSb | lambda
b -> abS | lambda

any thoughts?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

小兔几 2024-12-12 08:55:00

该语法是不明确的，不仅因为有两个规则将“a”作为下一个标记进行匹配，还因为“ab”可以由第一条或第二条规则匹配（在每个规则中使用第三条规则替换 S）。

存在一种本质上不明确的语法，但这不是一种。

针对这个具体示例，我首先枚举将解析的字符串。我对规则 1,2 和 3 进行了编号 - 并考虑了规则 1 和 2 可能出现在解析中的所有序列（这是生成终端的两个规则）。注意，我假设“lambda”表示空产生式。

1,2 => ab
11,12 => abab
21,22 => aabb
111,112 => ababab
121,122 => abaabb
211,212 => aababb
221,222 => aaabbb
1111,1112 => abababab
1121,1122 => ababaabb
1211,1212 => abaababb
1221,1222 => abaaabbb
2111,2112 => aabababb
2121,2122 => aabaabbb
2211,2212 => aaababbb
2221,2222 => aaaabbbb

从这个练习中，很明显，我们正在匹配“a”和“b”的偶数长度字符串，其中“a”端子的数量与“b”端子的数量完全匹配......此外，匹配的两个字符串的串联仅当使用第二条规则匹配前缀时才会产生另一个匹配字符串。

根据这个分析，我创作了一些新作品。

S -> a a X
S -> a b S
S -> lambda
X -> S b b

这个新语法不是二义性的，但它匹配与二义性语法相同的字符串。它通过引入新的非终结符 X 来实现这一点。当该 CFG 与下推自动机一起使用时，使用 S 和 X 产生的附加状态信息足以避免歧义。

如果这个问题是在使用 Yacc 或 Bison 之类的环境中出现的，那么这种歧义通常表明您对终端令牌的选择很糟糕。如果您选择“aa”、“ab”和“bb”作为终端 - 您就不会遇到困难。当使用 (F)lex 作为标记生成器时，根据经验，最好使其匹配的标记尽可能大......因为匹配正则表达式（至少在理论上）比匹配正则表达式更快上下文无关语法 - 这可能理所当然地产生了两字符标记方法。

The grammar is ambiguous not only because there are two rules that match 'a' as the next token - but because 'ab' can be matched either by the first or second rule (substituting using the third for S in each).

There is such a thing as an inherently ambiguous grammar, but this isn't one.

Focusing on this specific example, I started by enumerating the strings that would parse. I numbered the rules 1,2 and 3 - and considered all the sequences in which rules 1 and 2 could appear in the parse (these being the two rules that generate terminals.) N.B. I assumed "lambda" denoted the empty production.

1,2 => ab
11,12 => abab
21,22 => aabb
111,112 => ababab
121,122 => abaabb
211,212 => aababb
221,222 => aaabbb
1111,1112 => abababab
1121,1122 => ababaabb
1211,1212 => abaababb
1221,1222 => abaaabbb
2111,2112 => aabababb
2121,2122 => aabaabbb
2211,2212 => aaababbb
2221,2222 => aaaabbbb

From this exercise, it is obvious that we're matching even length strings of 'a and b' where the number of 'a' terminals exactly matches the number of 'b' terminals... Further, the concatenation of two strings that match only results in another matching string if the prefix matched using the second rule.

From this analysis, I drew up some new productions.

S -> a a X
S -> a b S
S -> lambda
X -> S b b

This new grammar is not ambiguous, but it matches the same strings as the ambiguous grammar. It achieves this by introducing a new non-terminal X. When this CFG is used with a push-down automata, the additional state-information arising from using both S and X is sufficient to avoid ambiguity.

If this problem arose in the context of using something like Yacc or Bison, the ambiguity is often an indication that you've made a poor choice of terminal tokens. If you'd picked 'aa', 'ab' and 'bb' as terminals - you'd not have run into difficulty. When using (F)lex as a tokenizer, as a rule of thumb, it's a good idea to make the tokens it matches as big as makes sense... as it's quicker to match a regular expression (in theory at least) than a context free grammar - and this might have yielded the two-character token approach as a matter of course.

回复收藏 0 原文

~没有更多了~