Bison Shift/Reduce 简单语法冲突

发布于 2024-09-12 08:06:34 字数 697 浏览 11 评论 0原文

我正在为我设计的语言构建一个解析器,其中类型名称以大写字母开头,变量名称以小写字母开头,以便词法分析器可以区分并提供不同的标记。此外,字符串“this”被词法分析器(它是一种 OOP 语言)识别并作为单独的标记传递。最后,数据成员只能在“this”对象上访问,因此我构建了如下语法:

%token TYPENAME
%token VARNAME
%token THIS

%%

start:
    Expression
    ;

Expression:
    THIS
    | THIS '.' VARNAME
    | Expression '.' TYPENAME
    ;
%%

表达式的第一条规则允许用户将“this”作为值传递(例如,从方法或方法返回它)传递给方法调用)。第二个用于访问“this”上的数据。第三条规则是用于调用方法,但是我删除了括号和参数,因为它们与问题无关。最初的语法显然比这个大得多,但是这是生成相同错误的最小部分(1 Shift/Reduce 冲突) - 我将它隔离到它自己的解析器文件中并验证了这一点,因此该错误与任何错误无关其他符号。

据我所知,这里给出的语法是明确的,因此不应产生任何错误。如果删除这三个规则中的任何一个或将第二个规则更改为

Expression '.' VARNAME

不存在冲突。无论如何,我可能需要有人清楚地说明为什么会发生这种冲突以及如何解决它。

I'm building a parser for a language I've designed, in which type names start with an upper case letter and variable names start with a lower case letter, such that the lexer can tell the difference and provide different tokens. Also, the string 'this' is recognised by the lexer (it's an OOP language) and passed as a separate token. Finally, data members can only be accessed on the 'this' object, so I built the grammar as so:

%token TYPENAME
%token VARNAME
%token THIS

%%

start:
    Expression
    ;

Expression:
    THIS
    | THIS '.' VARNAME
    | Expression '.' TYPENAME
    ;
%%

The first rule of Expression allows the user to pass 'this' around as a value (for example, returning it from a method or passing to a method call). The second is for accessing data on 'this'. The third rule is for calling methods, however I've removed the brackets and parameters since they are irrelevant to the problem. The originally grammar was clearly much larger than this, however this is the smallest part that generates the same error (1 Shift/Reduce conflict) - I isolated it into its own parser file and verified this, so the error has nothing to do with any other symbols.

As far as I can see, the grammar given here is unambiguous and so should not produce any errors. If you remove any of the three rules or change the second rule to

Expression '.' VARNAME

there is no conflict. In any case, I probably need someone to state the obvious of why this conflict occurs and how to resolve it.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

她如夕阳 2024-09-19 08:06:34

问题是语法只能向前看。因此,当您看到 THIS 然后是 . 时,您是在第 2 行(Expression: THIS '.' VARNAME)还是第 3 行(< code>Expression: 表达式 '.' TYPENAME,根据第 1 行进行归约。

语法可以将 THIS. 简化为 Expression.,然后查找 TYPENAME 或将其转换为 THIS. 并寻找 VARNAME,但它必须决定何时到达 .

The problem is that the grammar can only look one ahead. Therefore when you see a THIS then a ., are you in line 2(Expression: THIS '.' VARNAME) or line 3 (Expression: Expression '.' TYPENAME, via a reduction according to line 1).

The grammar could reduce THIS. to Expression. and then look for a TYPENAME or shift it to THIS. and look for a VARNAME, but it has to decide when it gets to the ..

南笙 2024-09-19 08:06:34

我尝试避免 y.output 但有时它确实有帮助。我查看了它生成的文件并看到了。

state 1

    2 Expression: THIS.  [$end, '.']
    3           | THIS . '.' VARNAME

    '.'  shift, and go to state 4

    '.'       [reduce using rule 2 (Expression)]
    $default  reduce using rule 2 (Expression)

基本上它是说它看到了“。”并且可以减少或可以转移。有时,Reduce 会让我感到愤怒,因为它们很难被精细化。转变是规则 3 并且是显而易见的(但输出没有提及规则 #)。它看到的减少是“.”在本例中是行

| Expression '.' TYPENAME

When it go to Expression 它查看下一个字母(“.”)并进入。现在它看到 THIS | 所以当它到达该语句的末尾时期望“.”当它离开或出现错误时。然而它看到了这个“。”而它在 this 和 '.' 之间(因此输出文件中的点)并且它可以减少规则,因此存在路径冲突。我相信您可以使用 %glr-parser 来允许它尝试两者,但冲突越多,您就越有可能获得意外的输出或歧义错误。我过去曾犯过歧义错误。处理它们很烦人,特别是如果您不记得是什么规则导致或影响了它们。建议避免冲突。

在尝试使用 bison 之前,我强烈推荐这本书

我想不出一个“伟大”的解决方案,但这不会产生冲突

start:
    ExpressionLoop
    ;

ExpressionLoop:
      Expression
    | ExpressionLoop ';' Expression
    ;
Expression:
      rval 
    | rval '.' TYPENAME
    | THIS //trick is moving this AWAY so it doesnt reduce
rval:

    THIS '.' VARNAME

替代方案您可以通过在规则中添加更多内容来使其稍后减少,这样它就不会减少,或者通过在之后或之前添加标记来明确要采取的路径或失败(记住,它必须在减少任何路径之前知道)

start:
    ExpressionLoop
    ;

ExpressionLoop:
      Expression
    | ExpressionLoop ';' Expression
    ;
Expression:
      rval 
    | rval '.' TYPENAME
rval:
      THIS '@'
    | THIS '.' VARNAME
%%

-编辑-注意如果我想做 func paramtype varname 我不能,因为根据词法分析器 func 的类型是Var(A-Za-z09_)以及类型。 param 和 varname 也是 var,所以这会导致减少/减少冲突。你不能把它们写成它们本来的样子,而只能写它们的样子。所以写作时请记住这一点。您必须编写一个标记来区分两者,或者将其编写为两者之一,但在代码中编写附加逻辑(规则右侧的 { } 中的部分)以检查它是否是 funcname 或类型并处理这两种情况。

I try to avoid y.output but sometimes it does help. I looked at the file it produced and saw.

state 1

    2 Expression: THIS.  [$end, '.']
    3           | THIS . '.' VARNAME

    '.'  shift, and go to state 4

    '.'       [reduce using rule 2 (Expression)]
    $default  reduce using rule 2 (Expression)

Basically it is saying it sees '.' and can reduce or it can shift. Reduce makes me anrgu sometimes because they are hard to fine. The shift is rule 3 and is obvious (but the output doesnt mention the rule #). The reduce where it see's '.' in this case is the line

| Expression '.' TYPENAME

When it goes to Expression it looks at the next letter (the '.') and goes in. Now it sees THIS | so when it gets to the end of that statement it expects '.' when it leaves or an error. However it sees THIS '.' while its between this and '.' (hence the dot in the out file) and it CAN reduce a rule so there is a path conflict. I believe you can use %glr-parser to allow it to try both but the more conflicts you have the more likely you'll either get unexpected output or an ambiguity error. I had ambiguity errors in the past. They are annoying to deal with especially if you dont remember what rule caused or affected them. it is recommended to avoid conflicts.

I highly recommend this book before attempting to use bison.

I cant think of a 'great' solution but this gives no conflicts

start:
    ExpressionLoop
    ;

ExpressionLoop:
      Expression
    | ExpressionLoop ';' Expression
    ;
Expression:
      rval 
    | rval '.' TYPENAME
    | THIS //trick is moving this AWAY so it doesnt reduce
rval:

    THIS '.' VARNAME

Alternative you can make it reduce later by adding more to the rule so it doesnt reduce as soon or by adding a token after or before to make it clear which path to take or fails (remember, it must know BEFORE reducing ANY path)

start:
    ExpressionLoop
    ;

ExpressionLoop:
      Expression
    | ExpressionLoop ';' Expression
    ;
Expression:
      rval 
    | rval '.' TYPENAME
rval:
      THIS '@'
    | THIS '.' VARNAME
%%

-edit- note if i want to do func param and type varname i cant because type according to the lexer func is a Var (which is A-Za-z09_) as well as type. param and varname are both var's as well so this will cause me a reduce/reduce conflict. You cant write this as what they are, only what they look like. So keep that in mind when writing. You'll have to write a token to differentiate the two or write it as one of the two but write additional logic in code (the part that is in { } on the right side of the rules) to check if it is a funcname or a type and handle both those case.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文