了解潜在的冲突

发布于 2025-01-12 14:04:23 字数 2961 浏览 1 评论 0原文

我有一个由 Menhir 构建的小型表达式解析器。我试图通过在 parser.mly 中编写恢复语法来在解析期间恢复括号不完整表达式：

%{ 
   open AST
%}

%token<int> LINT 
%token<string> ID
%token LPAREN RPAREN COMMA
%token EOF PLUS STAR EQ

%start<AST.expression> expressionEOF

%right LPAREN RPAREN
%nonassoc EQ
%left PLUS
%left STAR

%%

expressionEOF: e=expression EOF
{
  e
}

expression:
| x=LINT
{
  Int x
}
| x=identifier
{
  Read x
}
| e1=expression b=binop e2=expression
{
  Binop (b, e1, e2)
}
| e1=expression b=binop
(* for "2+", "2*3+" *)
{
  Binop (b, e1, FakeExpression)
}
| LPAREN e=expression RPAREN
{
  Paren e
}
| LPAREN RPAREN
(* for "()" *)
{
  Paren FakeExpression
}
| LPAREN
(* for "(" *)
{
  ParenMissingRparen FakeExpression
}
| LPAREN e=expression 
(* for "(1", "(1+2", "(1+2*3", "((1+2)" *)
{
  ParenMissingRparen e
}
| RPAREN
(* for ")" *)
{
  ExtraRparen FakeExpression
}
| e=expression RPAREN 
(* for "3)", "4))", "2+3)" *)
{
  ExtraRparen e
}

%inline binop:
  PLUS { Add   }
| STAR { Mul   }
| EQ   { Equal }

identifier: x=ID
{
  Id x
}

它在一组不完整表达式上运行良好。但是，menhir --explain parser.mly 返回以下parser.conflict：

** Conflict (reduce/reduce) in state 10.
** Tokens involved: STAR RPAREN PLUS EQ EOF
** The following explanations concentrate on token STAR.
** This state is reached from expressionEOF after reading:

LPAREN expression RPAREN

** The derivations that appear below have the following common factor:
** (The question mark symbol (?) represents the spot where the derivations begin to differ.)

expressionEOF 
expression EOF 
expression STAR expression // lookahead token appears
(?)

** In state 10, looking ahead at STAR, reducing production
** expression -> LPAREN expression RPAREN
** is permitted because of the following sub-derivation:

LPAREN expression RPAREN . 

** In state 10, looking ahead at STAR, reducing production
** expression -> expression RPAREN
** is permitted because of the following sub-derivation:

LPAREN expression // lookahead token is inherited
       expression RPAREN . 

** Conflict (reduce/reduce) in state 3.
** Tokens involved: STAR RPAREN PLUS EQ EOF
** The following explanations concentrate on token STAR.
** This state is reached from expressionEOF after reading:

LPAREN RPAREN

** The derivations that appear below have the following common factor:
** (The question mark symbol (?) represents the spot where the derivations begin to differ.)

expressionEOF 
expression EOF 
expression STAR expression // lookahead token appears
(?)

** In state 3, looking ahead at STAR, reducing production
** expression -> LPAREN RPAREN
** is permitted because of the following sub-derivation:

LPAREN RPAREN . 

** In state 3, looking ahead at STAR, reducing production
** expression -> RPAREN
** is permitted because of the following sub-derivation:

LPAREN expression // lookahead token is inherited
       RPAREN .

我不明白它试图解释什么。谁能告诉我可能存在哪些潜在冲突（优先举例）以及解决方案是什么？

原文

I have a small parser of expression built by Menhir. I'm trying to recover parenthesis-incomplete expressions during parsing by writing recovery grammars in parser.mly:

%{ 
   open AST
%}

%token<int> LINT 
%token<string> ID
%token LPAREN RPAREN COMMA
%token EOF PLUS STAR EQ

%start<AST.expression> expressionEOF

%right LPAREN RPAREN
%nonassoc EQ
%left PLUS
%left STAR

%%

expressionEOF: e=expression EOF
{
  e
}

expression:
| x=LINT
{
  Int x
}
| x=identifier
{
  Read x
}
| e1=expression b=binop e2=expression
{
  Binop (b, e1, e2)
}
| e1=expression b=binop
(* for "2+", "2*3+" *)
{
  Binop (b, e1, FakeExpression)
}
| LPAREN e=expression RPAREN
{
  Paren e
}
| LPAREN RPAREN
(* for "()" *)
{
  Paren FakeExpression
}
| LPAREN
(* for "(" *)
{
  ParenMissingRparen FakeExpression
}
| LPAREN e=expression 
(* for "(1", "(1+2", "(1+2*3", "((1+2)" *)
{
  ParenMissingRparen e
}
| RPAREN
(* for ")" *)
{
  ExtraRparen FakeExpression
}
| e=expression RPAREN 
(* for "3)", "4))", "2+3)" *)
{
  ExtraRparen e
}

%inline binop:
  PLUS { Add   }
| STAR { Mul   }
| EQ   { Equal }

identifier: x=ID
{
  Id x
}

It works fine on a set of incomplete expressions. However, menhir --explain parser.mly returns the following parser.conflict:

** Conflict (reduce/reduce) in state 10.
** Tokens involved: STAR RPAREN PLUS EQ EOF
** The following explanations concentrate on token STAR.
** This state is reached from expressionEOF after reading:

LPAREN expression RPAREN

** The derivations that appear below have the following common factor:
** (The question mark symbol (?) represents the spot where the derivations begin to differ.)

expressionEOF 
expression EOF 
expression STAR expression // lookahead token appears
(?)

** In state 10, looking ahead at STAR, reducing production
** expression -> LPAREN expression RPAREN
** is permitted because of the following sub-derivation:

LPAREN expression RPAREN . 

** In state 10, looking ahead at STAR, reducing production
** expression -> expression RPAREN
** is permitted because of the following sub-derivation:

LPAREN expression // lookahead token is inherited
       expression RPAREN . 

** Conflict (reduce/reduce) in state 3.
** Tokens involved: STAR RPAREN PLUS EQ EOF
** The following explanations concentrate on token STAR.
** This state is reached from expressionEOF after reading:

LPAREN RPAREN

** The derivations that appear below have the following common factor:
** (The question mark symbol (?) represents the spot where the derivations begin to differ.)

expressionEOF 
expression EOF 
expression STAR expression // lookahead token appears
(?)

** In state 3, looking ahead at STAR, reducing production
** expression -> LPAREN RPAREN
** is permitted because of the following sub-derivation:

LPAREN RPAREN . 

** In state 3, looking ahead at STAR, reducing production
** expression -> RPAREN
** is permitted because of the following sub-derivation:

LPAREN expression // lookahead token is inherited
       RPAREN .

I don't understand what it tries to explain. Could anyone tell me what may be potential conflicts (with example by preference) and what would be solutions?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

说好的呢 2025-01-19 14:04:23

您有：

expr: '(' expr ')'
    | '(' expr
    |     expr ')'

因此，您希望 ( x ) 匹配第一条规则：

         expr
      -> '('  expr ')'  (rule 1)

它确实如此。但它也以另一种方式匹配：

         expr
      -> expr       ')' (rule 3)
      -> '(' expr   ')' (rule 2)

它也像这样匹配：

         expr
      -> '('   expr     (rule 2)
      -> '('   expr ')' (rule 3)

因为你还让 expr 匹配 ( 和 ), ( ) 还可以通过多种方式进行匹配，包括作为 expr ')' （使用 expr -> '(') 或 '(' expr （带有 expr -> ')'）。

“解决方案”是放弃尝试添加无效句子的识别，一旦语法错误，您可以尝试使用 Menhir 的错误恢复机制来产生错误。消息并继续解析。

You have:

expr: '(' expr ')'
    | '(' expr
    |     expr ')'

So, you want ( x ) to match the first rule:

         expr
      -> '('  expr ')'  (rule 1)

Which it does. But it also matches another way:

         expr
      -> expr       ')' (rule 3)
      -> '(' expr   ')' (rule 2)

And it also matches like this:

         expr
      -> '('   expr     (rule 2)
      -> '('   expr ')' (rule 3)

Since you also let expr match ( and ), ( ) can also be matched several ways, including as expr ')' (with expr -> '('), or '(' expr (with expr -> ')').

The "solution" is to give up trying to add recognition of invalid sentences. The parse should fail on a syntax error; once it fails you can try to use Menhir's error recovery mechanism to produce an error message and continue the parse. See section 11 of the manual.

回复收藏 0 原文

~没有更多了~