为什么会发生这个错误？ - “以下选项永远无法匹配”

发布于 2024-11-09 07:32:16 字数 1468 浏览 12 评论 0原文

我对制作编译器、解析器和解析器生成器很感兴趣，但我对它们了解不多。

在阅读了这个问题的答案后，我尝试做出“非常”的回答简单的 LaTeX 解析器。

这是代码：

grammar Latex;

latex   :   ITEM*;
ITEM    :   CMD|LAWTEXT;
CMD :   CHEAD ARGS;
CHEAD   :   '\\' LETTER(LETTER|DIGIT)*;
LETTER  :   'A'..'Z'|'a'..'z';
DIGIT   :   '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';
ARGS    :   '{' ITEM* '}';
LAWTEXT :   (LETTER|DIGIT|WHITESPACE|PUNC)*;
WHITESPACE
    :   ' '|'\t'|'\n'|'\r';
PUNC    :   '!'|'^';

（PUNC 中只有两个字符，用于测试目的）

这是错误消息：

[18:39:09] warning(200): C:\Users\***\Documents\Latex.g:9:12: Decision can match input such as "{'\t'..'\n', '\r', ' '..'!', '0'..'9', 'A'..'Z', '\\', '^', 'a'..'z', '}'}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
[18:39:09] error(201): C:\Users\***\Documents\Latex.g:9:12: The following alternatives can never be matched: 2

[18:39:09] error(211): C:\Users\***\Documents\Latex.g:1:8: [fatal] rule Tokens has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2.  Resolve by left-factoring or using syntactic predicates or using backtrack=true option.

我发现发生此错误是因为存在歧义，代码可以用两种以上的方式解释，但我没有知道这种歧义是如何产生的。

这是图表和可以解释某些事物的两种方式（也许）。

图表。

...但是 \ 和 } 可以如何感到困惑吗？

原文

I'm interested in making compiler, parser, and parser generator, but I don't know much about them.

After reading an answer of this question, I tried to make a 'very' simple LaTeX parser.

This is the code:

grammar Latex;

latex   :   ITEM*;
ITEM    :   CMD|LAWTEXT;
CMD :   CHEAD ARGS;
CHEAD   :   '\\' LETTER(LETTER|DIGIT)*;
LETTER  :   'A'..'Z'|'a'..'z';
DIGIT   :   '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';
ARGS    :   '{' ITEM* '}';
LAWTEXT :   (LETTER|DIGIT|WHITESPACE|PUNC)*;
WHITESPACE
    :   ' '|'\t'|'\n'|'\r';
PUNC    :   '!'|'^';

(There's only two char in PUNC, for test purpose)

And this is the error message:

[18:39:09] warning(200): C:\Users\***\Documents\Latex.g:9:12: Decision can match input such as "{'\t'..'\n', '\r', ' '..'!', '0'..'9', 'A'..'Z', '\\', '^', 'a'..'z', '}'}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
[18:39:09] error(201): C:\Users\***\Documents\Latex.g:9:12: The following alternatives can never be matched: 2

[18:39:09] error(211): C:\Users\***\Documents\Latex.g:1:8: [fatal] rule Tokens has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2.  Resolve by left-factoring or using syntactic predicates or using backtrack=true option.

I found that this error happens because there's an ambiguity, a code can be interpreted in more than two ways, but I have no idea how this ambiguity is generated.

And this is the diagram and two ways something can be interpreted (maybe).

The diagram.

... but how \ and } can be confused?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

不知所踪 2024-11-16 07:32:16

JiminP 写道：
我对制作编译器、解析器和解析器生成器感兴趣，但我对它们了解不多。

ANTLR 根据您编写的语法为您创建词法分析器和解析器。 ANTLR 本身就是解析器生成器，因此您不需要编写解析器生成器本身（幸运的是！）。编译器是一个应用程序，它获取解析器生成的树并将输入转换为其他形式：这是您需要自己做的事情。所以，强调一下：ANTLR 仅帮助您为您的语言创建一个解析器，其余的取决于您。

现在，问题。

您的语法几乎只包含词法分析器规则。 Lexer 规则以大写字母开头，用于标记您的输入源。因此，像这样的规则：

LETTER  :   'A'..'Z'|'a'..'z';
...
LAWTEXT :   (LETTER|DIGIT|WHITESPACE|PUNC)*;

可能会导致词法分析器自行创建一个 LETTER 标记。如果您总是希望将小写或大写 ascii 字母变成 LAWTEXT 标记，那么您需要将 LETTER 设置为如下片段规则：

fragment LETTER  :   'A'..'Z'|'a'..'z';
...
LAWTEXT :   (LETTER|DIGIT|WHITESPACE|PUNC)+;

正如您所看到的，我使用 + 而不是 * 结束 LAWTEXT 规则：您不想创建不包含任何内容（空字符串）的标记。

此外，args、item 和 cmd 不适合词法分析器规则：它们应该是解析器规则。

这是生成词法分析器和解析器而没有任何错误的语法：

grammar Latex;

latex
  :  item* EOF
  ;

item 
  :  cmd
  |  LAWTEXT
  ;

cmd
  :  CHEAD args
  ;

args
  :  '{' item* '}'
  ;

CHEAD 
  :  '\\' LETTER (LETTER | DIGIT)*
  ;  

LAWTEXT
  :  (LETTER | DIGIT | WHITESPACE | PUNC)+
  ;

fragment  
WHITESPACE 
  :  ' ' | '\t' | '\n' | '\r'
  ;

fragment  
PUNC       
  : '!' | '^'
  ;

fragment
LETTER
  :  'A'..'Z' | 'a'..'z'
  ;

fragment
DIGIT
  :  '0'..'9'
  ;

编辑

正如我已经提到的：词法分析器规则以大写字母开头，而解析器规则以小写字母开头。词法分析器有时称为分词器或扫描器，负责分割输入源。输入源一开始只是一个字符流。然后，词法分析器将这些字符分组在一起。因此，给定以下词法分析器规则：

Identifier
  :  (Letter | '_') (Letter | '_' | Digit)*
  ;

Assign
  :  '='
  ;

Number
  :  Digit+ ('.' Digit+)?
  ;

fragment Digit
  :  '0'..'9'
  ;

fragment Letter
  :  'a'..'z' | 'A'..'Z'
  ;

Spaces
  :  (' ' | '\t' | '\r' | '\n') {skip();}
  ;

可以采用如下输入源：

foo = 12.34

词法分析器将其视为：

'f', 'o', 'o', ' ', '=', ' ', '1', '2', '.', '3', '4', EOF

并将创建以下标记：

Identifier "foo"
Assign "="
< code>Number "12.34"

（请注意，没有从空格创建标记：我跳过了这些！）

词法分析器从输入源创建标记后，会将这些标记传递给解析器。赋值解析器规则可能如下所示：

assignment
  :  Identifier Assign Number
  ;

重要的是要记住，输入源首先由词法分析器进行标记，只有在该过程之后，解析器规则才会发挥作用。

JiminP wrote:
I'm interested in making compiler, parser, and parser generator, but I don't know much about them.

ANTLR creates a lexer and parser for you based on the grammar you write. ANTLR itself is the parser generator, so you don't write the parser generator itself (luckily!). A compiler is an application that takes the tree your parser generates and translates the input into some other form: this is something you need to do yourself. So, to emphasize: ANTLR only helps you create a parser for your language, the rest is up to you.

Now, the problem(s).

Your grammar contains almost only lexer rules. Lexer rules start with a capital letter and are used to tokenize your input source. So rules like these:

LETTER  :   'A'..'Z'|'a'..'z';
...
LAWTEXT :   (LETTER|DIGIT|WHITESPACE|PUNC)*;

might cause the lexer to create a LETTER token on its own. If you always want a lower- or uppercase ascii letter to become a LAWTEXT token, then you need to make LETTER a fragment rule like this:

fragment LETTER  :   'A'..'Z'|'a'..'z';
...
LAWTEXT :   (LETTER|DIGIT|WHITESPACE|PUNC)+;

And as you can see, I ended the LAWTEXT rule with a + instead of a *: you don't want to create tokens that contain nothing (empty string).

Also, args, item and cmd are no suitable candidates for lexer rule: they should be parser rules instead.

Here's a grammar that generates a lexer and parser without any errors:

grammar Latex;

latex
  :  item* EOF
  ;

item 
  :  cmd
  |  LAWTEXT
  ;

cmd
  :  CHEAD args
  ;

args
  :  '{' item* '}'
  ;

CHEAD 
  :  '\\' LETTER (LETTER | DIGIT)*
  ;  

LAWTEXT
  :  (LETTER | DIGIT | WHITESPACE | PUNC)+
  ;

fragment  
WHITESPACE 
  :  ' ' | '\t' | '\n' | '\r'
  ;

fragment  
PUNC       
  : '!' | '^'
  ;

fragment
LETTER
  :  'A'..'Z' | 'a'..'z'
  ;

fragment
DIGIT
  :  '0'..'9'
  ;

EDIT

As I already mentioned: lexer rules start with a capital letter while parser rules start with a lower case letter. A lexer, which is sometimes called a tokenizer or scanner, is responsible for chopping up the input source. The input source starts of as being just a stream of characters. These characters are then grouped together by the lexer. So, given the following lexer rules:

Identifier
  :  (Letter | '_') (Letter | '_' | Digit)*
  ;

Assign
  :  '='
  ;

Number
  :  Digit+ ('.' Digit+)?
  ;

fragment Digit
  :  '0'..'9'
  ;

fragment Letter
  :  'a'..'z' | 'A'..'Z'
  ;

Spaces
  :  (' ' | '\t' | '\r' | '\n') {skip();}
  ;

could take input source like:

foo = 12.34

which the lexer sees as:

'f', 'o', 'o', ' ', '=', ' ', '1', '2', '.', '3', '4', EOF

and will create the following tokens:

Identifier "foo"
Assign "="
Number "12.34"

(note that there are no tokens being created from the white spaces: I skipped these!)

After the lexer has created tokens from your input source, the parser is passed these tokens. An assignment parser rule could then look like:

assignment
  :  Identifier Assign Number
  ;

It is important to keep in mind that the input source is first tokenized by the lexer and only after that process, the parser rules come into play.

回复收藏 0 原文

~没有更多了~

关于作者

孤单情人

暂无简介

文章

26 人气

关注发私信

友情链接

文江博客

为什么会发生这个错误？ - “以下选项永远无法匹配”

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

编辑

EDIT

关于作者

相关话题

热门标签

推荐作者

李珊平

Quxin

范无咎

github_ZOJ2N8YxBm

若言

南…巷孤猫

友情链接

为什么会发生这个错误？ - “以下选项永远无法匹配”

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（1）

编辑

EDIT

关于作者

相关话题

热门标签

推荐作者

李珊平

Quxin

范无咎

github_ZOJ2N8YxBm

若言

南…巷孤猫

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。