ANTLR not matching the correct parser rule

Posted 2025-01-25 08:15:28

I am trying to create a parser for the DOCTOR script of the old ELIZA chatbot program.

The DOCTOR script is simplified here to a Welcome line followed by a line defining how "The Doctor" responds to a user input such as "IF ONLY I WAS THINNER":

(I AM THE DOCTOR.)
(IF 3 ((0 IF 0)(DO YOU 3)(YOU WISH THAT 3)))

Here is the Lexer:

ALL_CHARS: [0-9A-Z, .];
KEY_CHARS: [A-Z];
LPAREN: '(';
RPAREN: ')';
NUM: [0-9];
SPACE: ' ';
WS: ('\n')+ -> skip;

and the Parser:

main: item* EOF;
item: (rWelcome | rKeyDecompReAssy );


rKeyDecompReAssy: LPAREN rKeyPri rDecompReAssy RPAREN;
rKeyPri: rKey SPACE rPri;
rKey: KEY_CHARS+;
rPri: NUM+;

rDecompReAssy: LPAREN rDecomp rReAssyList RPAREN;
rDecomp: LPAREN ALL_CHARS+ RPAREN;

rReAssyList: (rReAssy)+;
rReAssy: LPAREN reAssy RPAREN;
reAssy: ALL_CHARS+;

rWelcome: LPAREN reAssy RPAREN;

This defines a rule for the Welcome line (rWelcome) and one for the IF line (rKeyDecompReAssy), which attempts to match 4 components: Key, Pri, Decomp and ReAssyList.

I use the ANTLR Preview of Android Studio.

The problem is that both lines are matched to rWelcome.

The Welcome line is OK of course, but the error message for the second is:

line 2:6 missing ')' at '('
line 2:45 mismatched input ')' expecting {<EOF>, '('}

How do I make the two rules unambiguous?

Answer by 简单爱, 2025-02-01 08:15:28

As mentioned in the comment, your lexer never creates KEY_CHARS, SPACE or NUM tokens, because the ALL_CHARS rule also matches the characters those rules define. When two or more lexer rules match the same characters, the one defined first "wins". Even if a parser rule is trying to match a KEY_CHARS token, the lexer simply creates an ALL_CHARS token: the lexer works independently of the parser.
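
To make the tie-break concrete, here is a rough Python simulation of it. This is not ANTLR itself: `tokenize` and `ORIGINAL_RULES` are illustrative names, and the regexes are hand-translated from the lexer in the question (the skipped WS rule is left out). It only mimics the "longest match, then first-defined rule" behaviour described above.

```python
import re

# Lexer rules in the order the question declares them (WS omitted).
ORIGINAL_RULES = [
    ("ALL_CHARS", r"[0-9A-Z, .]"),
    ("KEY_CHARS", r"[A-Z]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("NUM", r"[0-9]"),
    ("SPACE", r" "),
]

def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"no rule matches {text[pos]!r} at {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# The start of the second input line, up to the second '('.
names = [name for name, _ in tokenize("(IF 3 (", ORIGINAL_RULES)]
print(names)
```

In this simulation every character between the two parentheses comes out as ALL_CHARS, so the parser only ever sees LPAREN, a run of ALL_CHARS, then another LPAREN at column 6. That is consistent with the reported "line 2:6 missing ')' at '('": rWelcome is the only rule that can consume ALL_CHARS tokens after the opening parenthesis.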

What you could do is something like this:

main             : item* EOF;
item             : (rWelcome | rKeyDecompReAssy );
rKeyDecompReAssy : LPAREN rKeyPri rDecompReAssy RPAREN;
rKeyPri          : rKey SPACE rPri SPACE; // Note: I added the last `SPACE`
rKey             : KEY_CHARS+;
rPri             : NUM+;
rDecompReAssy    : LPAREN rDecomp rReAssyList RPAREN;
rDecomp          : LPAREN all_chars+ RPAREN;
rReAssyList      : (rReAssy)+;
rReAssy          : LPAREN reAssy RPAREN;
reAssy           : all_chars+;
rWelcome         : LPAREN reAssy RPAREN;
all_chars        : NUM | KEY_CHARS | SPACE | OTHER_CHAR;

KEY_CHARS  : [A-Z];
LPAREN     : '(';
RPAREN     : ')';
NUM        : [0-9];
SPACE      : ' ';
WS         : ('\n')+ -> skip;
OTHER_CHAR : [.,];
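
As a quick sanity check of the revised lexer, the sketch below classifies each character of the start of the IF line in Python. It is a simulation under assumptions, not ANTLR output: `REVISED_RULES` and `token_name` are made-up names, and the patterns are hand-copied from the lexer rules above (WS omitted). Because the revised rules no longer overlap, rule order stops mattering for these characters.

```python
import re

# Revised lexer rules from the grammar above, in declaration order.
REVISED_RULES = [
    ("KEY_CHARS", r"[A-Z]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("NUM", r"[0-9]"),
    ("SPACE", r" "),
    ("OTHER_CHAR", r"[.,]"),
]

def token_name(ch):
    """Return the name of the first rule that matches the single character."""
    for name, pattern in REVISED_RULES:
        if re.fullmatch(pattern, ch):
            return name
    raise ValueError(f"no rule matches {ch!r}")

# Every revised rule matches exactly one character, so classify char by char.
names = [token_name(ch) for ch in "(IF 3 ("]
print(names)
```

The resulting sequence (LPAREN, two KEY_CHARS, SPACE, NUM, SPACE, LPAREN) lines up with `LPAREN rKey SPACE rPri SPACE` followed by the LPAREN that opens rDecompReAssy, which is why the extra trailing `SPACE` in rKeyPri is needed.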
