How can my ANTLR lexer match a token made of characters that are a subset of another kind of token?

Posted 2024-08-19 23:22:46

I have what I think is a simple ANTLR question. I have two token types: ident and special_ident. I want my special_ident to match a single letter followed by a single digit. I want the generic ident to match a single letter, optionally followed by any number of letters or digits. My (incorrect) grammar is below:

expr 
    : special_ident
    | ident
    ;

special_ident : LETTER DIGIT;
ident         : LETTER (LETTER | DIGIT)*;

LETTER : 'A'..'Z';
DIGIT  : '0'..'9';

When I try to check this grammar, I get this warning:

Decision can match input such as "LETTER DIGIT" using multiple alternatives: 1, 2.
As a result, alternative(s) 2 were disabled for that input

I understand that my grammar is ambiguous and that input such as A1 could match either ident or special_ident. I really just want the special_ident to be used in the narrowest of cases.

Here's some sample input and what I'd like it to match:

A      : ident
A1     : special_ident
A1A    : ident
A12    : ident
AA1    : ident

How can I form my grammar such that I correctly identify my two types of identifiers?

安静 2024-08-26 23:22:46

Seems that you have 3 cases:

A
AN
A(A|N)(A|N)+

You could classify the middle one as special_ident and the other two as ident; seems that should do the trick.

I'm a bit rusty with ANTLR, I hope this hint is enough. I can try to write out the expressions for you but they could be wrong:

long_ident    : LETTER (LETTER | DIGIT) (LETTER | DIGIT)+;
special_ident : LETTER DIGIT;
ident         : LETTER | long_ident;
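
To sanity-check the three cases outside ANTLR, here is a minimal Java sketch that mirrors them with regular expressions. The class name ThreeCaseClassifier and the "unmatched" label are hypothetical and only for illustration; this checks the case analysis, not the grammar itself.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of the grammar): mirrors the three cases with regexes.
class ThreeCaseClassifier {
    private static final Pattern SINGLE  = Pattern.compile("[A-Z]");                 // A
    private static final Pattern SPECIAL = Pattern.compile("[A-Z][0-9]");            // AN
    private static final Pattern LONG    = Pattern.compile("[A-Z][A-Z0-9][A-Z0-9]+"); // A(A|N)(A|N)+

    // Returns "special_ident", "ident", or "unmatched" when no case applies.
    static String classify(String input) {
        if (SPECIAL.matcher(input).matches()) {
            return "special_ident";
        }
        if (SINGLE.matcher(input).matches() || LONG.matcher(input).matches()) {
            return "ident";
        }
        return "unmatched";
    }

    public static void main(String[] args) {
        // The five sample inputs from the question, plus a two-letter input.
        for (String s : new String[] {"A", "A1", "A1A", "A12", "AA1", "AA"}) {
            System.out.println(s + " : " + classify(s));
        }
    }
}
```

The five sample inputs from the question fall into the intended categories. A two-letter input such as AA is not covered by any of the three cases, because the long form needs at least three characters.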


青春有你 2024-08-26 23:22:46

Expanding on Carl's thought, I would guess you have four different cases:

A
AN
AA(A|N)*
AN(A|N)+

Only case 2 should be the special_ident token; the other three should be ident. All of them can be told apart by syntax alone. Here is a quick grammar that I tested in ANTLRWorks, and it appeared to work properly for me. I think Carl's grammar might have one bug when given the input AA, but getting you 99% of the way there is a huge benefit, so this is only a minor modification to his quick thought.

prog 
    :    (expr WS)+ EOF;

expr 
    : special_ident {System.out.println("Found special_ident:" + $special_ident.text + "\n");}
    | ident {System.out.println("Found ident:" + $ident.text + "\n");}
    ;

special_ident : LETTER DIGIT;

ident
    : LETTER
    | LETTER DIGIT (LETTER | DIGIT)+
    | LETTER LETTER (LETTER | DIGIT)*
    ;

LETTER : 'A'..'Z';
DIGIT  : '0'..'9';
WS 
    :   (' '|'\t'|'\n'|'\r')+;
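
As a rough check that the four cases do not overlap and leave no gaps, the Java sketch below enumerates short inputs and counts how many cases each one matches. It assumes a reduced alphabet (A, B, 0, 1) to keep the enumeration small, and the class and method names are hypothetical. It exercises the case analysis with regular expressions and does not run the ANTLR grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical check (not part of the grammar): each input should match exactly one case.
class FourCaseCheck {
    static final Pattern[] CASES = {
        Pattern.compile("[A-Z]"),                // A
        Pattern.compile("[A-Z][0-9]"),           // AN, the special_ident case
        Pattern.compile("[A-Z][A-Z][A-Z0-9]*"),  // AA(A|N)*
        Pattern.compile("[A-Z][0-9][A-Z0-9]+")   // AN(A|N)+
    };

    // Number of cases that match the whole input.
    static int matchCount(String input) {
        int count = 0;
        for (Pattern p : CASES) {
            if (p.matcher(input).matches()) {
                count++;
            }
        }
        return count;
    }

    // All strings of length 1..maxLength that start with a letter, over the reduced alphabet.
    static List<String> candidates(int maxLength) {
        char[] alphabet = {'A', 'B', '0', '1'};
        List<String> result = new ArrayList<>();
        List<String> frontier = new ArrayList<>(Arrays.asList("A", "B"));
        for (int len = 1; len <= maxLength; len++) {
            result.addAll(frontier);
            List<String> next = new ArrayList<>();
            for (String prefix : frontier) {
                for (char c : alphabet) {
                    next.add(prefix + c);
                }
            }
            frontier = next;
        }
        return result;
    }

    // True when every candidate matches exactly one of the four cases.
    static boolean allMatchExactlyOnce(int maxLength) {
        for (String s : candidates(maxLength)) {
            if (matchCount(s) != 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("every input matches exactly one case: " + allMatchExactlyOnce(4));
    }
}
```

The reasoning behind the check: a one-character input can only match case 1, a two-character input matches case 2 or case 3 depending on whether the second character is a digit or a letter, and anything longer is routed by its second character to case 3 or case 4.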
