How can my ANTLR lexer match a token made up of a subset of another token's characters?

Posted on 2024-08-19 23:22:46

I have what I think is a simple ANTLR question. I have two token types: ident and special_ident. I want my special_ident to match a single letter followed by a single digit. I want the generic ident to match a single letter, optionally followed by any number of letters or digits. My (incorrect) grammar is below:

expr 
    : special_ident
    | ident
    ;

special_ident : LETTER DIGIT;
ident         : LETTER (LETTER | DIGIT)*;

LETTER : 'A'..'Z';
DIGIT  : '0'..'9';

When I try to check this grammar, I get this warning:

Decision can match input such as "LETTER DIGIT" using multiple alternatives: 1, 2.
As a result, alternative(s) 2 were disabled for that input

I understand that my grammar is ambiguous and that input such as A1 could match either ident or special_ident. I really just want the special_ident to be used in the narrowest of cases.

Here's some sample input and what I'd like it to match:

A      : ident
A1     : special_ident
A1A    : ident
A12    : ident
AA1    : ident

How can I form my grammar such that I correctly identify my two types of identifiers?

Comments (2)

安静 2024-08-26 23:22:46

It seems that you have three cases:

  • A
  • AN
  • A(A|N)(A|N)+

You could classify the middle one as special_ident and the other two as ident; that should do the trick.

I'm a bit rusty with ANTLR, I hope this hint is enough. I can try to write out the expressions for you but they could be wrong:

long_ident    : LETTER (LETTER | DIGIT) (LETTER | DIGIT)+;
special_ident : LETTER DIGIT;
ident         : LETTER | long_ident;
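
One way to sanity-check the three cases before committing them to the grammar is to write them as regular expressions and run the sample inputs through them. The Java sketch below is only an illustration and is not part of the grammar: the class name `ThreeCases` and the method `classify` are made up for this example, and it assumes `A` means `[A-Z]` and `N` means `[0-9]`.

```java
import java.util.regex.Pattern;

class ThreeCases {
    // Case 1: a single letter (ident)
    private static final Pattern SINGLE = Pattern.compile("[A-Z]");
    // Case 2: a letter followed by exactly one digit (special_ident)
    private static final Pattern SPECIAL = Pattern.compile("[A-Z][0-9]");
    // Case 3: A(A|N)(A|N)+, i.e. three or more characters (ident)
    private static final Pattern LONG = Pattern.compile("[A-Z][A-Z0-9][A-Z0-9]+");

    // Hypothetical helper: names the rule an input would fall under.
    public static String classify(String s) {
        if (SPECIAL.matcher(s).matches()) {
            return "special_ident";
        }
        if (SINGLE.matcher(s).matches() || LONG.matcher(s).matches()) {
            return "ident";
        }
        return "no match";
    }

    public static void main(String[] args) {
        // The five sample inputs from the question, plus a two-letter input.
        for (String s : new String[] {"A", "A1", "A1A", "A12", "AA1", "AA"}) {
            System.out.println(s + " : " + classify(s));
        }
    }
}
```

The five sample inputs from the question come out as requested. A two-letter input such as AA fits none of the three cases as written, so it would need its own alternative for full coverage.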
青春有你 2024-08-26 23:22:46

Expanding on Carl's thought, I would guess you have four different cases:

  1. A
  2. AN
  3. AA(A|N)*
  4. AN(A|N)+

Only case 2 should be a special_ident; the other three should be ident. All of them can be told apart by syntax alone. Here is a quick grammar that I tested in ANTLRWorks, and it appeared to work properly for me. I think Carl's version has one bug: it does not match a two-letter input such as AA. It still gets you 99% of the way there, so this is only a minor modification to his idea.

prog 
    :    (expr WS)+ EOF;

expr 
    : special_ident {System.out.println("Found special_ident:" + $special_ident.text + "\n");}
    | ident {System.out.println("Found ident:" + $ident.text + "\n");}
    ;

special_ident : LETTER DIGIT;

ident
    : LETTER
    | LETTER DIGIT (LETTER | DIGIT)+
    | LETTER LETTER (LETTER | DIGIT)*
    ;

LETTER : 'A'..'Z';
DIGIT  : '0'..'9';
WS 
    :   (' '|'\t'|'\n'|'\r')+;
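
As a rough cross-check outside ANTLR, the four cases can be compared against the original rules by brute force. The Java sketch below is an illustration under stated assumptions, not part of the grammar: the class `FourCases` and its methods are hypothetical names, the rules are transcribed by hand into regular expressions, and only strings over the small alphabet `A`, `B`, `1`, `2` are enumerated. It checks that no string is both a special_ident and an ident, and that the two rules together accept exactly what `LETTER (LETTER | DIGIT)*` accepts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class FourCases {
    // The original, overlapping ident rule from the question.
    static final Pattern ANY_IDENT = Pattern.compile("[A-Z][A-Z0-9]*");
    // Case 2: letter followed by exactly one digit.
    static final Pattern SPECIAL = Pattern.compile("[A-Z][0-9]");
    // Cases 1, 3 and 4: A | AA(A|N)* | AN(A|N)+
    static final Pattern IDENT =
        Pattern.compile("[A-Z]|[A-Z][A-Z][A-Z0-9]*|[A-Z][0-9][A-Z0-9]+");

    // Every string over the alphabet with length 1..maxLen.
    static List<String> allStrings(String alphabet, int maxLen) {
        List<String> out = new ArrayList<>();
        List<String> frontier = new ArrayList<>();
        frontier.add("");
        for (int len = 1; len <= maxLen; len++) {
            List<String> next = new ArrayList<>();
            for (String prefix : frontier) {
                for (char c : alphabet.toCharArray()) {
                    next.add(prefix + c);
                }
            }
            out.addAll(next);
            frontier = next;
        }
        return out;
    }

    // True if the split rules never overlap and together accept
    // exactly the strings the original ident rule accepts.
    public static boolean casesAreConsistent(int maxLen) {
        for (String s : allStrings("AB12", maxLen)) {
            boolean special = SPECIAL.matcher(s).matches();
            boolean ident = IDENT.matcher(s).matches();
            boolean original = ANY_IDENT.matcher(s).matches();
            if (special && ident) {
                return false; // the two rules overlap on s
            }
            if ((special || ident) != original) {
                return false; // coverage differs from the original rule
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("consistent up to length 4: " + casesAreConsistent(4));
    }
}
```

This only exercises the regular expressions, not ANTLR's lookahead decisions, so it complements rather than replaces a test in ANTLRWorks.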