How can my ANTLR lexer match a token made up of a subset of another token's characters?
I have what I think is a simple ANTLR question. I have two token types: ident and special_ident. I want my special_ident to match a single letter followed by a single digit. I want the generic ident to match a single letter, optionally followed by any number of letters or digits. My (incorrect) grammar is below:
expr
: special_ident
| ident
;
special_ident : LETTER DIGIT;
ident : LETTER (LETTER | DIGIT)*;
LETTER : 'A'..'Z';
DIGIT : '0'..'9';
When I try to check this grammar, I get this warning:
Decision can match input such as "LETTER DIGIT" using multiple alternatives: 1, 2.
As a result, alternative(s) 2 were disabled for that input
I understand that my grammar is ambiguous and that input such as A1 could match either ident or special_ident. I really just want special_ident to be used in the narrowest of cases.
Here's some sample input and what I'd like it to match:
A : ident
A1 : special_ident
A1A : ident
A12 : ident
AA1 : ident
How can I form my grammar such that I correctly identify my two types of identifiers?
Comments (2)
Seems that you have 3 cases (A is a letter, N is a digit):
A
AN
A(A|N)(A|N)+
You could classify the middle one as special_ident and the other two as ident; it seems that should do the trick. I'm a bit rusty with ANTLR, so I hope this hint is enough. I can try to write out the expressions for you, but they could be wrong:
Expanding on Carl's thought, I would guess you have four different cases:
Only option 2 should be the token special_ident; the other three should be ident. All tokens can be identified by syntax alone. Here is a quick grammar I was able to test in ANTLRWorks, and it appeared to work properly for me. I think Carl's might have one bug when trying to check AA, but getting you 99% of the way there is a huge benefit, so this is only a minor modification to his quick thought.
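For illustration only, here is a minimal, untested sketch of how a four-case split could be written as lexer rules. It is not the grammar tested in ANTLRWorks above. The grammar name Idents, the upper-case token names, and the case numbering in the comments are assumptions made for this sketch. The idea is to turn both identifier kinds into tokens and make LETTER and DIGIT fragments, so that the lexer, rather than the parser, decides which kind of identifier it saw. The comments describe ANTLR 4's longest-match lexing; ANTLR 3 predicts lexer alternatives differently and may resolve or report this grammar in another way.

```antlr
grammar Idents;

// Parser rule: an expression is a single identifier token of either kind.
expr
    : SPECIAL_IDENT
    | IDENT
    ;

// Case 2: exactly one letter followed by exactly one digit, e.g. A1.
SPECIAL_IDENT
    : LETTER DIGIT
    ;

// Cases 1, 3 and 4: every other identifier shape.
IDENT
    : LETTER                              // case 1: A
    | LETTER LETTER (LETTER | DIGIT)*     // case 3: AA, AA1, ...
    | LETTER DIGIT (LETTER | DIGIT)+      // case 4: A1A, A12, ...
    ;

// Fragments are building blocks only; they never become tokens themselves.
fragment LETTER : 'A'..'Z' ;
fragment DIGIT  : '0'..'9' ;
```

With longest-match lexing, A1 becomes SPECIAL_IDENT because that rule consumes both characters while the only IDENT alternative that applies stops after A. A, A1A, A12, AA1 and AA all come out as IDENT. Because the two token definitions no longer overlap, the result does not depend on which rule is listed first.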