How to append two lexer expressions - ANTLR4

Posted on 2025-01-11 03:07:15 · Word count 295 · Views 2 · Comments 0

I need the lexer to parse two different character expressions as one expression.

So I have something like this:

rootPath : 'A' rootType SEP childPath; //my output should be AB:2 or AC:4


childPath : RESERVED_NUMBERS;

rootType : ONE_LETTER;


SEP : ':' ;
RESERVED_NUMBERS : [1-9] ;
ONE_LETTER : [A-Z] ;

I get an error when parsing this. How can I combine 'A' and ONE_LETTER into a single string?

Comments (1)

枯寂 2025-01-18 03:07:15

Based on your comments, it appears that you want to keep the two letters of your root level and sub level as separate tokens, but you have a "conflict" (in your rootPath parser rule): your 'A' token literal and your ONE_LETTER rule both match the "A" character. If I have this right, you're not really "appending Lexer expressions".

It's important to recognize that the 'A' in your grammar is just a syntactic shortcut for defining a Lexer rule (ANTLR will create it with a name something like T__0), so it's just another Lexer rule.

It's also important to understand that your stream of input characters is used to create a stream of tokens for the parser to consume. There is nothing a parser rule can do to control whether "A" matches the T__0 ('A') rule or the ONE_LETTER rule. That decision is made by the tokenizer, which has to pick one by looking only at the stream of input characters.
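To make the tokenizer's decision concrete, here is a minimal Python sketch of the lexing stage only. It is a toy model, not ANTLR's implementation: the `RULES` table and the `tokenize` helper are hypothetical names, and the rule order assumes that the implicit `T__0` rule for the 'A' literal is placed ahead of the named lexer rules, so it wins ties of equal length.

```python
import re

# Toy model of the lexer stage; this is NOT how ANTLR is implemented.
# Rules are in priority order. T__0 stands for the implicit rule that
# ANTLR generates for the 'A' literal (assumed to come first).
RULES = [
    ("T__0", re.compile(r"A")),
    ("SEP", re.compile(r":")),
    ("RESERVED_NUMBERS", re.compile(r"[1-9]")),
    ("ONE_LETTER", re.compile(r"[A-Z]")),
]


def tokenize(text):
    """Return (token_type, lexeme) pairs: longest match wins, ties go to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly longer only, so an earlier rule keeps a tie.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches {text[pos]!r} at offset {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


print(tokenize("AB:2"))
print(tokenize("AA:2"))
```

In this model every "A" becomes a T__0 token regardless of where it appears, so for the input AA:2 the second letter never reaches the parser as ONE_LETTER. The tokenizer never consults the parser rule that is currently being matched.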

With that in mind, you should probably not try to fight the Lexer, but allow both characters to be recognized as ONE_LETTER tokens, and add a semantic predicate to your rootPath rule:

rootPath
    // Java target assumed: compare strings with equals(), not ==
    : rootLevel = ONE_LETTER {$rootLevel.text.equals("A")}? subLevel = ONE_LETTER SEP childPath
    ; // my output should be AB:2 or AC:4

Now the rootPath rule will only match if the rootLevel ONE_LETTER token is an "A", and you will have rootLevel and subLevel fields in your RootPathContext class.
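The idea can be sketched outside ANTLR in a few lines of Python. This is only an illustration under stated assumptions: `tokenize` and `parse_root_path` are hypothetical helpers, every letter is lexed as ONE_LETTER, and the check on the first token's text plays the role of the semantic predicate.

```python
import re

# Only named token types: no separate token for the 'A' literal.
TOKEN_SPEC = [
    ("SEP", r":"),
    ("RESERVED_NUMBERS", r"[1-9]"),
    ("ONE_LETTER", r"[A-Z]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (token_type, lexeme) pairs for the whole input."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_root_path(text):
    """Match ONE_LETTER ONE_LETTER SEP RESERVED_NUMBERS, then check the first letter."""
    tokens = tokenize(text)
    kinds = [kind for kind, _ in tokens]
    if kinds != ["ONE_LETTER", "ONE_LETTER", "SEP", "RESERVED_NUMBERS"]:
        raise ValueError(f"unexpected token sequence: {kinds}")
    root_level, sub_level, _, child_path = (lexeme for _, lexeme in tokens)
    # Stand-in for the semantic predicate on the rootLevel token.
    if root_level != "A":
        raise ValueError(f"rootLevel must be 'A', got {root_level!r}")
    return {"rootLevel": root_level, "subLevel": sub_level, "childPath": child_path}


print(parse_root_path("AB:2"))
print(parse_root_path("AC:4"))
```

Because the restriction is applied to the token text after lexing, an input such as AA:2 is also accepted with subLevel "A", while BC:4 is rejected by the first-letter check.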
