Conditionally skipping an ANTLR lexer rule based on the current line number
I have this pair of rules in my ANTLR lexer grammar, which match the same pattern, but have mutually exclusive predicates:
MAGIC: '#' ~[\r\n]* {getLine() == 1}? ;
HASH_COMMENT: '#' ~[\r\n]* {getLine() != 1}? -> skip;
When I look at the tokens in the ANTLR Preview, I see:
So it seems like the predicate isn't being used, and regardless of the line I'm on, the token comes out as MAGIC.
I also tried a different approach to try and work around this:
tokens { MAGIC }
HASH_COMMENT: '#' ~[\r\n]* {if (getLine() == 1) setType(MAGIC); else skip();};
But now, both come out as HASH_COMMENT:
I really expected the first attempt, with the two predicates, to work, so that was surprising. Now it seems the action doesn't work either, which is even odder.
How do I make this work?
I'd rather not try to match "#usda ..." as a different token because that comment could occur further down the file, and it should be treated as a normal comment unless it's on the first line.
Comments (1)
I would not try to force semantics in the parse step. The letter combination is a HASH_COMMENT, period.
Instead I would handle that as normal syntax and handle anything special you might need in the step after parsing. For example:
This way you define a possible HASH_COMMENT (which you might interpret as MAGIC later, without needing a separate token type for it) before any content. It might not be on line one, but it comes before anything else. That matches real documents better, since whitespace can precede the hash comment.
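The post-parse step suggested above could look roughly like the following Java sketch. It is only an illustration under stated assumptions: `Tok`, `MagicCommentPass` and `relabelMagic` are hypothetical names rather than ANTLR API, the sample text "#usda 1.0" is made up, and the lexer is assumed to keep HASH_COMMENT tokens available (for example on a hidden channel) instead of skipping them. With a real ANTLR token stream the same check would read the type and line from the tokens the lexer produced.

```java
import java.util.ArrayList;
import java.util.List;

class MagicCommentPass {
    // Hypothetical stand-in for a lexer token; a real ANTLR Token carries more.
    public static final class Tok {
        public final String type;
        public final int line;
        public final String text;

        public Tok(String type, int line, String text) {
            this.type = type;
            this.line = line;
            this.text = text;
        }
    }

    // Returns a copy of the token list in which a leading HASH_COMMENT is
    // relabelled MAGIC. With requireFirstLine set, it must also sit on line 1.
    public static List<Tok> relabelMagic(List<Tok> tokens, boolean requireFirstLine) {
        List<Tok> out = new ArrayList<>(tokens);
        if (out.isEmpty()) {
            return out;
        }
        Tok first = out.get(0);
        boolean lineOk = !requireFirstLine || first.line == 1;
        if (first.type.equals("HASH_COMMENT") && lineOk) {
            out.set(0, new Tok("MAGIC", first.line, first.text));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> tokens = new ArrayList<>();
        tokens.add(new Tok("HASH_COMMENT", 1, "#usda 1.0")); // made-up sample text
        tokens.add(new Tok("HASH_COMMENT", 4, "#usda 1.0")); // same text further down
        for (Tok t : relabelMagic(tokens, true)) {
            System.out.println(t.line + " " + t.type + " " + t.text);
        }
    }
}
```

Passing `false` for `requireFirstLine` gives the looser "before anything else" reading from the answer, while `true` keeps the stricter first-line rule from the question. Either way the lexer stays free of predicates and actions.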