JavaCC actions in token definitions
I was wondering if it were possible to hook into JavaCC's lexer to call a function to check if a character is valid.
The reason I am asking is I'm trying to implement something a bit like:
    TOKEN {
        <ID: id($char)>
    }
where id() is:
    // Check to see if the character is an ID character
    boolean id(char currentCharacter) {
        int type = Character.getType(currentCharacter);
        return type == Character.LOWERCASE_LETTER || type == Character.MATH_SYMBOL;
    }
Is this at all possible?
Comments (1)
No, you can't. The lexer is a finite state machine.
What you can do is implement a lexical action that validates the characters of the matched string and adds the result of that validation to the issued token (e.g. by setting the value of a custom field). But you cannot use the result of the validation to guide the lexer.
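As a rough illustration of that idea: inside a JavaCC lexical action the matched text is available as `image` and the token about to be returned as `matchedToken`. The standalone sketch below only mimics that situation in plain Java so it can be run on its own. `IdToken`, `validId` and `afterMatch` are hypothetical names invented for this example, not part of JavaCC, and a real grammar would need a customized `Token` class to carry the extra field.

```java
public class LexicalActionSketch {
    // Hypothetical token type with one custom field; stands in for a customized Token class.
    static class IdToken {
        final String image;   // the matched text
        boolean validId;      // custom field filled in after the match

        IdToken(String image) {
            this.image = image;
        }
    }

    // Same check as id() in the question.
    static boolean isIdChar(char c) {
        int type = Character.getType(c);
        return type == Character.LOWERCASE_LETTER || type == Character.MATH_SYMBOL;
    }

    // What the body of a lexical action could do: the match has already happened,
    // so this only records whether every matched character passes the check.
    static IdToken afterMatch(String matchedText) {
        IdToken token = new IdToken(matchedText);
        token.validId = !matchedText.isEmpty()
                && matchedText.chars().allMatch(ch -> isIdChar((char) ch));
        return token;
    }

    public static void main(String[] args) {
        System.out.println(afterMatch("x").validId);
        System.out.println(afterMatch("X").validId);
    }
}
```

The result is only attached to the token. The parser, or a later pass, still has to look at `validId` and decide what to do with an invalid match.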
You should define the ID token as an enumeration of all the possible characters.
Note: If you don't use Unicode escapes, don't forget to tell JavaCC the exact encoding of your grammar file.
This is tedious but it is how the lexer works.
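One way to take some of the tedium out of writing the enumeration is to generate it. The sketch below is not from the original answer: it assumes the ID token is a single character from the Basic Multilingual Plane, reuses the check from the question, and prints a character list with Unicode escapes that could be pasted into a grammar file. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class IdCharRanges {
    // Same check as id() in the question.
    static boolean isIdChar(char c) {
        int type = Character.getType(c);
        return type == Character.LOWERCASE_LETTER || type == Character.MATH_SYMBOL;
    }

    // Formats a code unit as a quoted Unicode escape with four lowercase hex digits.
    static String escape(int c) {
        return String.format("\"\\u%04x\"", c);
    }

    // Collects maximal runs of consecutive matching characters as character-list entries.
    static List<String> ranges() {
        List<String> out = new ArrayList<>();
        int start = -1;
        // Runs one step past MAX_VALUE so that a run ending at the last char is flushed.
        for (int c = 0; c <= Character.MAX_VALUE + 1; c++) {
            boolean match = c <= Character.MAX_VALUE && isIdChar((char) c);
            if (match && start < 0) {
                start = c;
            } else if (!match && start >= 0) {
                int end = c - 1;
                out.add(start == end ? escape(start) : escape(start) + "-" + escape(end));
                start = -1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("TOKEN : {");
        System.out.println("  < ID : [ " + String.join(", ", ranges()) + " ] >");
        System.out.println("}");
    }
}
```

The generated list depends on the Unicode version of the Java runtime that runs the generator, so it may be worth regenerating it when the JDK changes.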
An alternative is to accept any single character as an identifier and validate it in the parser, or even later.
I see no reason to do that, though.