JavaCC warning "Regular Expression choice : FOO can never be matched as : BAR": explanation and solution?
I am teaching myself to use JavaCC in a hobby project, and have a simple grammar to write a parser for. Part of the parser includes the following:
TOKEN : { < DIGIT : (["0"-"9"]) > }
TOKEN : { < INTEGER : (<DIGIT>)+ > }
TOKEN : { < INTEGER_PAIR : (<INTEGER>){2} > }
TOKEN : { < FLOAT : (<NEGATE>)? <INTEGER> | (<NEGATE>)? <INTEGER> "." <INTEGER> | (<NEGATE>)? <INTEGER> "." | (<NEGATE>)? "." <INTEGER> > }
TOKEN : { < FLOAT_PAIR : (<FLOAT>){2} > }
TOKEN : { < NUMBER_PAIR : <FLOAT_PAIR> | <INTEGER_PAIR> > }
TOKEN : { < NEGATE : "-" > }
When compiling with JavaCC I get the output:
Warning: Regular Expression choice : FLOAT_PAIR can never be matched as : NUMBER_PAIR
Warning: Regular Expression choice : INTEGER_PAIR can never be matched as : NUMBER_PAIR
I'm sure this is a simple concept but I don't understand the warning, being a novice in both parser generation and regular expressions.
What does this warning mean (in as-novice-as-you-can-get terms)?
Comments (4)
I don't know JavaCC, but I am a compiler engineer.
The FLOAT_PAIR rule is ambiguous. Consider the following text:
0.0
This could be FLOAT 0 followed by FLOAT .0; or it could be FLOAT 0. followed by FLOAT 0; both result in a FLOAT_PAIR. Or it could be a single FLOAT 0.0.
More importantly, though, you are using lexical analysis with composition in a way that is never likely to work. Consider this number:
12345
This could be parsed as INTEGER 12, INTEGER 345, resulting in an INTEGER_PAIR. Or it could be parsed as INTEGER 123, INTEGER 45, another INTEGER_PAIR. Or it could be INTEGER 12345, a different token. The problem exists because you are not requiring white space between the lexical elements of the INTEGER_PAIR (or FLOAT_PAIR).
You should almost never try to handle pairs like this in the lexer. Instead, you should handle plain numbers (INTEGER and FLOAT) as tokens, and handle things like negation and pairing in the parser, where whitespace has been dealt with and stripped.
(For example, how are you going to process "----42"? This is a valid expression in most programming languages, which will correctly calculate multiple negations, but would not be handled by your lexer.)
Also, be aware that single-digit integers in your lexer will not be matched as INTEGER; they will come out as DIGIT. I don't know the correct syntax for JavaCC to fix that for you, though. What you want is to define DIGIT not as a token, but simply as something you can use in the definitions of other tokens; alternatively, embed the definition of DIGIT ([0-9]) directly wherever you are using DIGIT in your rules.
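The advice in this answer (plain numbers as tokens, whitespace stripped by the lexer, negation and pairing left to the parser) could be sketched in JavaCC roughly as follows. This is only an illustration under assumptions: the names PairParser, NumberPair and Number, the sample input string, and the exact shape of FLOAT are made up for this example and do not come from the question or the answer. The `#` prefix marks DIGIT as a private regular expression, which is JavaCC's way of defining something that can be used inside other token definitions without being a token itself.

```javacc
options { STATIC = false; }

PARSER_BEGIN(PairParser)
import java.io.StringReader;

public class PairParser {
  public static void main(String[] args) throws ParseException {
    // Hypothetical sample input: two numbers separated by whitespace,
    // the first one negated twice.
    PairParser parser = new PairParser(new StringReader("--4.5  -12"));
    parser.NumberPair();
    System.out.println("parsed a number pair");
  }
}
PARSER_END(PairParser)

// Whitespace is discarded by the token manager, so it separates the two
// numbers without ever reaching the parser.
SKIP : { " " | "\t" | "\r" | "\n" }

// Private (#) regular expression: usable only inside other token definitions,
// so a single digit such as "7" is matched as INTEGER, not as DIGIT.
TOKEN : { < #DIGIT : ["0"-"9"] > }
TOKEN : { < INTEGER : (<DIGIT>)+ > }
TOKEN : { < FLOAT : (<DIGIT>)+ "." (<DIGIT>)* | "." (<DIGIT>)+ > }
TOKEN : { < NEGATE : "-" > }

// Pairing is a parser concern: a pair is simply two numbers in a row.
void NumberPair() : {} { Number() Number() <EOF> }

// Negation is a parser concern too: any number of leading minus signs.
void Number() : {} { ( <NEGATE> )* ( <INTEGER> | <FLOAT> ) }
```

Assuming the file is saved as PairParser.jj and JavaCC is installed, running `javacc PairParser.jj`, then `javac *.java`, then `java PairParser` should exercise the sketch. Because whitespace is skipped before the parser sees anything, an input such as "12345" is one INTEGER and can no longer be split into a pair in several ways.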
I haven't used JavaCC, but it is possible that NUMBER_PAIR is ambiguous.
I think the problem comes down to the fact that the same exact thing can be matched as both FLOAT_PAIR and INTEGER_PAIR since FLOAT can match an INTEGER.
But this is just a guess having never seen the JavaCC syntax :)
It probably means that for every FLOAT_PAIR you'll just get a FLOAT_PAIR token, never a NUMBER_PAIR token. The FLOAT_PAIR rule already matches all the input and JavaCC will not try to find further matching rules. That would be my interpretation, but I don't know JavaCC, so take it with a grain of salt.
Maybe you can specify somehow that NUMBER_PAIR is the main production and that you don't want to get any other tokens as results.
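The interpretation above matches how JavaCC's token manager chooses a token: it takes the longest match, and when several tokens match the same text it picks the one declared first in the grammar file. A small grammar with the same shape as NUMBER_PAIR shows the effect. This is a sketch under assumptions: the names WarningDemo, WORD, ATOM and Start are invented for the example, and it is expected (not verified here) to produce warnings of the same form as those in the question.

```javacc
options { STATIC = false; }

PARSER_BEGIN(WarningDemo)
public class WarningDemo {}
PARSER_END(WarningDemo)

TOKEN : { < INTEGER : (["0"-"9"])+ > }
TOKEN : { < WORD : (["a"-"z"])+ > }

// ATOM is a choice between two public tokens declared above it. Any text that
// ATOM could match is matched equally well by INTEGER or WORD, and those are
// declared earlier, so the token manager reports them instead of ATOM.
TOKEN : { < ATOM : <INTEGER> | <WORD> > }

// As written, this production can never see an ATOM token.
void Start() : {} { ( <ATOM> )* <EOF> }
```

Two ways out follow from the rule. If only ATOM is wanted as a token, INTEGER and WORD can be declared private with a leading `#` (for example `< #INTEGER : ... >`) so they no longer compete with it. Otherwise the ATOM token can be dropped and the choice written as a parser production over `<INTEGER>` and `<WORD>`.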
Thanks to Barry Kelly's answer, the solution I've come up with is:
I had completely forgotten to include the space that separates the two tokens. I've also used the '#' symbol, which stops those tokens from being matched on their own, so they are only used in the definitions of other tokens. The above compiles with JavaCC without warnings or errors.
However, as noted by Barry, there are reasons against doing this.