ANTLR on a noisy data stream, Part 3
Still in the process of learning ANTLR... Recently I posted two questions about parsing some text and extracting information while leaving aside "unwanted" words or characters. Following a very interesting discussion with Bart Kiers on parsing a noisy datastream Part 1 and parsing a noisy datastream Part 2, I've ended up with one more problem...
Originally, my grammar looked like this:
VERB : 'SLEEPING' | 'WALKING';
SUBJECT : 'CAT' | 'DOG' | 'BIRD';
INDIRECT_OBJECT : 'CAR' | 'SOFA';
ANY2 : 'A'..'Z'+ {skip();};
ANY : . {skip();};
parse
  : sentenceParts+ EOF
  ;
sentenceParts
  : SUBJECT VERB INDIRECT_OBJECT
  ;
A sentence like
it's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV.
is parsed as intended: only the words CAT, SLEEPING and SOFA are extracted, and the other words are left aside.
Now, for another reason, I need to introduce a new token in my grammar; let's call it OTHER : 'PLANE'. It will be used later by another rule. I still want my primary rule, SUBJECT VERB INDIRECT_OBJECT, to work. Let's say the token 'PLANE' appears in my sentence, like
it's 10PM and the Lazy CAT on the PLANE is currently SLEEPING heavily on the SOFA in front of the TV.
The parse then fails with an error (no surprise here, as the lexer has a clear definition of 'PLANE' as a token).
Is there a way to tell ANTLR that, once I'm inside the rule sentenceParts, I only care about the 3 tokens I have defined, namely SUBJECT, VERB and INDIRECT_OBJECT, and that any other token it comes across should not be taken into account? I would like to be able to do that without putting OTHER? everywhere in this rule.
Comments (2)
Well, in fact, I might have found a way to do it... Although it's questionable at that point to introduce tokens if you don't want to parse them, this solution works:
On the following sentence
it's 10PM and the Lazy CAT on the BEAUTIFUL PLANE is currently SLEEPING HEAVILLY on the SOFA in front of the TV
this will produce the following tree... So that intermediary token
No.
You either ignore the token, or you don't, in which case you'll have to make it optional in your parser rule(s).
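A minimal sketch of the second option, assuming the grammar from the question with OTHER : 'PLANE' added. The grammar name Noisy is a placeholder, and this has not been run against the asker's full grammar. OTHER stays a real token that other rules can use, and sentenceParts tolerates it between the three tokens it cares about:

```antlr
grammar Noisy; // placeholder name, not from the original post

parse
  : sentenceParts+ EOF
  ;

// OTHER is explicitly allowed (zero or more times) between the tokens of interest.
sentenceParts
  : SUBJECT OTHER* VERB OTHER* INDIRECT_OBJECT
  ;

VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT' | 'DOG' | 'BIRD';
INDIRECT_OBJECT : 'CAR' | 'SOFA';
OTHER           : 'PLANE';             // kept as a token, so it reaches the parser
ANY2            : 'A'..'Z'+ {skip();}; // any other upper-case word is dropped
ANY             : . {skip();};         // any other single character is dropped
```

This only covers OTHER tokens that appear between SUBJECT and INDIRECT_OBJECT; an OTHER before the first SUBJECT or after the last INDIRECT_OBJECT would still have to be allowed in parse. The first option, ignoring the token, would mean adding {skip();} to OTHER as well, which also hides it from the rule that is supposed to use it later.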