ANTLR on a noisy data stream, Part 3

Posted on 2024-10-05 10:21:29

Still in the process of learning ANTLR... I recently posted 2 questions about parsing some text and extracting information while leaving aside "unwanted" words or characters. Following a very interesting discussion with Bart Kiers on parsing a noisy datastream Part 1 and parsing a noisy datastream Part 2, I've ended up with one more problem...

Originally, my grammar looked like this:

VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT'|'DOG'|'BIRD'; 
INDIRECT_OBJECT : 'CAR'| 'SOFA';
ANY2            :'A'..'Z'+ {skip();};
ANY             : . {skip();};

parse 
  :  sentenceParts+ EOF 
  ;

sentenceParts  
  :  SUBJECT VERB INDIRECT_OBJECT  
  ;    

A sentence like it's 10PM and the Lazy CAT is currently SLEEPING heavily on the SOFA in front of the TV. will produce the following:

[Image: parse tree]

This is good: it does what I want, i.e. it extracts only the words CAT, SLEEPING and SOFA and leaves the other words aside. Now, for another reason, I need to introduce a new token in my grammar; let's call it OTHER : 'PLANE'. It will be used later by another rule. I still want my primary rule to work: SUBJECT VERB INDIRECT_OBJECT. Let's say the token 'PLANE' appears in my sentence, like this:

it's 10PM and the Lazy CAT on the PLANE is currently SLEEPING heavily on the SOFA in front of the TV.

It will produce the following error (no surprise here, as the lexer has a clear definition of 'PLANE' as a token):

[Image: error output]

Is there a way to tell ANTLR that when I enter the rule sentenceParts I only care about the 3 tokens I have defined, namely SUBJECT, VERB and INDIRECT_OBJECT, and that it should disregard any other token it comes across? I would like to be able to do that without putting OTHER? everywhere in this rule.


Comments (2)

心清如水 2024-10-12 10:21:29

Well, in fact, I might have found a way to do it... Although it's questionable to introduce tokens at all if you don't want to parse them, this solution works:

VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT'|'DOG'|'BIRD'; 
INDIRECT_OBJECT : 'CAR'| 'SOFA';
OTHER       : 'PLANE';
OTHER2      : 'BEAUTIFUL';
OTHER3      : 'HEAVILLY';
ANY2            :'A'..'Z'+ {skip();};
ANY             : . {skip();};

parse : sentenceParts+ EOF ;

next : ( options {greedy=false;}: .)*;

sentenceParts
: SUBJECT next VERB next INDIRECT_OBJECT
;

On the sentence it's 10PM and the Lazy CAT on the BEAUTIFUL PLANE is currently SLEEPING HEAVILLY on the SOFA in front of the TV, this will produce the following tree... So that intermediary token

[Image: parse tree]

还给你自由 2024-10-12 10:21:29

Is there a way to tell ANTLR that when I enter the rule sentenceParts I only care about the 3 tokens I have defined, namely SUBJECT, VERB and INDIRECT_OBJECT, and that it should disregard any other token it comes across? I would like to be able to do that without putting OTHER? everywhere in this rule.

No.

You either ignore the token, or you don't, in which case you'll have to make it optional in your parser rule(s).
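
For illustration, the two options could look roughly like this in an ANTLR 3 grammar built from the rules in the question. This is an untested sketch, not a definitive solution: the grammar name Noisy is made up, and the choice of OTHER* (rather than OTHER?) between the three tokens is an assumption made so that several consecutive OTHER tokens are tolerated.

```antlr
grammar Noisy;  // hypothetical grammar name, not from the question

parse
  :  sentenceParts+ EOF
  ;

// Option 2: keep OTHER as a real token and allow it explicitly
// between the three tokens of interest.
sentenceParts
  :  SUBJECT OTHER* VERB OTHER* INDIRECT_OBJECT
  ;

VERB            : 'SLEEPING' | 'WALKING';
SUBJECT         : 'CAT' | 'DOG' | 'BIRD';
INDIRECT_OBJECT : 'CAR' | 'SOFA';
OTHER           : 'PLANE';

// Option 1 (instead of the OTHER rule above): discard the token in the
// lexer. The parser then never sees it, so no other parser rule can use it.
// OTHER        : 'PLANE' {skip();};

ANY2            : 'A'..'Z'+ {skip();};
ANY             : . {skip();};
```

With option 2, an OTHER token that appears before SUBJECT or after INDIRECT_OBJECT would still be rejected, so it would have to be allowed there as well (or in the parse rule) if such input is expected.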
