Antlr lexer rule

Posted on 2024-11-29 10:38:52

I've got a rule to match a string that looks like so:

STRING
    : '"' ( ~( '"' | '\\' ) | '\\' . )* '"'
    ;

I don't want the quotes to be part of the token's text. In Antlr2 I would just put '!' after each quote to tell Antlr not to add it to the text.

Notice the '!' below.

 STRING
    : '"'! ( ~( '"' | '\\' ) | '\\' . )* '"'!
    ;

However, in Antlr3 I can no longer do this, because I get the following warning:

warning(149): Crv__.g:0:0: rewrite syntax or operator with no output option; setting output=AST

I don't know whether I can use a rewrite rule here, because I don't know how to write one around the match-anything token '.'.

My only other thought is to grab the matched text and return it without the quotes, but I'm not sure how to do that, since the token hasn't been created yet.

I'm using the C Antlr runtime.
How can I accomplish this?


猫七 2024-12-06 10:38:52

For posterity, I'll mention how I ended up solving this.

I used an @after block on the lexer rule to strip the quotes from the token text once the whole string has been matched:

STRING
@after
{
    /* Keep characters 1 .. len-2, i.e. drop the opening and closing quote. */
    SETTEXT(GETTEXT()->subString(GETTEXT(), 1, GETTEXT()->len - 1));
}
: '"' ( ~( '"' | '\\' ) | '\\' . )* '"'
;
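
To make the index arithmetic in the @after block concrete, here is a minimal standalone C sketch of the same idea: copy everything between the first and last character. It does not use the ANTLR runtime at all; `strip_quotes` and its buffer-based signature are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the ANTLR C runtime.
 * Copies `text` without its first and last character into `out`,
 * the same range as substring(1, len - 1) in the rule above.
 * Returns 0 on success, -1 if `text` has fewer than two characters
 * or `out` cannot hold the result plus the terminating NUL.
 */
int strip_quotes(const char *text, char *out, size_t out_size)
{
    size_t len = strlen(text);

    if (len < 2 || out_size < len - 1) {
        return -1;
    }
    memcpy(out, text + 1, len - 2);  /* skip the opening quote */
    out[len - 2] = '\0';             /* closing quote is never copied */
    return 0;
}
```

Like the @after block, this only removes the surrounding quotes. Backslash escapes inside the string are left exactly as they were matched.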


对你再特殊 2024-12-06 10:38:52

This is the solution I ended up using:

STRING          :       '"'         { \$s = ""; }
                (   '"' '"'         { \$s .= '"';}
                |   c=CHAR          { \$s .= \$c->gettext();}
                |   ' '             { \$s .= ' ';}
                )*
                '"'                 { \$this->setText(\$s); }
    ;

fragment CHAR       :   (ACCENT|SPECIAL|ALPHA|DIGIT);
fragment ACCENT     :   '\u00C0'..'\u00D6' | '\u00D9'..'\u00DD' | '\u00E0'..'\u00F6' |'\u00F9'..'\u00FD';
fragment SPECIAL    :   '.' | '!' | '-'| '?';
fragment ALPHA      :   'a'..'z' | 'A'..'Z';
fragment DIGIT      :   '0'..'9' ;

One minor difference is that I use a whitelist of characters for security reasons.

The major difference is that I build the result string incrementally as the rule matches, dropping the surrounding " characters and turning each doubled "" into a single ".

I'm using the PHP target, which is why every $ is escaped as \$.
Do you know which of the two approaches is faster?
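
For readers on the C runtime, the incremental approach can be sketched in plain C as follows. This is an illustration under assumptions, not a port of the grammar: `unquote` and `is_allowed` are hypothetical names, the whitelist covers only the ASCII part of CHAR plus the space (the ACCENT ranges are left out because this sketch works on single bytes), and the ANTLR runtime is not involved.

```c
#include <stddef.h>

/* Hypothetical whitelist: ASCII letters, digits, space and . ! - ? */
static int is_allowed(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == ' ' ||
           c == '.' || c == '!' || c == '-' || c == '?';
}

/*
 * Hypothetical helper. Reads a quoted literal starting at `src` and
 * builds its unquoted value in `out` one character at a time:
 * a doubled quote becomes one quote, the surrounding quotes are dropped.
 * Returns the length written, or -1 on a missing quote, a character
 * outside the whitelist, an unterminated literal, or a full buffer.
 */
int unquote(const char *src, char *out, size_t out_size)
{
    size_t i = 0, n = 0;

    if (out_size == 0 || src[i++] != '"') {
        return -1;
    }
    for (;;) {
        char c = src[i];

        if (c == '"') {
            if (src[i + 1] != '"') {
                break;              /* closing quote: stop */
            }
            i += 2;                 /* doubled quote: emit one */
        } else if (is_allowed(c)) {
            i++;
        } else {
            return -1;              /* also catches '\0' (unterminated) */
        }
        if (n + 1 >= out_size) {
            return -1;              /* keep room for the NUL */
        }
        out[n++] = c;
    }
    out[n] = '\0';
    return (int)n;
}
```

Both this and the substring approach do work proportional to the length of the literal. Which one is faster in a given target runtime is not something the thread settles, so it would need to be measured.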
