Ignoring tokens within a token's text?

Posted on 2024-11-07 21:27:53

I have the following token definition in my lexer defining a CharacterString (e.g. 'abcd'):

CharacterString:
  Apostrophe
  (Alphanumeric)*
  Apostrophe
;

Is it possible to ignore the two apostrophes, so that the token text I get in the lexer (via $CharacterString.text->chars) does not include them?

I tried ...

CharacterString:
  Apostrophe { $channel = HIDDEN; }
  (Alphanumeric)*
  Apostrophe { $channel = HIDDEN; }
;

... without success. With this change the rule no longer even matches my string (e.g. 'oiu' fails in the parser with a MismatchedSetException).

Thank you :)

微凉 2024-11-14 21:27:53

The inline action {$channel=HIDDEN;} applies to the entire CharacterString token, not just to the apostrophe it follows, so it cannot be used the way you tried.
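To see why the whole string disappears, here is a simplified model in plain C. It is not the ANTLR runtime: the `Token` struct, the channel constants and `count_visible` are hypothetical names used only for illustration. It assumes that each token carries a single channel value and that the parser only reads tokens on the default channel.

```c
#include <stddef.h>

/* Illustrative channel numbers; the names are assumptions for this sketch. */
enum { DEFAULT_CHANNEL = 0, HIDDEN_CHANNEL = 99 };

/* One token has exactly one channel for its whole text, so an action
 * placed after the first apostrophe still changes the complete token. */
typedef struct {
  const char *text;
  int channel;
} Token;

/* Counts the tokens a parser reading only the default channel would see. */
size_t count_visible(const Token *tokens, size_t n)
{
  size_t visible = 0;
  for (size_t i = 0; i < n; i++) {
    if (tokens[i].channel == DEFAULT_CHANNEL)
      visible++;
  }
  return visible;
}
```

In this model, a 'oiu' token whose channel was set to hidden is skipped entirely, so the parser is handed whatever token comes next, which would explain the mismatch error.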

Instead, add a custom action to the lexer rule and strip the quotes from the token text yourself. Here's a small demo for the C target:

grammar T;

options {
  language=C;
}

parse
  :  (t=. {printf(">\%s<\n", $t.text->chars);})+ EOF
  ;

CharacterString
  :  '\'' ~'\''* '\''
     {
       pANTLR3_STRING quoted = GETTEXT();
       SETTEXT(quoted->subString(quoted, 1, quoted->len-1));
     }
  ;

Any
  :  .
  ;
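The index arithmetic in the action can be sketched in plain C, without the ANTLR runtime. The helper name `strip_quotes` is hypothetical, and the sketch assumes that subString treats its end index as exclusive (as Java's substring does), which matches the output reported for this demo.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of `quoted` without its first and last
 * character. Returns NULL if the input has fewer than two characters or
 * if allocation fails. The caller must free() the result. */
char *strip_quotes(const char *quoted)
{
  size_t len = strlen(quoted);
  if (len < 2)
    return NULL;

  /* Same bounds as subString(quoted, 1, len - 1): start at index 1 and
   * stop before index len - 1 (end index assumed to be exclusive). */
  size_t inner = len - 2;
  char *out = malloc(inner + 1);
  if (out == NULL)
    return NULL;

  memcpy(out, quoted + 1, inner);
  out[inner] = '\0';
  return out;
}
```

Calling `strip_quotes("'abc'")` yields a fresh string holding abc, and an empty literal `''` yields an empty string.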

A small test program for the grammar (main.c):

#include "TLexer.h"
#include "TParser.h"

int main(int argc, char *argv[])
{
  pANTLR3_UINT8 fName = (pANTLR3_UINT8)"input.txt";
  pANTLR3_INPUT_STREAM input = antlr3AsciiFileStreamNew(fName);

  if(input == NULL)
  {
    fprintf(stderr, "Failed to open file %s\n", (char *)fName);
    exit(1);
  }

  pTLexer lexer = TLexerNew(input);

  if(lexer == NULL)
  {
    fprintf(stderr, "Unable to create the lexer due to malloc() failure\n");
    exit(1);
  }

  pANTLR3_COMMON_TOKEN_STREAM tstream = antlr3CommonTokenStreamSourceNew(ANTLR3_SIZE_HINT, TOKENSOURCE(lexer));

  if(tstream == NULL)
  {
    fprintf(stderr, "Out of memory trying to allocate token stream\n");
    exit(1);
  }

  pTParser parser = TParserNew(tstream);

  if(parser == NULL)
  {
    fprintf(stderr, "Out of memory trying to allocate parser\n");
    exit(ANTLR3_ERR_NOMEM);
  }

  parser->parse(parser);

  parser->free(parser);   parser = NULL;
  tstream->free(tstream); tstream = NULL;
  lexer->free(lexer);     lexer = NULL;
  input->close(input);    input = NULL;

  return 0;
}

The test file input.txt contains:

'abc'

If you now 1) generate the lexer and parser, 2) compile all .c source files, and 3) run main:

# 1
java -cp antlr-3.3.jar org.antlr.Tool T.g

# 2
gcc -Wall main.c TLexer.c TParser.c -l antlr3c -o main

# 3
./main

you'll see abc printed to the console without the quotes (wrapped in > and < by the printf format).

◇流星雨 2024-11-14 21:27:53

You can influence how the token is built through your lexer's state attribute (a RecognizerSharedState):

CharacterString:
  Apostrophe
  CharSequence
  Apostrophe
  { state.text = $CharSequence.text; }
;

fragment CharSequence:
  Alphanumeric+
;
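The idea behind overriding the text can be modelled in plain C. This is a sketch under assumptions, not the ANTLR implementation: `TextToken` and `token_text` are hypothetical names, and the model assumes that a token's text is taken from the matched input span unless an action has stored a replacement.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token: either a span of the input or an overriding text. */
typedef struct {
  const char *input;    /* whole input buffer */
  size_t start;         /* index of the first matched character */
  size_t stop;          /* index of the last matched character (inclusive) */
  const char *override; /* non-NULL once an action has replaced the text */
} TextToken;

/* Returns a newly allocated copy of the token's text, or NULL if
 * allocation fails. The caller must free() the result. */
char *token_text(const TextToken *t)
{
  const char *src;
  size_t len;

  if (t->override != NULL) {
    /* An action set the text explicitly: that value wins. */
    src = t->override;
    len = strlen(src);
  } else {
    /* Default: the text is the matched span, apostrophes included. */
    src = t->input + t->start;
    len = t->stop - t->start + 1;
  }

  char *out = malloc(len + 1);
  if (out == NULL)
    return NULL;
  memcpy(out, src, len);
  out[len] = '\0';
  return out;
}
```

With no override, a token spanning all of 'abc' reports the quoted text. Once the override is set to the inner fragment's text, the same token reports abc.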
