Why are multi-line comments in flex/bison so evasive?

Posted on 2024-10-01 05:11:36

I'm trying to parse C-style multi-line comments in my flex (.l) file:

%s ML_COMMENT
%%

...

<INITIAL>"/*"                   BEGIN(ML_COMMENT);
<ML_COMMENT>"*/"                BEGIN(INITIAL);  
<ML_COMMENT>[.\n]+              { }

I'm not returning any token and my grammar (.y) doesn't address comments in any way.

When I run my executable, I get a parse error:

$ ./a.out
/*
abc 
def
Parse error: parse error
$ echo "/* foo */" | ./a.out
Parse error: parse error

(My yyerror function does a printf("Parse error: %s\n"), which is where the first half of the redundant error message comes from).

I can see why the second example fails since the entirety of the input is a comment, and since comments are ignored by the grammar, there are no statements. Thus the input isn't a valid program. But the first part throws a parse error before I even finish the comment.

Also confusing:

$ ./a.out
/* foo */
a = b;
Parse error: parse error

In this case, the comment is closed prior to actual valid input (which, without the comment, parses just fine). The failure actually occurs after parsing "a", not after attempting to parse the assignment "a = b;". If I enter "a" on its own line, it still throws an error.

Given that the error message is a parser error and not a scanner error, is there something crucial I'm missing in my .y file? Or am I doing something wrong in my scanner rules that propagates over to the parser side?

EDIT: Per @Rudi's suggestion, I turned on debugging and found:

$ ./a.out
Starting parse
Entering state 0
Reading a token: /*
foo
Next token is 44 (IDENTIFER)
Shifting token 44 (IDENTIFER), Entering state 4
Reducing via rule 5 (line 130), IDENTIFER  -> identifier
state stack now 0
Entering state 5

I turned off debugging and found that /* foo */ = bar; indeed parses the same as foo = bar;. I'm using flex 2.5.4; it doesn't give me any warnings about the stateful rules I'm attempting to use.

Comments (4)

还如梦归 2024-10-08 05:11:36

I think you need to declare your ML_COMMENT start condition as an exclusive start condition, so that only the ML_COMMENT rules are active: use %x ML_COMMENT instead of %s ML_COMMENT.

Otherwise, rules with no start condition are also active while you are inside the comment.

爱*していゐ 2024-10-08 05:11:36

Parsing comments this way can lead to errors because:

  • you need to add conditions to all of your lex rules
  • it becomes even more complex if you also want to handle // comments
  • you still have the risk that yacc/bison merges two comments including everything in between

In my parser, I handle comments like this. First define lex rules for the start of the comment, like this:

\/\*     {
         if (!SkipComment())
            return(-1);
         }

\/\/     {
         if (!SkipLine())
            return(-1);
         }

then write the SkipComment and SkipLine functions. They need to consume all the input until the end of the comment is found (this is rather old code so forgive me the somewhat archaic constructions):

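bool SkipComment (void)
{
int Key;

Key=!EOF;   /* any value other than EOF, just to get the loop started */
while (true)
   {
   if (Key==EOF)
      {
      /* yyerror("Unexpected EOF within comment."); */
      break;
      }
   switch ((char)Key)
      {
      case '*' :
         Key=input();
         if ((char)Key=='/') return true;   /* found the closing delimiter */
         else                continue;      /* re-examine Key: it may be another '*' or EOF */
      case '\n' :
         ++LineNr;
         break;
      }
   Key=input();
   }

return false;   /* hit EOF before the comment was closed */
}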

bool SkipLine (void)
{
int Key;

Key=!EOF;
while (true)
   {
   if (Key==EOF)
      return true;
   switch ((char)Key)
      {
      case '\n' :
         unput('\n');
         return true;
         break;
      }
   Key=input();
   }

return false;
}
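
For anyone who wants to try the skipping logic without generating a scanner first, here is a minimal sketch of the same idea working on a plain string instead of flex's input(). The name skip_block_comment and its signature are my own invention for illustration, not something flex provides, and it assumes the caller has already consumed the opening delimiter.

```c
#include <stddef.h>

// Hypothetical helper (not part of flex): `s` points just past the
// opening delimiter of a block comment.
// Returns a pointer just past the closing delimiter, or NULL if the
// input ends before the comment is closed.
// Each newline seen inside the comment increments *line_nr.
const char *skip_block_comment(const char *s, int *line_nr)
{
    while (*s != '\0') {
        // A '*' immediately followed by '/' closes the comment.
        if (s[0] == '*' && s[1] == '/')
            return s + 2;
        if (*s == '\n')
            ++*line_nr;
        ++s;
    }
    return NULL;  // unterminated comment
}
```

Checking two characters at a time means a run of stars right before the slash still closes the comment, which is the case the continue in SkipComment is there to handle.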

内心激荡 2024-10-08 05:11:36

Besides the problem with %x vs %s, you also have the problem that the . in [.\n] matches (only) a literal . and not 'any character other than newline' like a bare . does. You want a rule like

<ML_COMMENT>.|"\n"     { /* do nothing */ }

instead
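
Putting the %x change and the character-class fix together, a minimal standalone scanner might look roughly like the sketch below. The identifier, whitespace and catch-all rules, the printed labels and main are stand-ins of my own so the file can be built by itself; in the real project those rules would return tokens to the bison parser instead. I split the comment body into three small rules rather than using .|"\n" so that flex matches longer runs at a time, but either form should work.

```lex
%option noyywrap
%x ML_COMMENT

%%

"/*"                    { BEGIN(ML_COMMENT); }
<ML_COMMENT>"*/"        { BEGIN(INITIAL); }
<ML_COMMENT>[^*\n]+     { /* comment text: ignore */ }
<ML_COMMENT>"*"         { /* a '*' that does not close the comment */ }
<ML_COMMENT>\n          { /* newline inside the comment; count lines here if needed */ }
<ML_COMMENT><<EOF>>     { fprintf(stderr, "unterminated comment\n"); return 0; }

[A-Za-z_][A-Za-z0-9_]*  { printf("IDENTIFIER %s\n", yytext); }
[ \t\n]+                { /* skip whitespace */ }
.                       { printf("CHAR %s\n", yytext); }

%%

int main(void)
{
    yylex();
    return 0;
}
```

Because ML_COMMENT is exclusive, the identifier rule is not active inside the comment, so foo in /* foo */ should no longer come back to the parser as a token.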

素年丶 2024-10-08 05:11:36

I found this description of the C language grammar (actually just the lexer) very useful. I think it is mostly the same as Patrick's answer, but slightly different.

http://www.lysator.liu.se/c/ANSI-C-grammar-l.html
