Bison: How to ignore a token if it doesn't fit into a rule

Posted on 2024-11-25 18:20:26

I'm writing a program that handles comments as well as a few other things. If a comment is in a specific place, then my program does something.

Flex passes a token upon finding a comment, and Bison then looks to see if that token fits into a particular rule. If it does, then it takes an action associated with that rule.

Here's the thing: the input I'm receiving might actually have comments in the wrong places. In this case, I just want to ignore the comment rather than flagging an error.

My question:
How can I use a token if it fits into a rule, but ignore it if it doesn't? Can I make a token "optional"?

(Note: The only way I can think of doing this right now is scattering the comment token in every possible place in every possible rule. There MUST be a better solution than this. Maybe some rule involving the root?)

Comments (2)

征﹌骨岁月お 2024-12-02 18:20:26

One solution may be to use bison's error recovery (see the Bison manual).

To summarize, bison defines the terminal token error to represent an error (say, a comment token returned in the wrong place). That way, you can (for example) close parentheses or braces after the wayward comment is found. However, this method will probably discard a certain amount of parsing, because I don't think bison can "undo" reductions. ("Flagging" the error, as with printing a message to stderr, is not related to this: you can have an error without printing an error--it depends on how you define yyerror.)

You may instead want to wrap each terminal in a special nonterminal:

term_wrap: comment TERM

This effectively does what you're scared to do (put in a comment in every single rule), but it does it in fewer places.

To force myself to eat my own dog food, I made up a silly language for myself. The only syntax is print <number> please, but if there's (at least) one comment (##) between the number and the please, it prints the number in hexadecimal, instead.

Like this:

print 1 please
1
## print 2 please
2
print ## 3 please
3
print 4 ## please
0x4
print 5 ## ## please
0x5
print 6 please ##
6

My lexer:

%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
%}

%%

print           return PRINT;
[[:digit:]]+    yylval = atoi(yytext); return NUMBER;
please          return PLEASE;
##              return COMMENT;

[[:space:]]+    /* ignore */
.               /* ditto */

and the parser:

%debug
%error-verbose
%verbose
%locations

%{
#include <stdio.h>
#include <string.h>

void yyerror(const char *str) {
        fprintf(stderr, "error: %s\n", str);
}

int yywrap() {
        return 1;
} 

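/* Declared here because main() uses yyparse() before Bison defines it. */
int yylex(void);
int yyparse(void);

extern int yydebug;
int main(void) {
    yydebug = 0;    /* set to 1 to trace the parser */
    return yyparse();
}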
%}

%token PRINT NUMBER COMMENT PLEASE

%%

commands: /* empty */
        |
        commands command
    ;

command: print number comment please {
        if ($3) {
            printf("%#x", $2);
        } else {
            printf("%d", $2);
        }
        printf("\n");
     }
     ;

print: comment PRINT
     ;

number: comment NUMBER {
        $$ = $2;
      }
      ;

please: comment PLEASE
      ;

comment: /* empty */ {
            $$ = 0;    /* no comment seen */
       }
       |
        comment COMMENT {
            $$ = 1;    /* at least one comment seen */
        }
    ;

So, as you can see, not exactly rocket science, but it does the trick. There's a shift/reduce conflict in there, because of the empty string matching comment in multiple places. Also, there's no rule to fit comments in between the final please and EOF. But overall, I think it's a good example.

晚雾 2024-12-02 18:20:26

Treat comments as whitespace at the lexer level.
But keep two separate rules, one for whitespace and one for comments, both returning the same token ID.

  • The rule for comments (+ optional whitespace) keeps track of the comment in a dedicated structure.
  • The rule for whitespace resets the structure.

When you enter that “specific place”, check whether the last whitespace was a comment, and trigger an error if it wasn't.
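To make that idea concrete, here is a minimal sketch in plain C. It is a hand-written scanner standing in for the two Flex rules, not actual Flex output, and it reuses the toy "print <number> please" language from the other answer. The names (struct lexer, next_token, run_commands, last_gap_comment) are made up for illustration. One assumption beyond the description above: the tracker is also cleared whenever a new token scan starts, so a comment seen before an earlier token can't leak into a later check.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum token { T_EOF, T_PRINT, T_NUMBER, T_PLEASE, T_OTHER };

struct lexer {
    const char *p;         /* current position in the input */
    int last_gap_comment;  /* the "dedicated structure": 1 if the gap before
                              the current token ended with a comment */
    int value;             /* value of the most recent NUMBER */
};

static enum token next_token(struct lexer *lx)
{
    lx->last_gap_comment = 0;              /* a new gap starts here */
    for (;;) {
        if (isspace((unsigned char)*lx->p)) {
            /* whitespace rule: skip the run and reset the tracker */
            while (isspace((unsigned char)*lx->p)) lx->p++;
            lx->last_gap_comment = 0;
        } else if (lx->p[0] == '#' && lx->p[1] == '#') {
            /* comment rule: skip "##" plus trailing whitespace, record it */
            lx->p += 2;
            while (isspace((unsigned char)*lx->p)) lx->p++;
            lx->last_gap_comment = 1;
        } else {
            break;
        }
    }
    if (*lx->p == '\0') return T_EOF;
    if (isdigit((unsigned char)*lx->p)) {
        lx->value = 0;
        while (isdigit((unsigned char)*lx->p))
            lx->value = lx->value * 10 + (*lx->p++ - '0');
        return T_NUMBER;
    }
    if (isalpha((unsigned char)*lx->p)) {
        const char *start = lx->p;
        while (isalpha((unsigned char)*lx->p)) lx->p++;
        size_t len = (size_t)(lx->p - start);
        if (len == 5 && memcmp(start, "print", 5) == 0) return T_PRINT;
        if (len == 6 && memcmp(start, "please", 6) == 0) return T_PLEASE;
        return T_OTHER;
    }
    lx->p++;                               /* any other character */
    return T_OTHER;
}

/* Handle "print NUMBER please" commands, appending one line per command to
   out (outsz must be at least 1). Comments never reach this code as tokens;
   only the tracker is consulted, at the one place where it matters. */
static void run_commands(const char *input, char *out, size_t outsz)
{
    struct lexer lx = { input, 0, 0 };
    size_t used = 0;
    enum token t;

    out[0] = '\0';
    while ((t = next_token(&lx)) != T_EOF) {
        if (t != T_PRINT) continue;        /* crude resync: wait for "print" */
        if (next_token(&lx) != T_NUMBER) continue;
        int n = lx.value;
        if (next_token(&lx) != T_PLEASE) continue;

        /* the "specific place": was the gap before "please" a comment? */
        int w = lx.last_gap_comment
              ? snprintf(out + used, outsz - used, "%#x\n", (unsigned)n)
              : snprintf(out + used, outsz - used, "%d\n", n);
        if (w < 0 || (size_t)w >= outsz - used) break;   /* buffer full */
        used += (size_t)w;
    }
}
```

With this layout a misplaced comment is simply skipped like whitespace, so the grammar needs no comment token at all. In a real Flex/Bison setup the tracker would presumably be a variable set in the lexer actions and read in the parser action for the rule that cares about it.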
