Unintentional concatenation in Bison/Yacc grammar

Posted on 2024-08-30 07:36:37

I am experimenting with lex and yacc and have run into a strange issue, but I think it would be best to show you my code before detailing the issue. This is my lexer:

%{
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
void yyerror(char *);
%}

%%

[a-zA-Z]+ {
  yylval.strV = yytext;
  return ID;
}

[0-9]+      {
  yylval.intV = atoi(yytext);
  return INTEGER;
}

[\n] { return *yytext; }

[ \t]        ;

. yyerror("invalid character");

%%

int yywrap(void) {
  return 1;
}

This is my parser:

%{
#include <stdio.h>

int yydebug=1;
void prompt();
void yyerror(char *);
int yylex(void);
%}

%union {
  int intV;
  char *strV;
}

%token INTEGER ID

%%

program: program statement EOF { prompt(); }
       | program EOF { prompt(); }
       | { prompt(); }
       ;

args: /* empty */
    | args ID { printf(":%s ", $<strV>2); }
    ;

statement: ID args { printf("%s", $<strV>1); }
         | INTEGER { printf("%d", $<intV>1); }
;

EOF: '\n'

%%

void yyerror(char *s) {
  fprintf(stderr, "%s\n", s);
}

void prompt() {
  printf("> ");
}

int main(void) {
  yyparse();
  return 0;
}

It is a very simple language, consisting of nothing more than strings, integers, and a basic REPL. Note that the parser prints each of the args with a leading colon. The intention is that, combined with the first alternative of the statement rule, an interaction with the REPL would look something like this:

> aaa aa a
:aa :a aaa>

However, the interaction is this:

> aaa aa a
:aa :a aaa aa aa
>

Why does the token ID in the following rule

statement: ID args { printf("%s", $<strV>1); }
         | INTEGER { printf("%d", $<intV>1); }
;

have the entire input line, newline included, as its semantic value? How can my grammar be reworked so that I get the interaction I intended?

Comments (2)

似梦非梦 2024-09-06 07:36:37

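You have to copy token strings as they are read if you want them to remain valid, because yytext points into the scanner's own buffer, which changes as later tokens are read. To show this, I modified the statement rule to read: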

statement: ID { printf("<%s> ", $<strV>1); } args { printf("%s", $<strV>1); }
         | INTEGER { printf("%d", $<intV>1); }
;

Then, with your input, I get the output:

> aaa aa a
<aaa> :aa :a aaa aa a
>

Note that at the time the initial ID is read, the token is exactly what you expected. But, because you did not preserve the token, the string has been modified by the time you get back to printing it after the args have been parsed.
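
To illustrate why the saved pointer goes stale, here is a minimal sketch in plain C. It assumes a flex-like scanner, which typically exposes yytext as a pointer into its input buffer and terminates it with a temporary NUL that is undone when the next token is scanned. Scanner, next_token and copy_token are hypothetical names invented for this illustration; they are not part of lex or yacc.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature scanner. It hands out tokens as pointers into
   the caller's line buffer, each terminated by a NUL written temporarily
   over the character that follows the token. */
typedef struct {
    char *cursor;   /* next unread character in the line buffer */
    char *hold_pos; /* where the temporary NUL was written, or NULL */
    char  hold_chr; /* the character that the NUL replaced */
} Scanner;

static void scanner_init(Scanner *s, char *line) {
    s->cursor = line;
    s->hold_pos = NULL;
    s->hold_chr = '\0';
}

/* Return a pointer to the next token, or NULL at the end of the buffer. */
static char *next_token(Scanner *s) {
    if (s->hold_pos != NULL) {
        /* Undo the previous token's terminator. Any pointer saved from
           an earlier call now sees the rest of the line again. */
        *s->hold_pos = s->hold_chr;
        s->hold_pos = NULL;
    }
    while (*s->cursor == ' ' || *s->cursor == '\t')
        s->cursor++;
    if (*s->cursor == '\0')
        return NULL;

    char *start = s->cursor;
    if (isalnum((unsigned char)*s->cursor)) {
        while (isalnum((unsigned char)*s->cursor))
            s->cursor++;
    } else {
        s->cursor++; /* newline or any other single character */
    }

    /* Terminate the token in place, remembering what was overwritten. */
    s->hold_chr = *s->cursor;
    s->hold_pos = s->cursor;
    *s->cursor = '\0';
    return start;
}

/* Make a private copy that survives later calls to next_token. */
static char *copy_token(const char *text) {
    char *copy = malloc(strlen(text) + 1);
    assert(copy != NULL);
    strcpy(copy, text);
    return copy;
}
```

If the first token of "aaa aa a" followed by a newline is kept only as a raw pointer, then by the time the newline has been scanned that pointer reads as the whole line, which is the behavior seen in the question. A copy made with copy_token still reads "aaa". In the lexer itself, the usual remedy is to assign a copy such as strdup(yytext) to yylval.strV, and to free it once the parser no longer needs it.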

趁年轻赶紧闹 2024-09-06 07:36:37

I think there is an associativity conflict between the args and statement productions. This is borne out by the (partial) contents of the parser.output file generated by bison -v:

Nonterminals, with rules where they appear

$accept (6)
    on left: 0
program (7)
    on left: 1 2 3, on right: 0 1 2
statement (8)
    on left: 4 5, on right: 1
args (9)
    on left: 6 7, on right: 4 7
EOF (10)
    on left: 8, on right: 1 2

Indeed, I'm having a hard time trying to figure out what your grammar is trying to accept. As a side note, I'd probably move your EOF production into the lexer as an EOL token; this will make resynchronizing on parse errors easier.

Better explanation of your intent would be helpful.
