yacc 输出帮助

发布于 2024-10-18 00:32:08 字数 1780 浏览 3 评论 0 原文

例如,如果我有这个语法

start : TKN id '{' '}' {cout<<$2<

  • 包含 iostream,TKN 声明为 token,id 类型声明为 char *

  • as输入 我输入 tkn aaa { }

输出不应该是 aaa 吗?有时,它打印 }0 有时它只是挂起

我想获取 id 的值,如何正确获取?

lex.l

%{
  #include "yacc.hpp"
  #include <math.h>
  #include<iostream>
  #include<string>
  int rows = 1,tmp=0;
%}
Id          [a-zA-Z_][0-9a-zA-Z_]*
%x String ...
%option c++
%option noyywrap
%% 
{Id}        {strcpy(yylval.strVal,yytext);cout<<"lex= "<<yytext<<endl;return Id;}//output line 1
...

yacc.y

%output ="yacc.cpp"
%defines
%verbose
%token Id
%token NAMESPACE
%union{

int           iVal;
float         fVal;
char*         strVal;
class Symbol* symPtr;
class NS*     nsPtr;
};


%token <iVal> INT_VAL;
%token <fVal> F_VAL;
%token <strVal> STR_VAL INT FLOAT STRING Id ;
%type <nsPtr> namespacedecl  
%type <symPtr> ns_closer
%type <strVal> Q_ID

//-------------------------------------------
namespacedecl  : NAMESPACE Q_ID '{' '}'            {cout<<"ns= "<<$2<< endl ;} // output line 3
               | NAMESPACE Q_ID '{' typedecl_closer '}' ;

 Q_ID :          Q_ID '.' Id                       {cout<<$3<< endl ;$$ = $3;}
               | Id                                {$$ = $1;cout<<"qid="<<$$<<endl;} // output line 2

当然,文件比这个大,但是复制/粘贴所有内容都会让你迷路 ^^

如果有办法附加文件,请告诉我,因为我还是这里的新手,这会容易得多而不是复制/粘贴。

这是我运行时得到的:

在此处输入图像描述

感谢您的回复

if I have this grammer for example

start : TKN id '{' '}' {cout<<$2<<endl;}
;

  • iostream is included , TKN is declaered as token, id type is declared as char *

  • as input I enter tkn aaa { }

shouldn't the output be aaa?? sometimes , it prints } and 0 and sometimes it just hangs

I want to get the value of id ,how to get it right ??

lex.l

%{
  #include "yacc.hpp"
  #include <math.h>
  #include<iostream>
  #include<string>
  int rows = 1,tmp=0;
%}
Id          [a-zA-Z_][0-9a-zA-Z_]*
%x String ...
%option c++
%option noyywrap
%% 
{Id}        {strcpy(yylval.strVal,yytext);cout<<"lex= "<<yytext<<endl;return Id;}//output line 1
...

yacc.y

%output ="yacc.cpp"
%defines
%verbose
%token Id
%token NAMESPACE
%union{

int           iVal;
float         fVal;
char*         strVal;
class Symbol* symPtr;
class NS*     nsPtr;
};


%token <iVal> INT_VAL;
%token <fVal> F_VAL;
%token <strVal> STR_VAL INT FLOAT STRING Id ;
%type <nsPtr> namespacedecl  
%type <symPtr> ns_closer
%type <strVal> Q_ID

//-------------------------------------------
namespacedecl  : NAMESPACE Q_ID '{' '}'            {cout<<"ns= "<<$2<< endl ;} // output line 3
               | NAMESPACE Q_ID '{' typedecl_closer '}' ;

 Q_ID :          Q_ID '.' Id                       {cout<<$3<< endl ;$ = $3;}
               | Id                                {$ = $1;cout<<"qid="<<$<<endl;} // output line 2

of course the files are bigger than this but copy/paste everything will make u get lost ^^

if there's a way to attach the file please tell me cuz I'm still newbie here and it would be much easier than copy/paste.

and this is the what I get when running :

enter image description here

thanks for responding

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

百思不得你姐 2024-10-25 00:32:08

这取决于“id”规则返回的内容!

%%
start :             TKN id '{' '}' {cout<<$2<<endl;} ;

id :                ID             { return "XXXXX"; } // What is returned here
                                                       // Is what will be printed out by $2 

%%

注:
惯例是终端标记全部大写 (TKN)。而非终结符是小写的(id)。根据这个约定,我希望 id 对于如何扩展有一个规则。

我怀疑你正在做的是:

%%

id :                ID             { return yytext; }

%%

这是一个指向 lex 缓冲区的指针。这是一个易失性缓冲区。您不能依赖其内容保持不变(也不能依赖其被“\0”终止)。您需要做的是在识别令牌时复制该令牌。

%%

id :                ID             { return strndup(yytext, yylen); }

%%

根据新输入进行编辑:

在这些行中:

{Id}        {   strcpy(yylval.strVal,yytext);
                cout<<"lex= "<<yytext<<endl;
                return Id;
            }

strcpy() 可能不好。

  • 您是否为副本预留了空间?
  • 您不能依赖 yytext 以“\0”终止。

cout 是危险的,因为您不能依赖 yytext 以 '\0' 终止。

我会这样做:

{ID}        {   cout << "lex=" << std::string(yytext,yytext+yylen); // use yylen
                return Id;
            }

不要乱搞 lex 中的 yacc 结构。它将您的 lex 文件与 yacc 紧密耦合(这不是不必要的)。只需返回令牌即可。然后 yacc 可以手动获取令牌值。

然后在 YACC 文件中:

Q_ID :        Q_ID '.' ident        {$ = $3; cout<<"Q.id="<<$<<endl;}
           |  ident                 {$ = $1; cout<<"ID  ="<<$<<endl;}

ident :       Id                    {$ = strndup(yytext, yylen);}

对于每个具有长令牌的终端(Id),都有一个非终端用于解码终端并在 yacc 联合结构中生成正确的值。在这种情况下,我们的身份不是终端。它只是解码 Id 终端并正确设置令牌(此 Id 在联合结构中没有类型)。

另请注意:

  • 传统上终端全部大写。身份证失败。
  • 传统上非终结符是小写的。 Q_ID 失败。

另外:

这行:

%token <strVal> STR_VAL INT FLOAT STRING Id ;

看起来错误,但在不知道 INT、FLOAT 和 STRING 是什么的情况下很难分辨。我猜这些是关键字 int、float、string 的终端标记。在这种情况下,您不需要存储实际的令牌字符串。事实上,您知道它是 INT/FLOAT 或 STRING 就足够了。

这应该是:

%token <strVal> STR_VAL;
%token <strVal> ident;

It depends on what the 'id' rule returned!

%%
start :             TKN id '{' '}' {cout<<$2<<endl;} ;

id :                ID             { return "XXXXX"; } // What is returned here
                                                       // Is what will be printed out by $2 

%%

Note:
The convention is that terminal tokens are all-caps (TKN). While non-terminal tokens are lowercase (id). By this convention I would expect id to have a rule on how it is expanded.

What I suspect you are doing is:

%%

id :                ID             { return yytext; }

%%

Which is a pointer into the lex buffer. This is a volatile buffer. You can NOT rely on its content remaining unchanged (nor can you rely on it being '\0' terminated). What you need to do is make a copy of the token at the point you identify it.

%%

id :                ID             { return strndup(yytext, yylen); }

%%

Edit based on new input:

In these lines:

{Id}        {   strcpy(yylval.strVal,yytext);
                cout<<"lex= "<<yytext<<endl;
                return Id;
            }

The strcpy() is probably bad.

  • Have you reserved any space for the copy?
  • You can not rely on yytext being '\0' terminated.

The cout is dangerious as you can not rely on yytext being '\0' terminated.

I would do:

{ID}        {   cout << "lex=" << std::string(yytext,yytext+yylen); // use yylen
                return Id;
            }

Do not mess around with yacc structures in lex. It tightly couples your lex file to yacc (which is not not necessary). Just return the token. yacc can then get the token value manually.

Then in the YACC File:

Q_ID :        Q_ID '.' ident        {$ = $3; cout<<"Q.id="<<$<<endl;}
           |  ident                 {$ = $1; cout<<"ID  ="<<$<<endl;}

ident :       Id                    {$ = strndup(yytext, yylen);}

For every terminal (Id) that has a long token have a non terminal for de-coding the terminal and generate the correct value in the yacc union structure. In this case we have the ident not terminal. It just decodes the Id terminal and gets the token set up correctly (this Id does not have a type in the union structure).

Also Note:

  • It is traditional for terminals to be all caps. Id fails.
  • It is traditional for non-terminals to be lowercase. Q_ID fails.

Additionally:

This line:

%token <strVal> STR_VAL INT FLOAT STRING Id ;

Looks wrong, though it is hard to tell without knowing what INT, FLOAT and STRING are. I am guessing these are the terminal tokens for the keywords int,float,string. In which case you do not need to store the actual token string. The fact that you know it is INT/FLOAT or STRING is enough.

This should be:

%token <strVal> STR_VAL;
%token <strVal> ident;
美胚控场 2024-10-25 00:32:08

这在很大程度上取决于您的词法分析器为 id 标记返回的 yylval,因为这是在标记移动时复制到解析器堆栈上的内容,因此 $2 所指的内容到评估规则的时间。据猜测,您

[a-zA-Z_][a-zA-Z_0-9]*      { yylval.str = yytext; return id; }

有一个类似以下的 lex 规则: with

%token<str>    id

的解析器中 。在这种情况下,您将一个指针存储到 Flex 的内部扫描器缓冲区中,该缓冲区当时包含“aaa”,但会被稍后的标记覆盖,因此当操作运行时,$2 中的指针指向其他内容。您需要将字符串复制到不会被覆盖的地方,并将 yylval 设置为指向该位置。您可以使用 strdup(3) 将字符串复制到分配的缓冲区,这可以解决此问题,但可能会导致内存泄漏。

编辑

好吧,根据您的附加信息,您的程序没有崩溃是非常令人惊讶的——您将令牌文本strcpyyylval.strVal中,但您从未将 yylval.Strval 初始化为指向任何位置,因此您将其复制到某个随机内存位置。您需要将 yylval.strVal = malloc(strlen(yytext)+1); 之类的内容粘贴到 lex.l 操作中,以确保它指向有效的内存,或者只使用更简单且等效的调用strdup,因为它结合了 malloc 和 strcpy:

[a-zA-Z_][a-zA-Z_0-9]*      { yylval.str = strdup(yytext); return id; }

This depends heavily on what your lexer returns as yylval for the id token, as that is what get copied onto the parser stack when the token is shifted, and thus what $2 refers to when the rule is evaluated. At a guess, you have a lex rule something like:

[a-zA-Z_][a-zA-Z_0-9]*      { yylval.str = yytext; return id; }

with

%token<str>    id

in your parser. In this case, you're storing a pointer into flex's internal scanner buffer which contains 'aaa' at that instant, but will be overwritten by a later token, so by the time the action runs, the pointer in $2 points at something else. You need to copy the string somewhere where it won't be overwritten and set yylval to point at that. You could use strdup(3) to copy the string to a malloced buffer, which would solve this problem but might leave you with memory leaks.

edit

Well with your additional info, its quite surprising that your program doesn't crash -- you strcpy the token text into yylval.strVal, but you never initialize yylval.Strval to point anywhere, so you're copying it into some random memory location. You need to stick something like yylval.strVal = malloc(strlen(yytext)+1); into the lex.l action to ensure it points at valid memory, or just use the simpler and equivalent call to strdup, as that combines the malloc and strcpy:

[a-zA-Z_][a-zA-Z_0-9]*      { yylval.str = strdup(yytext); return id; }
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文