Flex action for a quoted string returns an empty string

Posted 2025-01-14 11:44:42

I am trying to get an example from the Flex manual [1] working. The example shows Flex rules for a quoted string that may contain octal escape codes.

The manual is a bit incomplete in its description of the action for the closing quote. It simply has this comment:

/* return string constant token type and
*  value to parser
*/

So I created code that I thought would work, but apparently my code is incorrect.

Below is the lexer followed by the parser. When I execute the generated parser, I get this output:

The string is: ''

What I expect, and want, is this output:

The string is: 'John Doe'

My input is this: "John Doe"

What am I doing wrong, please?

Here is the lexer:

%option noyywrap
%x STR
%{
#include "parse.tab.h"
#define MAX_STR_CONST 100
%}
%% 
    char string_buf[MAX_STR_CONST];
    char *string_buf_ptr;
    
\"            { string_buf_ptr = string_buf; BEGIN(STR); }
                
                
<STR>{
    \"          { /* closing quote - all done */
                   BEGIN(INITIAL);
                   *string_buf_ptr = '\0';
                   yylval.strval = strdup(string_buf_ptr);
                   return(STRING); 
                }
                
    \n          {  /* error - unterminated string constant */
                   perror("Error - unterminated string");
                   yyterminate();
                }
                
    \\[0-7]{1,3} { /* octal escape sequence */
                   int result;
                   (void) sscanf(yytext+1, "%o", &result);
                   if (result > 0xff) {
                      perror("Error - octal escape is out-of-bounds");
                      yyterminate();
                   }
                   *string_buf_ptr++ = result;
                 }
               
    \\[0-9]+    { /* bad escape sequence */
                   perror("Error - bad escape sequence");
                   yyterminate();
                }
                
    \\n         *string_buf_ptr++ = '\n';
    \\t         *string_buf_ptr++ = '\t';
    \\r         *string_buf_ptr++ = '\r';
    \\b         *string_buf_ptr++ = '\b';
    \\f         *string_buf_ptr++ = '\f';
    
    \\(.|\n)    *string_buf_ptr++ = yytext[1];
    
    [^\\\n\"]+  {
                   char *yptr = yytext; 
                   while (*yptr)
                      *string_buf_ptr++ = *yptr++;
                }
}
%%

Here is the parser:

%{
#include <stdio.h>
#include <stdlib.h>
/* interface to the lexer */
extern int yylineno; /* from lexer */
int yylex(void);
void yyerror(const char *s, ...);
extern FILE *yyin;
int yyparse (void);
%}
%union {
   char *strval;
}
%token <strval> STRING
%%
start 
    : STRING       { printf("The string is: '%s'", $1);}
;
%%

int main(int argc, char *argv[])
{
    yyin = fopen(argv[1], "r");
    
    yyparse();
    
    fclose(yyin);
    
    return 0;
}

void yyerror(const char *s, ...)
{
  fprintf(stderr, "%d: %s\n", yylineno, s);
}

[1] See pages 24-25 in the Flex manual https://epaperpress.com/lexandyacc/download/flex.pdf

Comments (1)

小梨窩很甜 2025-01-21 11:44:42

Your action is:

*string_buf_ptr = '\0';
yylval.strval = strdup(string_buf_ptr);
return STRING;

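It seems pretty clear that strdup of string_buf_ptr will return a newly-allocated copy of an empty string, since you just set the character pointed to by string_buf_ptr to 0. By that point string_buf_ptr has been advanced past every character copied into the buffer, so it points at the terminator rather than at the start of the text. Passing string_buf (the start of the buffer) to strdup instead yields the whole string.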
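To make the difference concrete, here is a minimal standalone sketch in plain C, outside of Flex. The helper name finish_string and its from_start flag are hypothetical and exist only for this illustration; the buffer and pointer names mirror the ones in the question.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup on strict compilers */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STR_CONST 100

/* Hypothetical helper that mimics the lexer actions: copy src into a
 * buffer through a moving write pointer, terminate it, then duplicate.
 * from_start != 0: duplicate from the start of the buffer (the fix).
 * from_start == 0: duplicate from the write pointer (the question's bug).
 * The caller owns the returned string and must free() it. */
char *finish_string(const char *src, int from_start)
{
    char string_buf[MAX_STR_CONST];
    char *string_buf_ptr = string_buf;

    assert(src != NULL);

    /* Same effect as the [^\\\n\"]+ rule, plus a bounds check. */
    while (*src && string_buf_ptr < string_buf + MAX_STR_CONST - 1)
        *string_buf_ptr++ = *src++;

    /* Closing quote: terminate the text. The pointer now sits on the NUL. */
    *string_buf_ptr = '\0';

    return strdup(from_start ? string_buf : string_buf_ptr);
}
```

Calling finish_string("John Doe", 0) reproduces the empty result from the question, while finish_string("John Doe", 1) returns a copy of the full text. In the lexer itself, the corresponding change would be to write yylval.strval = strdup(string_buf); in the closing-quote action.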

Two comments:

  • This bug has essentially nothing to do with Flex (or Bison). I know that it is always tempting to assume that the most unfamiliar technology you are using is the source of errors, but making assumptions like that is not a very effective debugging technique.
  • A debugger is often a faster way of finding bugs than StackOverflow. There's a bit of a learning curve to use Gdb, but it will definitely pay off in the end (perhaps even soon).

Also, perror is intended to present the user with an error message based on the value of errno. That's not very useful in this context; you probably want to call yyerror. (However, you'll need to declare it in the lexer, unless you arrange for its prototype to be inserted in parse.tab.h. See %code requires/%code provides in the bison manual for how to do that.)
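As a rough sketch of the simpler of those two options, the lexer's prologue could declare yyerror directly, and the error rules could call it in place of perror. This is a fragment rather than a complete lexer, and it assumes the prototype matches the definition in the question's parser file.

```c
%{
#include "parse.tab.h"
#define MAX_STR_CONST 100
/* Assumption: same signature as the yyerror defined in the parser file. */
void yyerror(const char *s, ...);
%}

%%

<STR>\n   { /* error - unterminated string constant */
            yyerror("unterminated string");
            yyterminate();
          }
```

Since the yyerror in the question prints yylineno, the message then carries a line number instead of an unrelated errno description.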
