String input to flex lexer

Posted 2024-07-17 12:45:08

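I want to create a read-eval-print loop using a flex/bison parser. The trouble is that the flex-generated lexer wants input of type FILE*, and I would like it to be char*. Is there a way to do this?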

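One suggestion was to create a pipe, feed it the string, open the file descriptor and hand it to the lexer. That is fairly simple, but it feels convoluted and not very platform-independent. Is there a better way?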

原野 2024-07-24 12:45:08

The following routines are available for setting up input buffers for scanning in-memory strings instead of files (as yy_create_buffer does):

YY_BUFFER_STATE yy_scan_string(const char *str): scans a NUL-terminated string
YY_BUFFER_STATE yy_scan_bytes(const char *bytes, int len): scans len bytes (including possibly NULs) starting at location bytes

Note that both of these functions create and return a corresponding YY_BUFFER_STATE handle (which you must delete with yy_delete_buffer() when you are done with it), and yylex() scans a copy of the string or bytes. This behavior may be desirable, since yylex() modifies the contents of the buffer it is scanning.

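If you want to avoid the copy, use the following instead. It scans the buffer in place, so the buffer must be writable and its last two bytes must be NUL (otherwise it returns NULL). You still have to call yy_delete_buffer() on the returned handle; only the memory behind base remains yours to manage: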

YY_BUFFER_STATE yy_scan_buffer(char *base, yy_size_t size)

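Sample main:

int main(void) {
    char buf[] = "a test string\0";   /* explicit + implicit NUL: two trailing NULs */
    YY_BUFFER_STATE b = yy_scan_buffer(buf, sizeof buf);
    yylex();
    yy_delete_buffer(b);              /* frees flex's state, not buf */
    return 0;
}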
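The flex manual documents that yy_scan_buffer() requires the last two bytes of the buffer to be NUL (YY_END_OF_BUFFER_CHAR), returns NULL if they are not, and does not scan those two bytes. As a plain-C illustration only, the sketch below mirrors that documented check, which shows why an array initialised from a literal ending in one explicit \0 satisfies it. scan_buffer_ok and scanned_length are hypothetical helpers, not flex functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the documented yy_scan_buffer() precondition:
 * at least 2 bytes, and the last two bytes must both be NUL. When this is
 * false, flex returns NULL instead of a buffer handle. */
bool scan_buffer_ok(const char *base, size_t size)
{
    return size >= 2 && base[size - 2] == '\0' && base[size - 1] == '\0';
}

/* Hypothetical helper: how many bytes flex actually scans, i.e. everything
 * except the two trailing NULs. */
size_t scanned_length(size_t size)
{
    return size >= 2 ? size - 2 : 0;
}
```

For example, char s[] = "a test string\0"; gives sizeof s == 15, passes the check, and leaves 13 bytes to scan. Without the explicit \0 the check fails, which is the case where flex hands back NULL.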

寂寞陪衬 2024-07-24 12:45:08

See the section of the Flex manual on how to scan in-memory buffers, such as strings.

桃扇骨 2024-07-24 12:45:08

flex can parse char * using any one of three functions: yy_scan_string(),
yy_scan_buffer(), and yy_scan_bytes() (see the documentation). Here's an example of the first:

typedef struct yy_buffer_state * YY_BUFFER_STATE;
extern int yyparse(void);
extern YY_BUFFER_STATE yy_scan_string(const char * str);
extern void yy_delete_buffer(YY_BUFFER_STATE buffer);

int main(){
    char string[] = "String to be parsed.";
    YY_BUFFER_STATE buffer = yy_scan_string(string);
    yyparse();
    yy_delete_buffer(buffer);
    return 0;
}

The equivalent statements for yy_scan_buffer() (which requires a doubly null-terminated string):

char string[] = "String to be parsed.\0";
YY_BUFFER_STATE buffer = yy_scan_buffer(string, sizeof(string));

My answer reiterates some of the information provided by @dfa and @jlholland, but neither of their answers' code seemed to be working for me.
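A literal only helps when the text is known at compile time. For a line read at run time (for example in a REPL), one option is to copy it into a writable heap buffer followed by two NULs, which is roughly what yy_scan_bytes() does for you at the cost of a copy. A minimal sketch; make_scan_buffer is a hypothetical helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy len bytes of src (embedded NULs allowed) into a
 * new heap buffer followed by the two NUL bytes yy_scan_buffer() requires.
 * On success *out_size receives len + 2, the value to pass as `size`.
 * Returns NULL if allocation fails. */
char *make_scan_buffer(const char *src, size_t len, size_t *out_size)
{
    char *buf = malloc(len + 2);
    if (buf == NULL)
        return NULL;
    memcpy(buf, src, len);
    buf[len] = '\0';
    buf[len + 1] = '\0';
    *out_size = len + 2;
    return buf;
}
```

Pass the buffer and size to yy_scan_buffer(), call yy_delete_buffer() when parsing is done, and only then free() the buffer, since flex scans it in place and may modify it.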

烂人 2024-07-24 12:45:08

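Here is what I needed to do:

#include <stddef.h>

struct yy_buffer_state;   /* opaque: the full definition lives in the generated scanner */
typedef struct yy_buffer_state *YY_BUFFER_STATE;
extern int yyparse(void);
extern YY_BUFFER_STATE yy_scan_buffer(char *, size_t);
extern void yy_delete_buffer(YY_BUFFER_STATE);

int main(int argc, char** argv) {

  // yy_scan_buffer needs the last two bytes to be NUL: the explicit \0
  // plus the literal's implicit terminator provide exactly two
  char tstr[] = "line i want to parse\n\0";
  YY_BUFFER_STATE buf = yy_scan_buffer(tstr, sizeof(tstr));
  yyparse();
  yy_delete_buffer(buf);
  return 0;
}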

You cannot declare a typedef extern; you have to forward-declare the struct and repeat the typedef, which makes sense when you think about it.

花开浅夏 2024-07-24 12:45:08

The accepted answer is incorrect. It will cause memory leaks.

Internally, yy_scan_string calls yy_scan_bytes which, in turn, calls yy_scan_buffer.

yy_scan_bytes allocates memory for a COPY of the input buffer.

yy_scan_buffer works directly upon the supplied buffer.

With all three forms, you MUST call yy_delete_buffer to free the flex buffer-state information (YY_BUFFER_STATE).

However, with yy_scan_buffer, you avoid the internal allocation/copy/free of the internal buffer.

The prototype for yy_scan_buffer does NOT take a const char* and you MUST NOT expect the contents to remain unchanged.

If you allocated memory to hold your string, you are responsible for freeing it AFTER you call yy_delete_buffer.

Also, don't forget to have yywrap return 1 (non-zero) when you're parsing JUST this string.

Below is a COMPLETE example.

%%

<<EOF>> return 0;

.|\n    return 1;

%%

int yywrap()
{
    return (1);
}

int main(int argc, const char* const argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return (EXIT_FAILURE);
    }

    FILE* fileHandle = fopen(argv[1], "rb");
    if (fileHandle == NULL) {
        perror("fopen");
        return (EXIT_FAILURE);
    }

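    fseek(fileHandle, 0, SEEK_END);
    long fileSize = ftell(fileHandle);
    fseek(fileHandle, 0, SEEK_SET);
    if (fileSize < 0) {
        perror("ftell");
        fclose(fileHandle);
        return (EXIT_FAILURE);
    }

    // When using yy_scan_bytes, do not add 2 here ...
    char *string = malloc(fileSize + 2);
    if (string == NULL) {
        fclose(fileHandle);
        return (EXIT_FAILURE);
    }

    // fread(buffer, element size, element count, stream)
    if (fread(string, 1, fileSize, fileHandle) != (size_t)fileSize) {
        perror("fread");
        free(string);
        fclose(fileHandle);
        return (EXIT_FAILURE);
    }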

    fclose(fileHandle);

    // Add the two NUL terminators, required by flex.
    // Omit this for yy_scan_bytes(), which allocates, copies and
    // appends these for us.
    string[fileSize] = '\0';
    string[fileSize + 1] = '\0';

    // Our input file may contain NULs ('\0') so we MUST use
    // yy_scan_buffer() or yy_scan_bytes(). For a normal C (NUL-
    // terminated) string, we are better off using yy_scan_string() and
    // letting flex manage making a copy of it so the original may be a
    // const char (i.e., literal) string.
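    YY_BUFFER_STATE buffer = yy_scan_buffer(string, fileSize + 2);
    if (buffer == NULL) {   // returned when the last two bytes are not NUL
        free(string);
        return (EXIT_FAILURE);
    }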

    // This is a flex source file, for yacc/bison call yyparse()
    // here instead ...
    int token;
    do {
        token = yylex(); // MAY modify the contents of the 'string'.
    } while (token != 0);

    // After flex is done, tell it to release the memory it allocated.    
    yy_delete_buffer(buffer);

    // And now we can release our (now dirty) buffer.
    free(string);

    return (EXIT_SUCCESS);
}
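The fseek()/ftell() size probe above only works on seekable files; it cannot size a pipe or standard input, which is exactly what a REPL tends to read from. As an alternative sketch (not part of the original answer), the buffer can instead be grown while reading until EOF. read_stream_for_flex is a hypothetical name; its result is meant to be handed to yy_scan_buffer() the same way as above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read all of fp (file, pipe or stdin) into a heap
 * buffer terminated by the two NULs yy_scan_buffer() needs. Embedded NULs
 * are preserved. Returns the buffer and stores data length + 2 in
 * *out_size, or returns NULL on allocation or read error. */
char *read_stream_for_flex(FILE *fp, size_t *out_size)
{
    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Keep room for at least one data byte plus the two NULs. */
        if (cap - len < 3) {
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        size_t n = fread(buf + len, 1, cap - len - 2, fp);
        len += n;
        if (n == 0) {
            if (ferror(fp)) {
                free(buf);
                return NULL;
            }
            break;  /* end of input */
        }
    }

    buf[len] = '\0';
    buf[len + 1] = '\0';
    *out_size = len + 2;
    return buf;
}
```

As with the file version, call yy_delete_buffer() on the handle before freeing the returned buffer.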

爱，才寂寞 2024-07-24 12:45:08

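Alternatively, you can redefine the YY_INPUT macro in the lex file so that the scanner reads from your string instead of a FILE*. YY_INPUT takes three arguments: the buffer to fill, a variable that receives the number of bytes stored, and the buffer's capacity; storing 0 tells flex that the input is exhausted. For example: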

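%{
#include <string.h>
#undef YY_INPUT
#define YY_INPUT(buf, result, max_size) ((result) = my_yyinput((buf), (max_size)))
static size_t my_yyinput(char *buf, size_t max_size);
static const char *my_buf = "";  /* string to scan; must outlive the scan */
static size_t my_pos = 0;        /* bytes already handed to flex */
%}

/* ...rules..., then in the user-code section after the second %%: */
void set_lexbuf(const char *org_str)
{  my_buf = org_str;  my_pos = 0;  }

static size_t my_yyinput(char *buf, size_t max_size)
{
    size_t n = strlen(my_buf + my_pos);
    if (n > max_size) n = max_size;  /* never overflow flex's buffer */
    memcpy(buf, my_buf + my_pos, n);
    my_pos += n;
    return n;                        /* 0 tells flex there is no more input */
}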

In main.c, set the lexer's input string before you start scanning:

set_lexbuf(your_string);
yylex();   /* or yyparse() */

杀手六號 2024-07-24 12:45:08

Here is a small example of using bison/flex as a parser inside C++ code to parse a string and change its value accordingly.
(A few parts of the code were removed, so some irrelevant parts may remain.)
parser.y:

%{
#include "parser.h"
#include "lex.h"
#include <math.h> 
#include <fstream>
#include <iostream> 
#include <string>
#include <vector>
using namespace std;
 int yyerror(yyscan_t scanner, string& result, const char *s){  
    (void)scanner;
    (void)result;
    std::cerr << "yyerror : " << s << std::endl;
    return 1;
  }
    %}

%code requires{
#define YY_TYPEDEF_YY_SCANNER_T 
typedef void * yyscan_t;
#define YYERROR_VERBOSE 0
#define YYMAXDEPTH 65536*1024 
#include <math.h> 
#include <fstream>
#include <iostream> 
#include <string>
#include <vector>
}
%output "parser.cpp"
%defines "parser.h"
%define api.pure full
%lex-param{ yyscan_t scanner }
%parse-param{ yyscan_t scanner } {std::string & result}

%union {
  std::string *  sval;
}

%token TOKEN_ID TOKEN_ERROR TOKEN_OB TOKEN_CB TOKEN_AND TOKEN_XOR TOKEN_OR TOKEN_NOT
%type <sval>  TOKEN_ID expression unary_expression binary_expression
%left BINARY_PRIO
%left UNARY_PRIO
%%

top:
expression {result = *$1;}
;
expression:
TOKEN_ID  {$$ = $1; }
| TOKEN_OB expression TOKEN_CB  {$$ = $2;}
| binary_expression  {$$ = $1;}
| unary_expression  {$$ = $1;}
;

unary_expression:
 TOKEN_NOT expression %prec UNARY_PRIO {result =  " (NOT " + *$2 + " ) " ; $$ = &result;}
;
binary_expression:
expression expression  %prec BINARY_PRIO {result = " ( " + *$1+ " AND " + *$2 + " ) "; $$ = &result;}
| expression TOKEN_AND expression %prec BINARY_PRIO {result = " ( " + *$1+ " AND " + *$3 + " ) "; $$ = &result;} 
| expression TOKEN_OR expression %prec BINARY_PRIO {result = " ( " + *$1 + " OR " + *$3 + " ) "; $$ = &result;} 
| expression TOKEN_XOR expression %prec BINARY_PRIO {result = " ( " + *$1 + " XOR " + *$3 + " ) "; $$ = &result;} 
;

%%

lexer.l : 

%{
#include <string>
#include "parser.h"

%}
%option outfile="lex.cpp" header-file="lex.h"
%option noyywrap never-interactive
%option reentrant
%option bison-bridge

%top{
/* This code goes at the "top" of the generated file. */
#include <stdint.h>
}

id        ([a-zA-Z][a-zA-Z0-9]*)+
white     [ \t\r]
newline   [\n]

%%
{id}                    {    
    yylval->sval = new std::string(yytext);
    return TOKEN_ID;
}
"(" {return TOKEN_OB;}
")" {return TOKEN_CB;}
"*" {return TOKEN_AND;}
"^" {return TOKEN_XOR;}
"+" {return TOKEN_OR;}
"!" {return TOKEN_NOT;}

{white};  // ignore white spaces
{newline};
. {
return TOKEN_ERROR;
}

%%

usage : 
void parse(std::string& function) {
  string result = "";
  yyscan_t scanner;
  yylex_init_extra(NULL, &scanner);
  YY_BUFFER_STATE state = yy_scan_string(function.c_str() , scanner);
  yyparse(scanner,result);
  yy_delete_buffer(state, scanner);
  yylex_destroy(scanner);
  function = " " + result + " ";  
}

makefile:
parser.h parser.cpp: parser.y
    @ /usr/local/bison/2.7.91/bin/bison -y -d parser.y


lex.h lex.cpp: lexer.l
    @ /usr/local/flex/2.5.39/bin/flex lexer.l

clean:
    - \rm -f *.o parser.h parser.cpp lex.h lex.cpp

流年里的时光 2024-07-24 12:45:08

There's this funny code in libmatheval:

/* Redefine macro to redirect scanner input from string instead of
 * standard input.  */
#define YY_INPUT( buffer, result, max_size ) \
{ result = input_from_string (buffer, max_size); }
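The snippet relies on libmatheval's own input_from_string(), whose body is not shown here. A plausible shape for such a function, as a sketch only (the struct and function names below are hypothetical, not libmatheval's code): it copies at most max_size bytes per call and returns 0 once the string is exhausted, which is how YY_INPUT reports end of input to flex.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical state for feeding a NUL-terminated string to a scanner. */
struct string_input {
    const char *text;  /* string being scanned; must outlive the scan */
    size_t pos;        /* bytes already handed to the scanner */
};

/* Copy up to max_size bytes of the remaining text into buffer and return
 * how many were copied. A return of 0 means "no more input", which the
 * YY_INPUT definition passes back to flex as its result. */
size_t string_input_read(struct string_input *in, char *buffer, size_t max_size)
{
    size_t remaining = strlen(in->text + in->pos);
    size_t n = remaining < max_size ? remaining : max_size;
    memcpy(buffer, in->text + in->pos, n);
    in->pos += n;
    return n;
}
```

With a file-scope struct string_input named, say, source, the macro would become { result = string_input_read(&source, buffer, max_size); }. Because flex may ask for fewer bytes than the string holds, the position has to be remembered between calls.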
