Problem with a lex specification

Posted 2024-10-18 13:59:33

I'm trying to define a simple tokenizer for a language in lex.

I want to define tokens for parentheses, commas, comparison operators, the in/con/ncon operators, and logical operators. I want any other token to match the 'keywords' regexp, since that represents a STRINGARG in my language.

Every time I feed it a string like 'A_FIELD', it prints LEXER ERROR. I want it to match 'keywords' and return a STRINGARG token.

Here is my .l file:

%{
#include "y.tab.h"
%}

lparen "("
rparen ")"
comma ","
comparison ("=="|"!="|">"|"<"|">="|"<=")
intok ("in"|"IN")
conncontok ("con"|"CON"|"ncon"|"NCON")
logical ("and"|"or"|"AND"|"OR"|"&"|"|")
keywords ( "(" | ")" | "," | "==" | "!=" | ">" | "<" | ">=" | "<=" | "in" | "IN" | "con" | "CON" | "ncon" | "NCON" | "and" | "AND" | "&" | "or"\
 | "OR" | "|" )

%%

" "                    /* ignore whitespace */
{lparen}               { return LPAREN; }
{rparen}               { return RPAREN; }
{comma}                { return COMMA; }
{comparison}           { yylval.str = yytext; return COMPARISON; }
{intok}                { return IN; }
{conncontok}           { yylval.str = yytext; return CONNCON; }
{logical}              { return LOGICAL; }
^keywords              { yylval.str = yytext; return STRINGARG; }
.                      { printf("LEXER ERROR."); exit(1); }

%%

#ifndef yywrap
int yywrap() { return 1; }
#endif

谁许谁一生繁华 2024-10-25 13:59:33

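I found the answer to this problem.

I wanted a STRINGARG to be anything other than one of the recognized tokens, but lex cannot negate a named definition. Outside brackets, '^' anchors a match to the start of a line, and 'keywords' without braces is literal text, so '^keywords' only matched the word "keywords" at the beginning of a line. The last rule should use a character class rather than a list of tokens. With the rules set up as follows, everything worked: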

%%

" "                    /* ignore whitespace */
{lparen}               { return LPAREN; }
{rparen}               { return RPAREN; }
{comma}                { return COMMA; }
{comparison}           { yylval.str = yytext; return COMPARISON; }
{intok}                { return IN; }
{conncontok}           { yylval.str = yytext; return CONNCON; }
{logical}              { return LOGICAL; }
[^ \t\n]+              { yylval.str = yytext; return STRINGARG; }
.                      { printf( "Lexer error." ); exit(1); }
%%
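
A possible refinement of the rules above, offered only as a sketch. As written, [^ \t\n]+ also accepts punctuation, and because lex prefers the longest match, input such as (A_FIELD) arrives as a single STRINGARG instead of LPAREN, STRINGARG, RPAREN. A tab also reaches the error rule, since only a single space is skipped. In addition, yytext points into the scanner's own buffer, which is overwritten as scanning continues, so the parser may see changed text unless the lexeme is copied. The version below excludes the operator characters from the catch-all class, skips all whitespace, and copies lexemes with strdup. It assumes that y.tab.h declares the token codes and a %union with a char *str member (implied by the original file but not shown), that POSIX strdup is available, and that the parser frees the copies.

```lex
%{
/* Sketch only: assumes y.tab.h (generated from the yacc grammar) declares the
   token codes and a %union with a `char *str` member, as the original implies. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
%}

comparison ("=="|"!="|">="|"<="|">"|"<")
intok      ("in"|"IN")
conncontok ("con"|"CON"|"ncon"|"NCON")
logical    ("and"|"or"|"AND"|"OR"|"&"|"|")

%%

[ \t\r\n]+             { /* skip spaces, tabs and newlines */ }
"("                    { return LPAREN; }
")"                    { return RPAREN; }
","                    { return COMMA; }
{comparison}           { yylval.str = strdup(yytext); return COMPARISON; }
{intok}                { return IN; }
{conncontok}           { yylval.str = strdup(yytext); return CONNCON; }
{logical}              { return LOGICAL; }
[^ \t\r\n(),=!<>&|]+   { yylval.str = strdup(yytext); return STRINGARG; }
.                      { fprintf(stderr, "Lexer error near '%s'\n", yytext); exit(1); }

%%

#ifndef yywrap
int yywrap(void) { return 1; }
#endif
```

The keyword rules must stay above the catch-all rule: when two rules match the same length (for example "in"), lex picks the one listed first, while a longer word such as "index" still becomes a STRINGARG because the longest match wins.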