How to get the entire input string in Lex and Yacc?
OK, so here is the deal.
In my language I have some commands, say
XYZ 3 5
GGB 8 9
HDH 8783 33
And in my Lex file
XYZ { return XYZ; }
GGB { return GGB; }
HDH { return HDH; }
[0-9]+ { yylval.ival = atoi(yytext); return NUMBER; }
\n { return EOL; }
In my yacc file
start : commands
;
commands : command
| command EOL commands
;
command : xyz
| ggb
| hdh
;
xyz : XYZ NUMBER NUMBER { /* Do something with the numbers */ }
;
etc. etc. etc. etc.
My question is, how can I get the entire text
XYZ 3 5
GGB 8 9
HDH 8783 33
into commands while still returning the NUMBERs?
Also, when my Lex returns a STRING [0-9a-zA-Z]+ and I want to verify its length, should I do it like
rule: STRING STRING { if (strlen($1) < 5 ) /* Do some shit else error */ }
or actually have a token in my Lex that returns different tokens depending on length?
If I've understood your first question correctly, you can use semantic actions that build the value of command however you want.
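The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of what such an action might build. The struct command type and the make_command helper are hypothetical names, not taken from the question. The idea is that the action for xyz could call something like $$ = make_command("XYZ", $2, $3); so that command carries both the numbers and a text form of the line.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical value type for the `command` nonterminal. */
struct command {
    char text[64];  /* normalized text of the command, e.g. "XYZ 3 5" */
    int  a, b;      /* the two NUMBER operands */
};

/* Build a command value from the keyword and its two operands.
 * Returns NULL if allocation fails; the caller frees the result. */
struct command *make_command(const char *name, int a, int b)
{
    struct command *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->a = a;
    c->b = b;
    /* snprintf truncates instead of overflowing if the text is too long. */
    snprintf(c->text, sizeof c->text, "%s %d %d", name, a, b);
    return c;
}
```

Note that the text is rebuilt from the token values, so it is normalized (single spaces) rather than a byte-for-byte copy of the input.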
For your second question, the borders between lexical analysis and grammatical analysis, and between grammatical analysis and semantic analysis, aren't hard and fast. Moving them is a trade-off between factors like ease of description, clarity of error messages, and robustness in the presence of errors. For string-length verification, errors are quite likely, and the error message you get from returning different terminals for different lengths will probably not be clear. So if it is possible (that depends on the grammar), I'd handle it in the semantic analysis phase, where the message can easily be tailored.
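To illustrate handling the length check in the semantic phase, here is a rough sketch of a helper an action could call, for example as check_length($1, 5, lineno, msg, sizeof msg). The helper name, its parameters, and the message wording are assumptions for illustration. It follows the test in the question, where a string shorter than the limit is accepted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper for a semantic action.
 * Returns 1 if `s` is shorter than `limit` characters.
 * Otherwise writes a tailored message into `msg` and returns 0. */
int check_length(const char *s, size_t limit, int line,
                 char *msg, size_t msg_size)
{
    size_t len = strlen(s);
    if (len < limit)
        return 1;
    snprintf(msg, msg_size,
             "line %d: string '%s' has %zu characters, expected fewer than %zu",
             line, s, len, limit);
    return 0;
}
```

Because the parser still sees a single STRING token, a bad length does not derail parsing, and the message can name the offending string directly.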
If you arrange for your lexical analyzer (yylex()) to store the whole string in some variable, then your code can access it. The communication with the parser proper will be through the normal mechanisms, but there's nothing that says you can't also have another variable lurking around (probably a file static variable, but beware multithreading) that stores the whole input line before it is dissected.
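One way to sketch that idea is a small file-static buffer that every lexer rule appends yytext to. The names line_append, line_text, line_reset, and the LINE_CAP limit are all hypothetical. In flex, the append could be done once for all rules through the YY_USER_ACTION macro instead of in each action.

```c
#include <string.h>

#define LINE_CAP 256   /* assumed upper bound on the length of one line */

static char   line_buf[LINE_CAP];  /* file-static, so not thread-safe */
static size_t line_len = 0;

/* Append the text of one token (e.g. yytext) to the current line. */
void line_append(const char *text)
{
    size_t n = strlen(text);
    if (n > LINE_CAP - 1 - line_len)
        n = LINE_CAP - 1 - line_len;   /* truncate rather than overflow */
    memcpy(line_buf + line_len, text, n);
    line_len += n;
    line_buf[line_len] = '\0';
}

/* Text collected since the last reset. */
const char *line_text(void)
{
    return line_buf;
}

/* Start a new line, e.g. once the EOL token has been handled. */
void line_reset(void)
{
    line_len = 0;
    line_buf[0] = '\0';
}
```

For the spaces to be preserved, the lexer needs a whitespace rule that also appends its text. Also keep in mind that the parser may already have read a lookahead token when an action runs, so the buffer can contain slightly more than the command being reduced.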
As you use yylval.ival, you already have a union with an ival field in your YACC source, and you specify the token type for NUMBER against that field. So now you can access the ival field of a NUMBER token simply by its position in your rules, for example as $1.
For your second question, I'd define the union with a string field alongside ival, and in your LEX source specify the token types. So now you can do things like calling strlen($1) on a STRING directly in a rule's action.
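The declarations from this answer are not preserved here, so the following is only a sketch of the C side of the idea. It assumes declarations along the lines of %union { int ival; char *sval; } with %token <ival> NUMBER and %token <sval> STRING. The field name sval, the union tag value, and the two setter functions are hypothetical stand-ins for what the lexer actions would do to yylval.

```c
#include <stdlib.h>
#include <string.h>

/* Roughly the C type that a %union { int ival; char *sval; } produces. */
union value {
    int   ival;   /* used by NUMBER */
    char *sval;   /* used by STRING (hypothetical field name) */
};

/* What the [0-9]+ action does: convert the matched text to an int. */
void set_number(union value *v, const char *text)
{
    v->ival = atoi(text);
}

/* What a STRING action could do: copy the matched text, because the
 * lexer reuses its yytext buffer for the next token.
 * Returns 1 on success, 0 if allocation fails. */
int set_string(union value *v, const char *text)
{
    size_t n = strlen(text) + 1;
    v->sval = malloc(n);
    if (v->sval == NULL)
        return 0;
    memcpy(v->sval, text, n);
    return 1;
}
```

With the token typed this way, $1 in a rule such as rule: STRING STRING is a char *, so strlen($1) works as written in the question. The copy has to be freed once the action is done with it.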