How to identify multiple lexical errors in an input string (using flex)?
I am using flex to generate a lexical analyser for a custom language. The problem I am having is that as soon as I find a malformed token, I have no way to tell what other tokens follow it. For example:
int v1,v2;
v1=10;v2=20;
v1=v2+1v;
print(v1);
~return;
Here, the stream of tokens will be id, <,>, id, <;>, ... The language specifies that each token be separated by whitespace, so seeing a v after 1 should produce an error, and the scanner has to print that error. After that, there are more legal tokens and another illegal token (~return). How do I process the rest of the legal tokens and print the second error?
I am using flex as the scanner generator. When I find that none of the regular expressions for the lexeme matches, I call an error routine that prints the appropriate message.
How do I resume processing after calling this routine?
Comments (2)
Put the whitespace into the grammar if it's syntactically significant. If it isn't, as appears to be the case here, just ignore the rule. Whitespace is only syntactically significant where it is required to separate tokens, e.g. in:
int a;
COBOL, for example, has a rule that a period must be followed by whitespace except in a PICTURE string. Implementing the rule was harder than ignoring it, so I ignored it. The compiler passed FIPS certification, and nobody ever noticed in the roughly ten years it was in production use.
NB: The best way to handle illegal characters is to treat them as parser errors, so that the parser can apply its error-recovery strategy. At the lexical level, the best you can do is print and ignore them.
In lex/flex, this just amounts to having a final rule that reads:
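A minimal sketch of this "print and ignore, then keep scanning" approach, written as plain C rather than as a flex specification so it compiles on its own (it is not the answerer's exact rule). It assumes a token set taken from the question's example (identifiers, integers and the punctuation , ; = + ( ), with keywords lexed as identifiers) and, following the question's requirement, treats digits that run straight into letters (1v) as one malformed token. The names scan and lex_error are hypothetical. In a flex specification the same behaviour would come from rules such as [0-9]+[A-Za-z_][A-Za-z0-9_]* and . placed after the legal-token rules, whose actions report the error and do not return; when an action finishes without returning, yylex simply continues matching the remaining input, so no restart is needed.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error routine: print the message and return, so the caller
   can carry on scanning. */
static void lex_error(int line, const char *msg, const char *text, int len)
{
    fprintf(stderr, "line %d: %s '%.*s'\n", line, msg, len, text);
}

/* Scan src, printing legal tokens to stdout and reporting every lexical
   error to stderr. Keywords are lexed as identifiers. Returns the number
   of errors found. */
int scan(const char *src)
{
    const char *p = src;
    int line = 1, errors = 0;

    while (*p != '\0') {
        const char *start = p;
        unsigned char c = (unsigned char)*p;

        if (c == '\n') {                       /* newline: count lines */
            line++;
            p++;
        } else if (isspace(c)) {               /* other whitespace: ignore */
            p++;
        } else if (isalpha(c) || c == '_') {   /* identifier or keyword */
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            printf("ID(%.*s)\n", (int)(p - start), start);
        } else if (isdigit(c)) {               /* integer literal */
            while (isdigit((unsigned char)*p))
                p++;
            if (isalpha((unsigned char)*p) || *p == '_') {
                /* Digits followed directly by letters, e.g. "1v": swallow
                   the whole run so it is reported once, as a flex rule for
                   [0-9]+[A-Za-z_][A-Za-z0-9_]* would by longest match. */
                while (isalnum((unsigned char)*p) || *p == '_')
                    p++;
                lex_error(line, "malformed token", start, (int)(p - start));
                errors++;
            } else {
                printf("NUM(%.*s)\n", (int)(p - start), start);
            }
        } else if (strchr(",;=+()", c) != NULL) { /* one-char punctuation */
            p++;
            printf("PUNCT(%c)\n", c);
        } else {                               /* catch-all, like flex's "." */
            p++;
            lex_error(line, "illegal character", start, 1);
            errors++;
        }
        /* No early return on error: the loop resumes after the bad input. */
    }
    return errors;
}
```

Run on the question's five lines, scan prints the legal tokens, reports line 3: malformed token '1v' and line 5: illegal character '~' on stderr, and returns 2. Each error branch consumes at least one character and then goes back to the top of the loop, which is why the second error is still found.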
This is a hack, but it is all I could come up with.
The Flex manual says you have access to YY_BUFFER_STATE.
You might be able to use the current position in the file, obtained from the yy_input_file member of YY_BUFFER_STATE, together with yyrestart(FILE *f) to resume scanning just after the point of the error.
Again, it seems like there has to be a better solution.