How to identify multiple lexical errors in an input string (using flex)?

Posted on 2024-10-23 01:04:38

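I am using flex to generate a lexical analyser for a custom language. The problem is that as soon as I find a malformed token, I have no way to tell what other tokens follow it. For example: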

int v1,v2;
v1=10;v2=20;
v1=v2+1v;
print(v1);
~return;

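Here, the stream of tokens will be id,<,>,id,<;>...... The language specifies that tokens be separated by whitespace, so seeing a v after 1 should produce an error, and the scanner has to print that error. After that, there are more legal tokens and another illegal token (~return). How do I process the rest of the legal tokens and print the second error? I am using flex as the scanner generator. When none of the regular expressions matches the lexeme, I call an error routine that prints the appropriate message.

How do I resume processing after calling this routine?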



身边 2024-10-30 01:04:38

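Put the whitespace into the grammar if it's syntactically significant. If it isn't, as appears to be the case here, just ignore the rule. Whitespace is only syntactically significant where it is required to separate tokens, e.g. in int a;.

For example, COBOL has a rule that a period must be followed by whitespace, except in a PICTURE string. Implementing the rule was harder than ignoring it, so I ignored it. The compiler passed FIPS certification, and nobody ever noticed in the roughly ten years it was in production use.

NB: The best way to handle illegal characters is as parser errors. That way the parser can apply its error recovery strategy. At the lexical level, the best you can do is print and ignore them.

In lex/flex, this just amounts to having a final rule that reads: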

. return yytext[0];
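
On the question of resuming: in flex, an action that does not return simply lets yylex() go on to the next match, so an error action that prints a message and falls through already continues scanning. The C below is a hand-written stand-in for that behaviour, taking the "print and ignore" route rather than passing the character to a parser. It is not flex output and not the answerer's code. The names (next_token, count_lexical_errors, TOK_BAD), the punctuation set, and the choice to treat digits run into letters (such as 1v) as one bad token are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; not taken from the original post. */
enum tok { TOK_EOF = 0, TOK_ID, TOK_NUM, TOK_PUNCT, TOK_BAD };

/* Scan one token starting at *pp, copy its text into buf, advance *pp past it. */
enum tok next_token(const char **pp, char *buf, size_t cap)
{
    const char *p = *pp;

    /* Whitespace only separates tokens, so skip it. */
    while (isspace((unsigned char)*p)) p++;
    if (*p == '\0') { *pp = p; buf[0] = '\0'; return TOK_EOF; }

    const char *start = p;
    enum tok kind;

    if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_') p++;
        kind = TOK_ID;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
        kind = TOK_NUM;
        /* Digits running straight into letters ("1v"): swallow the whole
           run so it is reported once, as a single malformed token. */
        if (isalpha((unsigned char)*p) || *p == '_') {
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            kind = TOK_BAD;
        }
    } else if (strchr(",;=+()", *p) != NULL) {
        p++;
        kind = TOK_PUNCT;
    } else {
        /* Catch-all, playing the role of the final `.` rule:
           exactly one unrecognised character. */
        p++;
        kind = TOK_BAD;
    }

    size_t n = (size_t)(p - start);
    if (n >= cap) n = cap - 1;      /* truncate overly long lexemes */
    memcpy(buf, start, n);
    buf[n] = '\0';
    *pp = p;
    return kind;
}

/* Scan the whole input, print every bad token, and return how many were seen. */
int count_lexical_errors(const char *src)
{
    char text[64];
    int errors = 0;
    enum tok t;

    while ((t = next_token(&src, text, sizeof text)) != TOK_EOF) {
        if (t == TOK_BAD) {
            fprintf(stderr, "lexical error: unexpected '%s'\n", text);
            errors++;               /* report, then keep scanning */
        }
    }
    return errors;
}
```

Calling count_lexical_errors on the five-line sample from the question reports two errors, one for 1v and one for ~, and every legal token in between is still scanned. In this sketch only the ~ is flagged and return is then read as an ordinary identifier; whether ~return should instead be rejected as a whole is a design choice the original post leaves open.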
