Lexer that recognizes indented blocks

Posted on 2024-11-27 12:05:43

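I want to write a compiler for a language that denotes program blocks with whitespace, as Python does. I would prefer to do this in Python, but C++ is also an option. Is there an open-source lexer that can help me do this easily, for example by correctly generating INDENT and DEDENT tokens the way the Python lexer does? A corresponding parser generator would be a plus.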

甜味超标? 2024-12-04 12:05:43

LEPL is pure Python and supports offside (indentation-sensitive) parsing.

终陌 2024-12-04 12:05:43

If you're using something like lex, you can do it this way:

^[ \t]+              { int new_indent = count_indent(yytext);
                       if (new_indent > current_indent) {
                          current_indent = new_indent;
                          return INDENT;
                       } else if (new_indent < current_indent) {
                          current_indent = new_indent;
                          return DEDENT;
                       }
                       /* Else do nothing, and this way
                          you can essentially treat INDENT and DEDENT
                          as opening and closing braces. */
                     }

You may need a little additional logic, for example to ignore blank lines, and to automatically add a DEDENT at the end of the file if needed.
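As a rough illustration of that extra logic, here is a minimal Python sketch (Python only because the question prefers it; this is not the lex rule above and not any library's API). The helper name `indent_tokens`, the token tuples, and the `tab_size` default of 8 are all assumptions made for the example. It keeps a stack of open indentation widths instead of a single `current_indent`, so a line that closes several blocks at once yields one DEDENT per block, blank lines are skipped, and any blocks still open at end of input are closed.

```python
def indent_tokens(lines, tab_size=8):
    """Yield (kind, text) tuples: INDENT, DEDENT, or LINE (hypothetical format)."""
    stack = [0]  # indentation widths of the blocks that are currently open
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines never open or close a block
        expanded = line.expandtabs(tab_size)
        width = len(expanded) - len(expanded.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT", None)
        else:
            # Close every block that is indented deeper than this line.
            while width < stack[-1]:
                stack.pop()
                yield ("DEDENT", None)
            if width != stack[-1]:
                raise ValueError(f"dedent does not match any open block: {line!r}")
        yield ("LINE", line.strip())
    # End of input: close whatever is still open.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", None)


if __name__ == "__main__":
    source = "if x:\n    a\n\n    if y:\n        b\nc\n"
    for token in indent_tokens(source.splitlines()):
        print(token)
```

With tokens in this shape, a parser can treat INDENT and DEDENT like opening and closing braces, as the comment in the lex rule suggests.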

Presumably count_indent would take into account converting tabs to spaces according to a tab-stop value.
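One way such a helper might look, sketched in Python rather than C. The name `count_indent` comes from the answer above, but the body and the `tab_size` parameter are assumptions: each tab advances the column to the next multiple of `tab_size`, and counting stops at the first character that is neither a space nor a tab.

```python
def count_indent(text, tab_size=8):
    """Return the column reached by the leading spaces and tabs of `text`."""
    column = 0
    for ch in text:
        if ch == " ":
            column += 1
        elif ch == "\t":
            # Jump to the next tab stop rather than adding a fixed width.
            column += tab_size - (column % tab_size)
        else:
            break  # first non-whitespace character ends the indentation
    return column


if __name__ == "__main__":
    print(count_indent("    x"))   # four spaces
    print(count_indent("  \tx"))   # two spaces, then a tab to the next stop
```

Rounding to tab stops means that two spaces followed by a tab and a lone tab land on the same column, which a fixed "tab = N spaces" substitution would not give you.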

I don't know about lexer/parser generators for Python, but what I posted should work with lex/flex, and you can hook it up to yacc/bison to create a parser. You could use C or C++ with those.
