Regular expression to catch no whitespace at the start of a line (flex)

Posted on 2024-10-21 00:49:05

I'm working on a lexer for the Python grammar (written in Flex) for a compiler construction class, and I'm having trouble getting a regular expression to match when a line starts with no whitespace (so that I can detect the end of an indented block).

The rule checking for no indentation appears after the rules for comments, blank lines, and indentation, and before the rules for everything else. Here's what it looks like right now:

<INITIAL>^[^ \t] {
  printf("DEBUG: Expression ^[^ \\t] matches string: %s\n", yytext);

  /* Dedent to 0 if not mid-expression */
  if(!lineJoin && bracketDepth() == 0)
    changeIndent(0);

  /* Treat line as normal */
  REJECT;
}

As I understand it, the rule above should print that debug line for every line of the lexed file that contains actual Python code and doesn't start with indentation. However, as it stands now, very few lines in my many test cases trigger it.

For example, the debug output appears nowhere for this test case (it also misses the dedent entirely on line 4):

myList = [1,2,3,4]
for index in range(len(myList)):
    myList[index] += 1
print( myList )

but appears for every line in this one:

a = 1 + 1
b = 2 % 3
c = 1 ^ 1
d = 1 - 1
f = 1 * 1
g = 1 / 1

Given that most of the other rules work properly, I'm led to believe that the regex in the rule above is the problem, but I don't see why it fails most of the time. Does anyone have any insight?

Comments (1)

无言温柔 2024-10-28 00:49:05

I don't know flex, but I observe that every sample line that worked starts with a single-character token, while every line that didn't work starts with a longer one. Perhaps flex is matching against entire tokens instead of single characters? You might try adding a + after the character class.
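
That guess lines up with how flex documents its matching: when several rules match at the same position, the one that matches the most text wins, and ties go to the rule listed first. A one-character pattern therefore only gets to run when nothing later in the file matches more than one character. Below is a minimal sketch of that selection logic in Python, not flex itself. The NAME rule is a hypothetical stand-in for whatever identifier rule the real lexer has, and the ^ anchor is implied because matching only ever starts at column 0.

```python
import re

# Hypothetical stand-ins for the lexer's rules, listed in file order.
# Only the NO_INDENT pattern comes from the question; NAME is an assumed
# identifier rule. The ^ anchor is implied: we only ever match at column 0.
RULES = [
    ("NO_INDENT", re.compile(r"[^ \t]")),
    ("NAME", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
]

# Same rules with the suggested "+" added (and \n excluded, as an assumption,
# so the match cannot run past the end of the line).
PLUS_RULES = [
    ("NO_INDENT", re.compile(r"[^ \t\n]+")),
    ("NAME", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
]


def pick_rule(line, rules=RULES):
    """Pick a rule at the start of `line`: longest match wins, earliest rule wins ties."""
    best = None
    for name, pattern in rules:
        m = pattern.match(line)
        # Strict ">" keeps the earlier rule when two matches have equal length.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


if __name__ == "__main__":
    for line in ["myList = [1,2,3,4]", "print( myList )", "a = 1 + 1"]:
        print(repr(line), pick_rule(line), pick_rule(line, PLUS_RULES))
```

With the original pattern, `myList` and `print` go to the longer NAME match, so the no-indent action never runs, while the one-letter names tie and the earlier rule wins. With the `+` variant the no-indent rule matches at least as much as NAME on these lines, so it runs first and REJECT can then fall back to the next-best rule. A token that contains whitespace, such as a string literal at the start of a line, could still be longer, so this may not be a complete fix.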
