Regular expression to catch lines with no leading whitespace (Flex)

Posted on 2024-10-21 00:49:05

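I'm working on a lexer for the Python grammar (written in Flex) for a compiler construction class, and I can't get a working regular expression to detect when a line begins with no whitespace (needed to recognize the end of an indented block).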

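The rule that checks for no indentation comes after the rules for comments, blank lines, and indentation, and before the rules for everything else. Here is what it looks like right now: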

<INITIAL>^[^ \t] {
  printf("DEBUG: Expression ^[^ \\t] matches string: %s\n", yytext);

  /* Dedent to 0 if not mid-expression */
  if(!lineJoin && bracketDepth() == 0)
    changeIndent(0);

  /* Treat line as normal */
  REJECT;
}

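As I understand it, the rule above should print the debug line for every line of the lexed file that contains actual Python code and does not start with indentation. As it stands, however, very few lines across my many test cases trigger it.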

For example, the debug output never appears for this test case (the lexer also misses the dedent on line 4 entirely):

myList = [1,2,3,4]
for index in range(len(myList)):
    myList[index] += 1
print( myList )

but it appears for every line in this one:

a = 1 + 1
b = 2 % 3
c = 1 ^ 1
d = 1 - 1
f = 1 * 1
g = 1 / 1

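Since most of the other rules work properly, I suspect the regular expression in the rule above is the problem, but I don't see why it fails most of the time. Does anyone have any insight?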
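One thing that may be relevant: Flex picks the rule that matches the longest text, and breaks ties by taking the rule listed first. The Python sketch below is only a toy model of that selection, not Flex itself. It does not model REJECT, and every rule other than the one-character no-indent rule (NAME, NUMBER, OTHER) is a hypothetical stand-in, since the rest of the lexer is not shown here.

```python
import re

# Toy rule table in file order. Only the first pattern comes from the rule in
# question; NAME, NUMBER and OTHER are assumed stand-ins for the other rules.
RULES = [
    ("NO_INDENT", re.compile(r"[^ \t]")),  # models ^[^ \t]: exactly one character
    ("NAME", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("OTHER", re.compile(r".")),
]


def first_rule_chosen(line):
    """Return the rule a longest-match scanner would pick at the start of a line."""
    best_name, best_len = None, 0
    for name, pattern in RULES:
        m = pattern.match(line)
        # A strictly longer match wins; on a tie the earlier rule is kept.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name


if __name__ == "__main__":
    for line in ["myList = [1,2,3,4]", "print( myList )", "a = 1 + 1"]:
        print(repr(line), "->", first_rule_chosen(line))
```

Under this model the one-character rule is selected only when no other rule matches more than one character at the start of the line. That would be consistent with the debug output showing up for `a = 1 + 1` but not for `myList = [1,2,3,4]` or `print( myList )`, though I have not confirmed that this is what is happening in my lexer.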

