Why does my ack regexp return extra, unexpected results?

Posted 2024-08-29 01:30:26

I'm finally learning regexps and training with ack. I believe this uses Perl regexp.

I want to match all lines where the first non-blank characters are if (<word> !, with any number of spaces in between the elements.

This is what I came up with:

^[ \t]*if *\(\w+ *!

It only nearly worked. I suspect ^[ \t]* is wrong, since it seems to match at most one space or tab.
What I want is to match a leading run that contains only spaces or tabs (or nothing at all).

For example these should not match:

// if (asdf != 0)
else if (asdf != 1)

How can I modify my regexp for that?


EDIT adding command line

ack -i --group -a '^\s*if *\(\w+ *!' c:/work/proj/proj 

Note the single quotes, I'm not so sure about them anymore.

My search base is a larger code base. It does contain lines that should match (quite a few), but the above command also returns lines that should not, for example:

274:                }else if (y != 0) 


EDIT adding the result of mobrule's test

Mobrule, thanks for providing me a text to test on. I'll copy here what I get on my prompt:

C:\Temp\regex>more ack.test
# ack.test
if (asdf != 0)    # no spaces - ok
 if (asdf != 0)   # single space - ok
    if (asdf != 0) # single tab - ok
   if (asdf != 0) # multiple space - ok
        if (asdf != 0) # multiple tab - ok
    if (asdf != 0) # spaces + tab ok
     if (asdf != 0) # tab + space ok
     if (asdf != 0) # space + tab + space ok
// if (asdf != 0)  # not ok
} else if (asdf != 0) # not ok

C:\Temp\regex>ack '^[ \t]*if *\(\w+ *!' ack.test

C:\Temp\regex>"C:\Program\git\bin\perl.exe" C:\bat\ack.pl '[ \t]*if *\(\w+ *!' ack.test
if (asdf != 0)    # no spaces - ok
 if (asdf != 0)   # single space - ok
    if (asdf != 0) # single tab - ok
   if (asdf != 0) # multiple space - ok
        if (asdf != 0) # multiple tab - ok
    if (asdf != 0) # spaces + tab ok
     if (asdf != 0) # tab + space ok
     if (asdf != 0) # space + tab + space ok
// if (asdf != 0)  # not ok
} else if (asdf != 0) # not ok

The problem is in my call to my ack.bat!

ack.bat contains:

"C:\Program\git\bin\perl.exe" C:\bat\ack.pl %*

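Although I type the caret on the command line, it is gone by the time the bat file calls perl (compare the two commands echoed above)!

Escaping the caret with ^^ does not work.

Quoting the regex with " " instead of ' ' works: cmd.exe treats ^ as its escape character outside double quotes, and single quotes are not quoting characters there. My problem was a DOS/Windows problem, sorry for bothering you all with it.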
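
To see why the lost caret produces the extra results, here is a minimal sketch that uses Python's re module as a stand-in for the Perl regexes ack runs (an assumption made for illustration; the helper name matching is hypothetical). It compares the pattern as typed with the pattern that actually reached ack.pl:

```python
import re

# Pattern as typed on the command line, and as it arrived after the caret was lost.
ANCHORED = re.compile(r"^[ \t]*if *\(\w+ *!")
UNANCHORED = re.compile(r"[ \t]*if *\(\w+ *!")

lines = [
    "    if (asdf != 0)",
    "// if (asdf != 0)",
    "                }else if (y != 0)",
]


def matching(pattern, candidates):
    """Return the candidates in which the pattern matches somewhere."""
    return [line for line in candidates if pattern.search(line)]


anchored_hits = matching(ANCHORED, lines)
unanchored_hits = matching(UNANCHORED, lines)

print(anchored_hits)    # only the plain, indented "if" line
print(unanchored_hits)  # all three lines, including the "}else if" one
```

Without the anchor the pattern is free to match in the middle of a line, which is consistent with the "}else if" hit reported above.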


Comments (3)

虐人心 2024-09-05 01:30:26
^\s*if\s*\(\S+\s*!
  • Use \S for non-whitespace. \w will not match any special characters, so if ($word will not match. Maybe that's OK with your specs, in which case \w (alphanumeric plus "_") is fine.
$ perl5.8 -e '{$s="else if (asdf \!= 1)"; if ($s =~ /^\s*if\s*\((\S+)\s*\!/) { print "|$1|\n";} else { print "NO MATCH\n";}}'
NO MATCH
$ perl5.8 -e '{$s="// if (asdf \!= 0)"; if ($s =~ /^\s*if\s*\((\S+)\s*\!/) { print "|$1|\n";} else { print "NO MATCH\n";}}'
NO MATCH
$ perl5.8 -e '{$s=" if (asdf \!= 0)"; if ($s =~ /^\s*if\s*\((\S+)\s*\!/) { print "|$1|\n";} else { print "NO MATCH\n";}}'  
|asdf|
$ perl5.8 -e '{$s="if (asdf \!= 0)"; if ($s =~ /^\s*if\s*\((\S+)\s*\!/) { print "|$1|\n";} else { print "NO MATCH\n";}}' 
|asdf|
$ perl5.8 -e '{$s="if (\$asdf \!= 0)"; if ($s =~ /^\s*if\s*\((\S+)\s*\!/) { print "|$1|\n";} else { print "NO MATCH\n";}}'
|$asdf|
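
The same comparison can be sketched side by side. This uses Python's re module rather than Perl (an assumption for illustration; \w, \S and \s behave the same way for these ASCII inputs), and the variable names are hypothetical:

```python
import re

# \w+ accepts only letters, digits and "_"; \S+ accepts any non-whitespace run.
WORD = re.compile(r"^\s*if\s*\((\w+)\s*!")
NON_SPACE = re.compile(r"^\s*if\s*\((\S+)\s*!")

samples = ["if (asdf != 0)", "if ($asdf != 0)", "else if (asdf != 1)"]


def captured(pattern, text):
    """Return the captured operand, or None when the line does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


word_results = [captured(WORD, s) for s in samples]
non_space_results = [captured(NON_SPACE, s) for s in samples]

print(word_results)
print(non_space_results)
```

Only the \S variant accepts the "$asdf" operand; both variants reject the "else if" line because of the ^ anchor.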
煮茶煮酒煮时光 2024-09-05 01:30:26

In both ack and grep, * matches zero or more, not zero or one. So I think you already have the right solution. What test cases aren't giving you the results you want?

# ack.test
if (asdf != 0)    # no spaces - ok
 if (asdf != 0)   # single space - ok
    if (asdf != 0) # single tab - ok
   if (asdf != 0) # multiple space - ok
        if (asdf != 0) # multiple tab - ok
    if (asdf != 0) # spaces + tab ok
     if (asdf != 0) # tab + space ok
     if (asdf != 0) # space + tab + space ok
// if (asdf != 0)  # not ok
} else if (asdf != 0) # not ok

Results:

$ ack '^[ \t]*if *\(\w+ *!' ack.test
if (asdf != 0)    # no spaces - ok
 if (asdf != 0)   # single space - ok
        if (asdf != 0) # single tab - ok
   if (asdf != 0) # multiple space - ok
                if (asdf != 0) # multiple tab - ok
        if (asdf != 0) # spaces + tab ok
         if (asdf != 0) # tab + space ok
         if (asdf != 0) # space + tab + space ok

$ ack -v '^[ \t]*if *\(\w+ *!' ack.test
// if (asdf != 0)  # not ok
} else if (asdf != 0) # not ok
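
The "zero or more" behaviour can also be checked outside ack. The following is a small sketch with Python's re module standing in for Perl regexes (an assumption for illustration); the tabs from ack.test are written explicitly as \t so the inputs are unambiguous:

```python
import re

PATTERN = re.compile(r"^[ \t]*if *\(\w+ *!")

should_match = [
    "if (asdf != 0)",        # no leading whitespace
    " if (asdf != 0)",       # single space
    "\tif (asdf != 0)",      # single tab
    "   if (asdf != 0)",     # multiple spaces
    "\t\tif (asdf != 0)",    # multiple tabs
    "  \tif (asdf != 0)",    # spaces + tab
    "\t if (asdf != 0)",     # tab + space
    " \t if (asdf != 0)",    # space + tab + space
]
should_not_match = ["// if (asdf != 0)", "} else if (asdf != 0)"]

# [ \t]* consumes any mix of spaces and tabs, including an empty run.
matched = [line for line in should_match + should_not_match if PATTERN.search(line)]

print(len(matched))
```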
卷耳 2024-09-05 01:30:26

You can try:

(?:\t*| *)if *\(\w+ *!

\t*| * matches zero or more tabs or zero or more spaces, but not a mix of spaces and tabs.
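
A short sketch of how the alternation differs from the character class, again using Python's re module as a stand-in for Perl regexes. Note one assumption: a leading ^ is added here for the comparison, since the pattern as posted is unanchored and would therefore match anywhere in a line:

```python
import re

# Tabs-only or spaces-only indentation (anchor added for this illustration).
ALTERNATION = re.compile(r"^(?:\t*| *)if *\(\w+ *!")
# Any mix of spaces and tabs.
CHAR_CLASS = re.compile(r"^[ \t]*if *\(\w+ *!")
# The pattern exactly as posted, without an anchor.
AS_POSTED = re.compile(r"(?:\t*| *)if *\(\w+ *!")

tabs_only = "\t\tif (asdf != 0)"
spaces_only = "  if (asdf != 0)"
mixed = "\t if (asdf != 0)"
else_if = "} else if (asdf != 0)"

alternation_results = [bool(ALTERNATION.search(s)) for s in (tabs_only, spaces_only, mixed)]
char_class_results = [bool(CHAR_CLASS.search(s)) for s in (tabs_only, spaces_only, mixed)]
as_posted_matches_else_if = bool(AS_POSTED.search(else_if))

print(alternation_results, char_class_results, as_posted_matches_else_if)
```

With the anchor, the alternation rejects the mixed tab-plus-space indentation that [ \t]* accepts; without it, the "} else if" line is matched as well.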
