How can I ignore any empty values in Perl grep?
I am using the following to count the number of occurrences of a pattern in a file:
my @lines = grep /$text/, <$fp>;
print ($#lines + 1);
But sometimes it prints one more than the actual value. I checked and it is because the last element of @lines
is null, and that is also counted.
How can the last element of the grep result be empty sometimes? Also, how can this issue be resolved?
Comments (4)
It really depends a lot on your pattern, but one thing you could do is join a couple of matches, the first one disqualifying any line that contains only space (or nothing). This example will reject any line that is either empty, newline only, or any amount of whitespace only.
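A minimal sketch of such a combined filter, not necessarily the exact snippet this answer had in mind. The in-memory file contents and the deliberately loose pattern in $text are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical input: an in-memory "file" with one empty and one whitespace-only line.
my $data = "foo bar\n\n   \nbar baz\nqux\n";
open my $fp, '<', \$data or die "Cannot open in-memory file: $!";
my @all = <$fp>;
close $fp;

# Hypothetical pattern, deliberately loose: 'a*' can match zero characters,
# so on its own it matches every line, including the blank ones.
my $text = 'a*';

# Pattern only: blank lines are counted too.
my @unfiltered = grep { /$text/ } @all;

# First test rejects empty or whitespace-only lines, second applies the pattern.
my @filtered = grep { !/^\s*$/ && /$text/ } @all;

print scalar(@unfiltered), " ", scalar(@filtered), "\n";   # prints "5 3"
```

Using scalar(@lines) instead of $#lines + 1 gives the same count and reads more directly.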
Keep in mind that if the contents of $text happen to include regexp special metacharacters, they either need to be intended for their metacharacter purposes or sanitized with quotemeta().
My theories are that you might have a line terminated in \n that is somehow matching your $text regexp, or that your $text regexp contains metacharacters that are affecting the match without you being aware. Either way, the snippet I provided will at least force rejection of "blank lines", where blank could mean completely empty (unlikely), newline-terminated but otherwise empty (probable), or whitespace-only (possible) lines that appear blank when printed.
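To illustrate the metacharacter point, here is a small sketch comparing a raw interpolated pattern with a quoted one. The search string and the sample lines are hypothetical:

```perl
use strict;
use warnings;

my $text  = 'a.b';                       # hypothetical search string with a metacharacter
my @input = ("a.b\n", "axb\n", "\n");    # hypothetical lines

# Interpolated as a regex: '.' matches any character, so "axb" matches too.
my @as_regex = grep { /$text/ } @input;

# \Q...\E applies quotemeta() to the interpolated value, so '.' is literal.
my @as_literal = grep { /\Q$text\E/ } @input;

print scalar(@as_regex), " ", scalar(@as_literal), "\n";   # prints "2 1"
```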
A regular expression that matches the empty string will match undef. Perl will warn about doing so, but casts undef to '' before trying to match against it, at which point grep will quite happily promote the undef to its results. If you don't want to pick up the empty string (or anything that will be matched as though it were the empty string), you need to rewrite your regular expression to not match it.
To accurately see what is in lines, do:
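One way to do this, sketched with the core Data::Dumper module (the choice of module and the sample array are assumptions, not necessarily what this answer originally showed):

```perl
use strict;
use warnings;
use Data::Dumper;

# Show newlines, tabs and other invisible characters as escapes.
$Data::Dumper::Useqq = 1;

# Hypothetical grep result containing a newline-only and an empty element.
my @lines = ("foo\n", "\n", "");

print Dumper(\@lines);
```

With Useqq enabled, an element that holds only a newline or nothing at all becomes visible in the dump instead of printing as an apparently blank line.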
Ok, since no more information about the contents of $text (the regex) is forthcoming, I guess I'll toss out some general information. Consider the following example:
We get:
All the values match. Why? Because they also match the empty string. To get what we want, we need \s or \s+. (There will be no practical difference between the two.) You may have such a problem.
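A small sketch of the kind of behavior described here, assuming the problematic pattern was something like \s* (the pattern and the sample values are hypothetical):

```perl
use strict;
use warnings;

my @values = ("abc", "a b", "", "x");   # hypothetical sample values

# \s* may match zero characters, so it succeeds on every string,
# including the empty one.
my @star = grep { /\s*/ } @values;

# \s+ requires at least one whitespace character.
my @plus = grep { /\s+/ } @values;

print scalar(@star), " ", scalar(@plus), "\n";   # prints "4 1"
```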