How can I get a perfect match for a regular expression pattern in Perl?

Posted 2024-12-09 09:47:34, 1324 characters, 1 view, 0 comments

I have to match a string against a regular expression stored in a variable:

#!/bin/env perl

use warnings;
use strict;
my $expr = qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx;
my $str = "abcd[3] xyzg[4:0]";
if ($str =~ m/$expr/) {
    print "\n%%%%%%%%% $`-----$&-----$'\n";
}
else {
    print "\n********* NOT MATCHED\n";
}

But this is the output I get, with $& holding only part of the string:

%%%%%%%%% -----abcd[3] xyzg-----[4:0]

I expected it not to enter the if clause at all. What I intend is:

if $str = "abcd xyzg" => %%%%%%%%% -----abcd xyzg-----            (CORRECT)
if $str = "abcd[2] xyzg" => %%%%%%%%% -----abcd[2] xyzg-----      (CORRECT)
if $str = "abcd[2] xyzg[3]" => %%%%%%%%% -----abcd[2] xyzg[3]----- (CORRECT)
if $str = "abcd[2:0] xyzg[3]" => ********* NOT MATCHED             (CORRECT)
if $str = "abcd[2:0] xyzg[3:0]" => ********* NOT MATCHED           (CORRECT)
if $str = "abcd[2] xyzg[3:0]" => ********* NOT MATCHED            (CORRECT/INTENDED)

Instead, the output for the last case is %%%%%%%%% -----abcd[2] xyzg-----[3:0] (WRONG), or rather, not what I intended. In this case I expect it to go to the else block. I don't understand why $& takes only a portion of the string (abcd[2] xyzg) while $' holds [3:0]. It should match the whole string, not a part of it as above; if it can't, it shouldn't enter the if clause.

Can anyone help me change my $expr pattern so that I get the intended result?

Comments (2)

转身泪倾城 2024-12-16 09:47:34

By default, Perl regexes only look for a matching substring of the given string. In order to force comparison against the entire string, you need to indicate that the regex begins at the beginning of the string and ends at the end by using ^ and $:

my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)$/;
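As a quick check, the anchored pattern can be run against the six inputs from the question. This is only a minimal sketch: the helper name full_match is made up for illustration and is not part of the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Anchored pattern from the answer: ^ and $ force the whole string to match.
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)$/;

# Hypothetical helper: returns 1 if the entire string fits the pattern, else 0.
sub full_match {
    my ($str) = @_;
    return $str =~ $expr ? 1 : 0;
}

# The six test strings listed in the question.
for my $str ("abcd xyzg", "abcd[2] xyzg", "abcd[2] xyzg[3]",
             "abcd[2:0] xyzg[3]", "abcd[2:0] xyzg[3:0]", "abcd[2] xyzg[3:0]") {
    printf "%-22s => %s\n", $str, full_match($str) ? "MATCHED" : "NOT MATCHED";
}
```

The first three strings should report MATCHED and the last three NOT MATCHED. Note that $ also matches just before a trailing newline; if that matters, \z is the stricter end-of-string anchor.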

(Also, there's no reason to have the /x modifier, as your regex doesn't include any literal whitespace or # characters, and there's no reason for the /s modifier, as you're not using the . metacharacter.)

EDIT: If you don't want the regex to match against the entire string, but you want it to reject anything in which the matching portion is followed by something like "[0:0]", the simplest way would be to use lookahead:

my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\]|(?=[^[\w])|$ ))/x;

This will match anything that takes the following form:

  • beginning of the string (which your example in the comments seems to imply you want)
  • zero or more whitespace characters
  • one or more word characters
  • optional: [, one or more digits, ]
  • one or more whitespace characters
  • one or more word characters
  • one of the following, in descending order of preference:
      • [, one or more digits, ]
      • an empty string followed by (but not including!) a character that is neither [ nor a word character (The exclusion of word characters is to keep the regex engine from succeeding on "a[0] bc[1:2]" by only matching "a[0] b".)
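      • end of string (The space after the $ is meant to keep it from merging with the following ) into the special variable $), which is why the /x option is reintroduced. Note that perlop says $(, $) and $| are not interpolated inside patterns, so the space and /x should be optional here.)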
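The behaviour of the lookahead pattern can be sketched as follows. The helper matched_prefix and the sample strings with trailing text are assumptions added for illustration; only the pattern itself comes from the answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Lookahead pattern from the answer; under /x the space after $ is ignored.
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\]|(?=[^[\w])|$ ))/x;

# Hypothetical helper: returns the text that matched, or undef on failure.
# $-[0] and $+[0] are the start and end offsets of the last successful match.
sub matched_prefix {
    my ($str) = @_;
    return $str =~ $expr ? substr($str, $-[0], $+[0] - $-[0]) : undef;
}

for my $str ("abcd[2] xyzg", "abcd[2] xyzg[3] trailing",
             "abcd[2] xyzg, more", "abcd[2] xyzg[3:0]") {
    my $m = matched_prefix($str);
    printf "%-26s => %s\n", $str, defined $m ? "matched '$m'" : "NOT MATCHED";
}
```

Unlike the fully anchored version, this one tolerates trailing text after the second word, yet still rejects the string whose second word is followed by [3:0].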

Do you have any more unstated requirements that need to be satisfied?

[浮城] 2024-12-16 09:47:34

The short answer is your regexp is wrong.
We can't fix it for you unless you explain exactly what you need, and the community is not going to write a regexp tailored to your purpose, because such a question is too localized and helps only you, only this once.

You need to ask something more general about regexps that we can explain to you, that will help you fix your regexp, and help others fix theirs.

Here's my general advice for when you're having trouble testing a regexp: use a regexp tool such as RegexBuddy.

So I'm going to give a specific answer about what you're overlooking here.
Let's make this example smaller:
Say your pattern is a(bc+d)?. It matches abcd, abccd, and so on. It does not match bcd or bzd. Given abzd, however, it matches just a, because the whole group bc+d is optional. Similarly, given abcbcd it matches only a, dropping the optional group, which fails at the second b.

A regexp matches as much of the string as it can and reports a successful match as soon as the entire pattern is satisfied. If you make something optional, the engine leaves it out when it has to, and includes it only when it is present and matches.
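The small example above can be tried directly. This is a minimal sketch; the helper name matched_text is hypothetical and exists only to show which part of each string the pattern consumed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The reduced pattern from the explanation: "a", then an optional "bc+d" group.
my $pattern = qr/a(bc+d)?/;

# Hypothetical helper: returns the text the pattern matched, or undef.
sub matched_text {
    my ($str) = @_;
    return $str =~ $pattern ? substr($str, $-[0], $+[0] - $-[0]) : undef;
}

for my $str (qw(abcd abccd abzd abcbcd bcd)) {
    my $m = matched_text($str);
    printf "%-7s => %s\n", $str, defined $m ? "matched '$m'" : "no match";
}
# Prints:
# abcd    => matched 'abcd'
# abccd   => matched 'abccd'
# abzd    => matched 'a'
# abcbcd  => matched 'a'
# bcd     => no match
```

In the abzd and abcbcd cases the match still succeeds; it is simply shorter than the input, which is the same effect seen in the question.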

Here's what you tried:
qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx
First, the s and x modifiers aren't needed here.
Second, this regex can match:
any amount of whitespace (or none), followed by
a word of at least one word character (letter, digit, or underscore), followed by
optionally, a grouped square-bracketed number with at least one digit (e.g. [0] or [9999]), followed by
at least one whitespace character, followed by
a word of at least one word character, followed by
optionally, a square-bracketed number with at least one digit.

Clearly, when you ask it to match abcd[0] xyzg[0:4], the colon ends the \d+ run but doesn't satisfy the \], so the engine backtracks out of the whole group and then happily finds that the group was optional. By leaving out the last optional group, your pattern matches successfully.
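To see this backtracking result concretely, the sketch below runs the original unanchored pattern on that string and splits it into the matched part and the leftover. The helper split_match is a made-up name for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The pattern from the question, without the unneeded /s and /x modifiers.
my $expr = qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/;

# Hypothetical helper: returns (matched text, text left over after the match),
# or an empty list if the pattern does not match at all.
sub split_match {
    my ($str) = @_;
    return unless $str =~ $expr;
    return (substr($str, $-[0], $+[0] - $-[0]), substr($str, $+[0]));
}

my ($matched, $rest) = split_match("abcd[0] xyzg[0:4]");
print "matched: '$matched'\n";   # matched: 'abcd[0] xyzg'
print "rest:    '$rest'\n";      # rest:    '[0:4]'
```

The leftover [0:4] is exactly what ends up in $' in the question: the match is valid, just not a match of the whole string.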
