How do I get an exact match for a regex pattern in Perl?

Posted 2024-12-09 09:47:34

I have to match a string against a regular expression stored in a variable:

#!/usr/bin/env perl

use warnings;
use strict;
my $expr = qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx;
my $str = "abcd[3] xyzg[4:0]";
if ($str =~ m/$expr/) {
    print "\n%%%%%%%%% $`-----$&-----$'\n";
}
else {
    print "\n********* NOT MATCHED\n";
}

But the output shows that $& holds only part of the string:

%%%%%%%%% -----abcd[3] xyzg-----[4:0]

I expected it not to enter the if clause at all. What I intend is:

if $str = "abcd xyzg"           => %%%%%%%%% -----abcd xyzg-----        (CORRECT)
if $str = "abcd[2] xyzg"        => %%%%%%%%% -----abcd[2] xyzg-----     (CORRECT)
if $str = "abcd[2] xyzg[3]"     => %%%%%%%%% -----abcd[2] xyzg[3]-----  (CORRECT)
if $str = "abcd[2:0] xyzg[3]"   => ********* NOT MATCHED                (CORRECT)
if $str = "abcd[2:0] xyzg[3:0]" => ********* NOT MATCHED                (CORRECT)
if $str = "abcd[2] xyzg[3:0]"   => ********* NOT MATCHED                (INTENDED)

For the last case the actual output is %%%%%%%%% -----abcd[2] xyzg-----[3:0], which is wrong, or rather not what I intended. I expected it to go to the else block. I also don't understand why $& takes only part of the string (abcd[2] xyzg) while $' holds [3:0]. The pattern should match the whole string, not just a part of it; if it cannot, the if clause should not be entered.

Can anyone help me change my $expr pattern so that I get the intended result?



转身泪倾城 2024-12-16 09:47:34

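By default, a Perl regex only looks for a matching substring of the given string. To force a comparison against the entire string, anchor the pattern with ^ at the start and $ at the end:

my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)$/;

(Also, there's no reason to use the /x modifier, as your regex doesn't contain any literal whitespace or # characters, and there's no reason for the /s modifier, as your pattern doesn't use the . metacharacter.)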
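As a rough check of the anchored pattern, the sketch below runs it against the six strings from the question. The helper name full_match is hypothetical and exists only for this illustration; it returns the matched text, or undef when the pattern does not match.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Anchored pattern from the answer: ^ and $ force the whole string to match.
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)$/;

# Hypothetical helper: the matched text, or undef when there is no match.
sub full_match {
    my ($str) = @_;
    return $str =~ $expr ? $& : undef;
}

for my $str ('abcd xyzg', 'abcd[2] xyzg', 'abcd[2] xyzg[3]',
             'abcd[2:0] xyzg[3]', 'abcd[2:0] xyzg[3:0]', 'abcd[2] xyzg[3:0]') {
    my $m = full_match($str);
    printf "%-22s => %s\n", $str, defined $m ? "matched '$m'" : 'NOT MATCHED';
}
```

With the anchors in place, the first three strings should be reported as matched and the last three as NOT MATCHED, which is the behaviour the question asks for.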

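EDIT: If you don't want the regex to match against the entire string, but you do want it to reject anything in which the matching portion is followed by something like "[0:0]", the simplest way is to use a lookahead:

my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\]|(?=[^[\w])|$ ))/x;

This will match anything that takes the following form:

beginning of the string (which your example in the comments seems to imply you want)
zero or more whitespace characters
one or more word characters
optional: [, one or more digits, ]
one or more whitespace characters
one or more word characters
one of the following, in descending order of preference:
- - [, one or more digits, ]
- - an empty string followed by (but not including!) a character that is neither [ nor a word character (The exclusion of word characters is to keep the regex engine from succeeding on "a[0] bc[1:2]" by only matching "a[0] b".)
- - end of string (A space is needed after the $ to keep it from merging with the following ) to form the name of a special variable, and this entails the reintroduction of the /x option.)

Do you have any more unstated requirements that need to be satisfied?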
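The lookahead variant can be tried on strings that carry trailing text. This is only a sketch: the helper name prefix_match and the sample strings are made up for the illustration, and the pattern is copied unchanged from the answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Lookahead pattern from the answer. After the second word it accepts
# "[digits]", or a following character that is neither "[" nor a word
# character, or the end of the string.
my $expr = qr/^\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\]|(?=[^[\w])|$ ))/x;

# Hypothetical helper: the matched prefix, or undef when there is no match.
sub prefix_match {
    my ($str) = @_;
    return $str =~ $expr ? $& : undef;
}

for my $str ('abcd[2] xyzg', 'abcd[2] xyzg[3] tail',
             'abcd[2] xyzg, tail', 'abcd[2] xyzg[3:0]') {
    my $m = prefix_match($str);
    printf "%-22s => %s\n", $str, defined $m ? "matched '$m'" : 'NOT MATCHED';
}
```

Only the last string should fail: after xyzg the next character is [, but it does not begin a complete [digits] group, and shortening the word does not help because the character that follows is then a word character.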


[浮城] 2024-12-16 09:47:34

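The short answer is that your regex is wrong.
We can't fix it for you unless you explain exactly what you need, and the community isn't going to write a regex purely for your purpose, because such a question is too localized and would only help you, once.

You need to ask something more general about regexes that we can explain to you; that will help you fix your regex and help others fix theirs.

Here is my general answer for when you're having trouble testing a regex: use a regex tool, such as RegexBuddy.

So I'm going to give you a specific answer about what you're overlooking here.
Let's make the example smaller:
Your pattern is a(bc+d)?. It will match abcd, abccd, and so on. It will not match bcd at all, and against abzd it will match only a, because the whole bc+d group is optional. Similarly, against abcbcd it will match only a, dropping the whole optional group that couldn't be matched (it fails at the second b).

A regex will match as much of the string as it can, and it reports a successful match as soon as it can match something that satisfies the entire pattern. If you make something optional, the engine leaves it out when it has to and includes it only when it is present and matches.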
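The reduced example is easy to try out. In this sketch the helper name matched_part is hypothetical; it reports which part of each input the pattern a(bc+d)? actually consumes.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: the text consumed by a(bc+d)?, or undef on no match.
sub matched_part {
    my ($str) = @_;
    return $str =~ /a(bc+d)?/ ? $& : undef;
}

for my $str (qw(abcd abccd abzd abcbcd bcd)) {
    my $m = matched_part($str);
    printf "%-7s => %s\n", $str, defined $m ? "matched '$m'" : 'no match';
}
```

For abzd and abcbcd the match is just a: the optional group is attempted, fails, and is dropped, yet the overall match still succeeds.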

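Here's what you tried:
qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/sx
First, the s and x modifiers aren't needed here.
Second, this regex matches:
optional leading whitespace, followed by
a word of at least one word character (letter, digit, or underscore), followed by
an optional bracketed number with at least one digit (e.g. [0] or [9999]), followed by
at least one whitespace character, followed by
a word of at least one word character, followed by
an optional bracketed number with at least one digit.

Evidently, when you ask it to match abcd[0] xyzg[0:4], the colon ends the \d+ run but doesn't satisfy \], so the engine backtracks out of the whole group and then happily finds that the group is optional. Your pattern has therefore matched successfully by leaving the last optional group unmatched.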
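One way to see the skipped group is to look at the captures after the match. The sketch below is illustrative only: the helper name inspect and its hash keys are invented here, and the pattern is the question's original with the unneeded modifiers removed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The question's original, unanchored pattern (without the /s and /x modifiers).
my $expr = qr/\s*(\w+(\[\d+\])?)\s+(\w+(\[\d+\])?)/;

# Hypothetical helper: reports what matched, what was left over, and whether
# the second optional [digits] group ($4) took part in the match.
sub inspect {
    my ($str) = @_;
    return unless $str =~ $expr;
    return +{
        matched => $&,
        rest    => substr($str, $+[0]),   # text after the end of the match
        index2  => $4,                    # undef when the group was skipped
    };
}

my $r = inspect('abcd[0] xyzg[0:4]');
printf "matched='%s' rest='%s' second index group: %s\n",
    $r->{matched}, $r->{rest},
    defined $r->{index2} ? $r->{index2} : '(skipped)';
```

Here the match should be abcd[0] xyzg with [0:4] left over, and $4 is undefined because the second optional group was abandoned during backtracking.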