Which characters must I escape in a Perl precompiled regular expression?

Posted 2024-07-08 05:51:37

I'm having a hard time determining which characters must be escaped when using Perl's qr{} construct.

I'm attempting to create a multi-line precompiled regex for text that contains a myriad of normally escaped characters (#*.>:[]) and that also contains another precompiled regex. Additionally, I need to match as strictly as possible for testing purposes.

my $output = q{# using defaults found in .config
*
*
Options:
  1. opt1
> 2. opt2
choice[1-2?]: };

my $sc = qr{(>|\s)}smx;
my $re = qr{# using defaults found in .config
*
*
Options:
$sc 1. opt1
$sc 2. opt2
choice[1-2?]: }mx;

if ( $output =~ $re ) {
  print "OK!\n";
}
else {
  print "D'oh!\n";
}

Error:

Quantifier follows nothing in regex; marked by <-- HERE in m/# using defaults found in .config
* <-- HERE 
*
Options:
(?msx-i:(>|\s)) 1. opt1
(?msx-i:(>|\s)) 2. opt2
choice[1-2?]: / at ./so.pl line 14.

Attempting to escape the asterisks results in a failed match (D'oh output). Attempting to escape other pesky chars also results in a failed match. I could keep trying different combinations of what to escape, but there are a lot of variations here and I'm hoping someone can provide some insight.

夏有森光若流苏 2024-07-15 05:51:37

You have to escape the delimiter for qr//, and you have to escape any regex metacharacters that you want to use as literals. If you want those to be literal *'s, you need to escape them since the * is a regex quantifier.

Your problem here is the various regex flags that you've added. The /m doesn't do anything because you don't use the beginning- or end-of-string anchors (^, $). The /s doesn't do anything because you don't use the wildcard . metacharacter. The /x makes all of the whitespace in your regex meaningless, and it turns that line with the # into a regex comment.
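The effect of /x described above can be checked in isolation. The following is only a minimal sketch: the variable names and the shortened sample pattern are made up for illustration and are not taken from the question's script.

```perl
use strict;
use warnings;

# Under /x, unescaped whitespace in the pattern is ignored,
# so this pattern is effectively /ab/.
my $spaced      = qr{a b}x;
my $matches_ab  = 'ab'  =~ $spaced ? 1 : 0;
my $matches_a_b = 'a b' =~ $spaced ? 1 : 0;

# Under /x, an unescaped # starts a comment that runs to the end of the line.
# That leaves a bare "*" as the first real token, which cannot compile.
my $pattern  = "# using defaults\n*\n";
my $compiled = eval { qr{$pattern}x };
my $error    = $@;

print "'ab' matched:  $matches_ab\n";
print "'a b' matched: $matches_a_b\n";
print "compile error: $error" unless defined $compiled;
```

The compile error captured in $error should be the same "Quantifier follows nothing" message shown in the question.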

This is what you want, with regex flags removed and the proper things escaped:

my $sc = qr{(>|\s)};

my $re = qr{# using defaults found in \.config
\*
\*
Options:
$sc 1\. opt1
$sc 2\. opt2
choice\[1-2\?]: };

Although Damian Conway tells people in Perl Best Practices to always put these options on their regexes, you now see why he's wrong. You should only add them when you want what they do, and you should only add things when you know what they do. :) Here's what you might do if you want to use /x. You have to escape any literal whitespace, you need to denote the line endings somehow, and you have to escape the literal # character. What was readable before is now a mess:

my $sc  = qr{(>|\s)};
my $eol = qr{[\r\n]+};

my $re  = qr{\# \s+ using \s+ defaults \s+ found \s+ in \s+ \.config $eol
\*                    $eol
\*                    $eol
Options:              $eol
$sc \s+ 1\. \s+ opt1   $eol
$sc \s+ 2\. \s+ opt2   $eol
choice\[1-2\?]: \s+
}x;

if ( $output =~ $re ) {
  print "OK!\n";
}
else {
  print "D'oh!\n";
}

安静被遗忘 2024-07-15 05:51:37

Sounds like what you really want is Expect, but the thing you are most immediately looking for is the quotemeta operator which escapes all characters that have special meanings to a regex.

To answer your question directly, however: in addition to the closing delimiter (in this case }), you need to escape, at a minimum, .[$()|*+?{\ and ^.
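One way the quotemeta suggestion might be applied to the text from the question is sketched below. Splitting the expected output into the fragments $head, $opt1, $opt2 and $tail, and anchoring with \A and \z, are assumptions made here for illustration; they are not part of the original question or answers.

```perl
use strict;
use warnings;

my $output = "# using defaults found in .config\n"
           . "*\n"
           . "*\n"
           . "Options:\n"
           . "  1. opt1\n"
           . "> 2. opt2\n"
           . "choice[1-2?]: ";

# The one flexible part: a selection marker that is ">" or whitespace.
my $sc = qr{(>|\s)};

# Literal fragments go through quotemeta, so every non-word character
# (including spaces and newlines) is backslash-escaped automatically.
my $head = quotemeta("# using defaults found in .config\n*\n*\nOptions:\n");
my $opt1 = quotemeta(" 1. opt1\n");
my $opt2 = quotemeta(" 2. opt2\n");
my $tail = quotemeta("choice[1-2?]: ");

# \A and \z anchor the whole string, to keep the match as strict as possible.
my $re = qr{\A$head$sc$opt1$sc$opt2$tail\z};

my $ok = $output =~ $re ? 1 : 0;
print $ok ? "OK!\n" : "D'oh!\n";
```

Only $sc is treated as a real regex here; everything else is matched literally, so there is no need to decide by hand which characters to escape.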

拍不死你 2024-07-15 05:51:37

Like brian said, you must escape the delimiter and regex metacharacters. Note that when using qr//x (which you are), you must also escape whitespace characters and # (which is a comment marker). You probably don't actually want to use /x here. If you want to be safe, you can escape any non-alphanumeric character.
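As a small illustration of the points above, the sketch below keeps /x but backslash-escapes the spaces, the # and the other non-alphanumeric characters. The two sample patterns are shortened, hypothetical fragments of the question's text, not the full regex.

```perl
use strict;
use warnings;

# With /x, a backslash-escaped "#" or space is literal again;
# the unescaped spaces around the tokens are still ignored.
my $header    = qr{ \A \# \  using \  defaults \z }x;
my $header_ok = '# using defaults' =~ $header ? 1 : 0;

# Escaping every non-alphanumeric character is harmless in Perl:
# a backslash followed by such a character always matches it literally.
my $prompt    = qr{ \A choice\[1\-2\?\]\:\  \z }x;
my $prompt_ok = 'choice[1-2?]: ' =~ $prompt ? 1 : 0;

print "header matched: $header_ok\n";
print "prompt matched: $prompt_ok\n";
```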
