How can a regular expression ignore escaped quotes when matching strings?

Posted on 2024-07-26 12:52:38

I'm trying to write a regex that will match everything BUT an apostrophe that has not been escaped. Consider the following:

<?php $s = 'Hi everyone, we\'re ready now.'; ?>

My goal is to write a regular expression that will essentially match the string portion of that. I'm thinking of something such as

/.*'([^']).*/

in order to match a simple string, but I've been trying to figure out how to get a negative lookbehind working on that apostrophe to ensure that it is not preceded by a backslash...

Any ideas?

- JMT

Comments (5)

爱殇璃 2024-08-02 12:52:38

Here's my solution with test cases:

/.*?'((?:\\\\|\\'|[^'])*+)'/

And my proof (in Perl, though I don't think I use any Perl-specific features):

use strict;
use warnings;

my %tests = ();
$tests{'Case 1'} = <<'EOF';
$var = 'My string';
EOF

$tests{'Case 2'} = <<'EOF';
$var = 'My string has it\'s challenges';
EOF

$tests{'Case 3'} = <<'EOF';
$var = 'My string ends with a backslash\\';
EOF

foreach my $key (sort (keys %tests)) {
    print "$key...\n";
    if ($tests{$key} =~ m/.*?'((?:\\\\|\\'|[^'])*+)'/) {
        print " ... '$1'\n";
    } else {
        print " ... NO MATCH\n";
    }
}

Running this shows:

$ perl a.pl
Case 1...
 ... 'My string'
Case 2...
 ... 'My string has it\'s challenges'
Case 3...
 ... 'My string ends with a backslash\\'

Note that the initial wildcard needs to be non-greedy. I then use a possessive (non-backtracking) quantifier, *+, to gobble up \\ and \', and then anything else that is not a standalone quote character.

I think this one probably mimics the compiler's built-in approach, which should make it pretty bullet-proof.
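The pattern above relies on the possessive quantifier *+, which JavaScript regular expressions do not have. As a rough sketch only (the names singleQuoted, cases and extractBody are made up for illustration), the same three test cases can be handled in JavaScript by excluding the backslash from the character class, so that a backslash can only ever be consumed together with the character it escapes:

```javascript
// Sketch: a JavaScript take on the idea above. Without a possessive
// quantifier, the backslash is kept out of the character class so a
// backslash is always consumed as a pair with the character after it.
const singleQuoted = /'((?:\\.|[^'\\])*)'/;

// String.raw keeps the backslashes exactly as typed.
const cases = [
  String.raw`$var = 'My string';`,
  String.raw`$var = 'My string has it\'s challenges';`,
  String.raw`$var = 'My string ends with a backslash\\';`,
];

// Returns the text between the quotes, or null when nothing matches.
function extractBody(source) {
  const m = singleQuoted.exec(source);
  return m === null ? null : m[1];
}

for (const c of cases) {
  console.log(extractBody(c));
}
```

Each case should print the same body the Perl run shows, minus the surrounding quotes.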

赴月观长安 2024-08-02 12:52:38
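<?php
// A single literal backslash, interpolated below so the heredoc stays readable.
$backslash = '\\';

// After interpolation the pattern is: #(["'])(?:\\?+.)*?\1#
//   (["'])       the opening quote, captured as group 1
//   (?:\\?+.)*?  an optional backslash (possessive) plus the character after it, repeated lazily
//   \1           the same quote character that opened the string
$pattern = <<< PATTERN
#(["'])(?:{$backslash}{$backslash}?+.)*?{$backslash}1#
PATTERN;

foreach(array(
    "<?php \$s = 'Hi everyone, we\\'re ready now.'; ?>",
    '<?php $s = "Hi everyone, we\\"re ready now."; ?>',
    "xyz'a\\'bc\\d'123",
    "x = 'My string ends with with a backslash\\\\';"
    ) as $subject) {
        // $matches[0] holds the whole quoted string, quotes included.
        preg_match($pattern, $subject, $matches);
        echo $subject , ' => ', $matches[0], "\n\n";
}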

prints

<?php $s = 'Hi everyone, we\'re ready now.'; ?> => 'Hi everyone, we\'re ready now.'

<?php $s = "Hi everyone, we\"re ready now."; ?> => "Hi everyone, we\"re ready now."

xyz'a\'bc\d'123 => 'a\'bc\d'

x = 'My string ends with with a backslash\\'; => 'My string ends with with a backslash\\'
花开半夏魅人心 2024-08-02 12:52:38
/.*'([^'\\]|\\.)*'.*/

The parenthesized portion looks for non-apostrophes/backslashes and backslash-escaped characters. If only certain characters can be escaped change the \\. to \\['\\a-z], or whatever.
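To make the difference between the two variants concrete, here is a small JavaScript sketch. The names anyEscape, limitedEscape, ok and odd are illustrative only, and the surrounding .* parts are dropped so that the match is just the quoted string:

```javascript
// Any character may follow a backslash.
const anyEscape = /'(?:[^'\\]|\\.)*'/;
// Only \', \\ and \a through \z are accepted as escapes.
const limitedEscape = /'(?:[^'\\]|\\['\\a-z])*'/;

// String.raw keeps the backslashes exactly as typed.
const ok = String.raw`say 'it\'s a tab:\t' now`;
const odd = String.raw`say 'a \! b' now`;

console.log(anyEscape.exec(ok)[0]);     // whole quoted string
console.log(limitedEscape.exec(ok)[0]); // same: \' and \t are both allowed
console.log(anyEscape.exec(odd)[0]);    // \! is accepted as an escape
console.log(limitedEscape.exec(odd));   // null: \! is not in the allowed set
```

The restricted form rejects the whole string as soon as it meets an escape outside the allowed set, which may or may not be what you want when scanning real source code.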

遥远的她 2024-08-02 12:52:38
Regex reg = new Regex("(?<!\\\\)'(?<string>.*?)(?<!\\\\)'");
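
One caveat with a lookbehind-only approach: it treats every quote preceded by a backslash as escaped, even when that backslash is itself escaped. The JavaScript sketch below (lookbehind needs an ES2018+ engine; the variable names are made up for illustration) ports the pattern as-is and compares it with a pattern that consumes escape pairs instead:

```javascript
// Same pattern as the C# one-liner, written as a JavaScript literal.
const lookbehind = /(?<!\\)'(?<string>.*?)(?<!\\)'/;
// Alternative that consumes each backslash together with the next character.
const consuming = /'((?:\\.|[^'\\])*)'/;

const escapedQuote = String.raw`$s = 'we\'re ready';`;
const trailingBackslash = String.raw`$a = 'C:\\'; $b = 'x';`;

// Works as intended: the escaped quote is skipped.
console.log(lookbehind.exec(escapedQuote).groups.string);

// Over-matches: the real closing quote follows a backslash, so it is
// skipped and the match runs on to the quote that opens 'x'.
console.log(lookbehind.exec(trailingBackslash).groups.string);

// The consuming pattern stops at the right quote.
console.log(consuming.exec(trailingBackslash)[1]);
```

So the lookbehind version is fine as long as a string never ends in an escaped backslash; otherwise a pattern that consumes escape sequences is the safer choice.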
避讳 2024-08-02 12:52:38

This is for JavaScript:

/('|")(?:\\\\|\\\1|[\s\S])*?\1/

It:

  • matches single or double quoted strings
  • matches empty strings (length 0)
  • matches strings with embedded whitespace (\n, \t, etc.)
  • skips inner escaped quotes (single or double)
  • skips single quotes within double quotes and vice versa

Only the first quote is captured. You can capture the unquoted string in $2 with:

/('|")((?:\\\\|\\\1|[\s\S])*?)\1/
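
A short usage sketch, assuming you want every string body in a piece of source text: add the g flag and iterate with matchAll. The names quoted, src and bodies are illustrative only.

```javascript
// The capturing variant from above, with the g flag so matchAll can be used.
const quoted = /('|")((?:\\\\|\\\1|[\s\S])*?)\1/g;

// String.raw keeps the backslashes exactly as typed.
const src = String.raw`a = 'it\'s'; b = "say \"hi\""; c = '';`;

// m[1] is the quote character, m[2] the text between the quotes.
const bodies = Array.from(src.matchAll(quoted), (m) => m[2]);

console.log(bodies);
```

This should list three bodies: one from the single-quoted string, one from the double-quoted string, and an empty one.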
