当前位置：文江博客话题详情

regex nested finite-automata

可以使用正则表达式来匹配嵌套模式吗？

发布于 2024-07-05 08:19:15 字数 295 浏览 8 评论 0 原文

是否可以编写一个正则表达式来匹配出现次数未知的嵌套模式？例如，当外大括号内嵌套未知数量的左大括号时，正则表达式是否可以匹配左大括号和右大括号？

例如：

public MyMethod()
{
  if (test)
  {
    // More { }
  }

  // More { }
} // End

应匹配：

{
  if (test)
  {
    // More { }
  }

  // More { }
}

原文

Is it possible to write a regular expression that matches a nested pattern that occurs an unknown number of times? For example, can a regular expression match an opening and closing brace when there are an unknown number of open/close braces nested within the outer braces?

For example:

public MyMethod()
{
  if (test)
  {
    // More { }
  }

  // More { }
} // End

Should match:

{
  if (test)
  {
    // More { }
  }

  // More { }
}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

她说她爱他 2024-07-12 08:19:15

不，就这么简单。有限自动机（正则表达式底层的数据结构）除了其所处的状态之外没有内存，如果您有任意深度的嵌套，则需要一个任意大的自动机，这与 的概念相冲突有限自动机。

您可以将嵌套/配对元素匹配到固定深度，其中深度仅受您的内存限制，因为自动机变得非常大。然而，在实践中，您应该使用下推自动机，即上下文无关语法的解析器，例如 LL（自上而下）或 LR（自下而上）。您必须考虑到更糟糕的运行时行为：O(n^3) 与 O(n)，其中 n = length(input)。

有许多可用的解析器生成器，例如用于 Java 的 ANTLR。找到 Java（或 C）的现有语法也不难。
有关更多背景信息：维基百科的自动机理论

回复收藏 0 原文

冷了相思 2024-07-12 08:19:15

使用正则表达式检查嵌套模式非常简单。

'/(\((?>[^()]+|(?1))*\))/'

Using regular expressions to check for nested patterns is very easy.

'/(\((?>[^()]+|(?1))*\))/'

回复收藏 0 原文

随风而去 2024-07-12 08:19:15

如果字符串在一行上，则可能是有效的 Perl 解决方案：

my $NesteD ;
$NesteD = qr/ \{( [^{}] | (??{ $NesteD }) )* \} /x ;

if ( $Stringy =~ m/\b( \w+$NesteD )/x ) {
    print "Found: $1\n" ;
  }

HTH

编辑： 检查：

http://dev.perl.org/perl6/rfc/145.html
ruby 信息：http://www.ruby-forum.com/topic/112084
更多 perl：http://www.perlmonks.org/?node_id=660316
甚至更多 perl：https://metacpan.org/pod/Text::Balanced
perl，perl，perl：http://perl.plover.com/yak/regex/samples/slide083.html

还有一件事 Torsten Marek（他正确地指出，它不再是正则表达式）：

http://coding.derkeiler.com/Archive/Perl/comp.lang.perl.misc/2008-03/ msg01047.html

Probably working Perl solution, if the string is on one line:

my $NesteD ;
$NesteD = qr/ \{( [^{}] | (??{ $NesteD }) )* \} /x ;

if ( $Stringy =~ m/\b( \w+$NesteD )/x ) {
    print "Found: $1\n" ;
  }

HTH

EDIT: check:

http://dev.perl.org/perl6/rfc/145.html
ruby information: http://www.ruby-forum.com/topic/112084
more perl: http://www.perlmonks.org/?node_id=660316
even more perl: https://metacpan.org/pod/Text::Balanced
perl, perl, perl: http://perl.plover.com/yak/regex/samples/slide083.html

And one more thing by Torsten Marek (who had pointed out correctly, that it's not a regex anymore):

http://coding.derkeiler.com/Archive/Perl/comp.lang.perl.misc/2008-03/msg01047.html

回复收藏 0 原文

自由范儿 2024-07-12 08:19:15

常规语言的泵引理是你不能这样做的原因。

生成的自动机将具有有限数量的状态，例如 k，因此一串 k+1 个左大括号必然会在某处重复某个状态（当自动机处理字符时）。同一状态之间的字符串部分可以无限次重复，并且自动机不会知道其中的差异。

特别是，如果它接受 k+1 个左大括号，后跟 k+1 个右大括号（它应该），它也将接受泵送的左大括号数量，后跟未更改的 k+1 个右大括号（它不应该）。

回复收藏 0 原文

猫腻 2024-07-12 08:19:15

是的，如果它是 .NET RegEx 引擎。 .Net 引擎支持由外部堆栈提供的有限状态机。请参阅详细信息

回复收藏 0 原文

烟花肆意 2024-07-12 08:19:15

正确的正则表达式将无法做到这一点，因为您将离开正则语言领域而进入上下文无关语言领域。

尽管如此，许多语言提供的“正则表达式”包实际上更强大。

例如， Lua 正则表达式具有将匹配的“%b()”识别器平衡括号。在您的情况下，您将使用“%b{}”

另一个类似于 sed 的复杂工具是 gema，您可以轻松地将平衡花括号与 {#} 匹配。

因此，根据您可以使用的工具，您的“正则表达式”（广义上）可能能够匹配嵌套括号。

回复收藏 0 原文

骄兵必败 2024-07-12 08:19:15

是的

......假设有一些最大的嵌套数量，您会很乐意停下来。

让我解释。

@torsten-marek 是正确的，正则表达式无法检查这样的嵌套模式，但是可以定义一个嵌套的正则表达式模式，它允许您捕获这样的嵌套结构达到某个最大深度。我创建了一个来捕获 EBNF 风格评论 (在这里尝试一下），例如：

(* This is a comment (* this is nested inside (* another level! *) hey *) yo *)

正则表达式（用于单深度注释）如下：

m{1} = \(+\*+(?:[^*(]|(?:\*+[^)*])|(?:\(+[^*(]))*\*+\)+

这可以很容易地进行调整为了您的目的，将 \(+\*+ 和 \*+\)+ 替换为 { 和 } > 并用简单的 [^{}] 替换中间的所有内容：(

p{1} = \{(?:[^{}])*\}

这里是链接进行尝试。）

要嵌套，只需在块本身中允许此模式：

p{2} = \{(?:(?:p{1})|(?:[^{}]))*\}
  ...or...
p{2} = \{(?:(?:\{(?:[^{}])*\})|(?:[^{}]))*\}

要查找三重嵌套块，请使用：

p{3} = \{(?:(?:p{2})|(?:[^{}]))*\}
  ...or...
p{3} = \{(?:(?:\{(?:(?:\{(?:[^{}])*\})|(?:[^{}]))*\})|(?:[^{}]))*\}

出现了清晰的模式。要查找嵌套深度为 N 的注释，只需使用正则表达式：

p{N} = \{(?:(?:p{N-1})|(?:[^{}]))*\}

  where N > 1 and
  p{1} = \{(?:[^{}])*\}

可以编写脚本来递归生成这些正则表达式，但这超出了我需要的范围。（这留给读者作为练习。

YES

...assuming that there is some maximum number of nestings you'd be happy to stop at.

Let me explain.

@torsten-marek is right that a regular expression cannot check for nested patterns like this, BUT it is possible to define a nested regex pattern which will allow you to capture nested structures like this up to some maximum depth. I created one to capture EBNF-style comments (try it out here), like:

(* This is a comment (* this is nested inside (* another level! *) hey *) yo *)

The regex (for single-depth comments) is the following:

m{1} = \(+\*+(?:[^*(]|(?:\*+[^)*])|(?:\(+[^*(]))*\*+\)+

This could easily be adapted for your purposes by replacing the \(+\*+ and \*+\)+ with { and } and replacing everything in between with a simple [^{}]:

p{1} = \{(?:[^{}])*\}

(Here's the link to try that out.)

To nest, just allow this pattern within the block itself:

p{2} = \{(?:(?:p{1})|(?:[^{}]))*\}
  ...or...
p{2} = \{(?:(?:\{(?:[^{}])*\})|(?:[^{}]))*\}

To find triple-nested blocks, use:

p{3} = \{(?:(?:p{2})|(?:[^{}]))*\}
  ...or...
p{3} = \{(?:(?:\{(?:(?:\{(?:[^{}])*\})|(?:[^{}]))*\})|(?:[^{}]))*\}

A clear pattern has emerged. To find comments nested to a depth of N, simply use the regex:

p{N} = \{(?:(?:p{N-1})|(?:[^{}]))*\}

  where N > 1 and
  p{1} = \{(?:[^{}])*\}

A script could be written to recursively generate these regexes, but that's beyond the scope of what I need this for. (This is left as an exercise for the reader. ????)

回复收藏 0 原文

动次打次papapa 2024-07-12 08:19:15

正如 zsolt 提到的，一些正则表达式引擎支持递归——当然，这些引擎通常使用回溯算法，因此它不会特别有效。例如：/(?>[^{}]*){(?>[^{}]*)(?R)*(?>[^{}]*)}/sm

回复收藏 0 原文

雄赳赳气昂昂 2024-07-12 08:19:15

不，此时您正在进入上下文无关语法的领域。

回复收藏 0 原文

国粹 2024-07-12 08:19:15

这似乎有效： /(\{(?:\{.*\}|[^\{])*\})/m

回复收藏 0 原文

独行侠 2024-07-12 08:19:15

在 PHP 正则表达式引擎中使用递归匹配比括号的过程匹配要快得多。尤其是对于较长的琴弦。

http://php.net/manual/en/regexp.reference.recursive.php

例如

$patt = '!\( (?: (?: (?>[^()]+) | (?R) )* ) \)!x';

preg_match_all( $patt, $str, $m );

与

matchBrackets( $str );

function matchBrackets ( $str, $offset = 0 ) {

    $matches = array();

    list( $opener, $closer ) = array( '(', ')' );

    // Return early if there's no match
    if ( false === ( $first_offset = strpos( $str, $opener, $offset ) ) ) {
        return $matches;
    }

    // Step through the string one character at a time storing offsets
    $paren_score = -1;
    $inside_paren = false;
    $match_start = 0;
    $offsets = array();

    for ( $index = $first_offset; $index < strlen( $str ); $index++ ) {
        $char = $str[ $index ];

        if ( $opener === $char ) {
            if ( ! $inside_paren ) {
                $paren_score = 1;
                $match_start = $index;
            }
            else {
                $paren_score++;
            }
            $inside_paren = true;
        }
        elseif ( $closer === $char ) {
            $paren_score--;
        }

        if ( 0 === $paren_score ) {
            $inside_paren = false;
            $paren_score = -1;
            $offsets[] = array( $match_start, $index + 1 );
        }
    }

    while ( $offset = array_shift( $offsets ) ) {

        list( $start, $finish ) = $offset;

        $match = substr( $str, $start, $finish - $start );
        $matches[] = $match;
    }

    return $matches;
}

Using the recursive matching in the PHP regex engine is massively faster than procedural matching of brackets. especially with longer strings.

http://php.net/manual/en/regexp.reference.recursive.php

e.g.

$patt = '!\( (?: (?: (?>[^()]+) | (?R) )* ) \)!x';

preg_match_all( $patt, $str, $m );

vs.

matchBrackets( $str );

function matchBrackets ( $str, $offset = 0 ) {

    $matches = array();

    list( $opener, $closer ) = array( '(', ')' );

    // Return early if there's no match
    if ( false === ( $first_offset = strpos( $str, $opener, $offset ) ) ) {
        return $matches;
    }

    // Step through the string one character at a time storing offsets
    $paren_score = -1;
    $inside_paren = false;
    $match_start = 0;
    $offsets = array();

    for ( $index = $first_offset; $index < strlen( $str ); $index++ ) {
        $char = $str[ $index ];

        if ( $opener === $char ) {
            if ( ! $inside_paren ) {
                $paren_score = 1;
                $match_start = $index;
            }
            else {
                $paren_score++;
            }
            $inside_paren = true;
        }
        elseif ( $closer === $char ) {
            $paren_score--;
        }

        if ( 0 === $paren_score ) {
            $inside_paren = false;
            $paren_score = -1;
            $offsets[] = array( $match_start, $index + 1 );
        }
    }

    while ( $offset = array_shift( $offsets ) ) {

        list( $start, $finish ) = $offset;

        $match = substr( $str, $start, $finish - $start );
        $matches[] = $match;
    }

    return $matches;
}

回复收藏 0 原文

~没有更多了~

关于作者

人│生佛魔见

暂无简介

文章

27 人气

关注发私信

友情链接

文江博客

可以使用正则表达式来匹配嵌套模式吗？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（11）

是的

YES

关于作者

相关话题

热门标签

推荐作者

十二

飞烟轻若梦

OPleyuhuo

wxb0109

旧城空念

-小熊_

友情链接

可以使用正则表达式来匹配嵌套模式吗？

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（11）

是的

YES

关于作者

相关话题

热门标签

推荐作者

十二

飞烟轻若梦

OPleyuhuo

wxb0109

旧城空念

-小熊_

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。