Matching the body of a function using regex

Posted on 2025-01-25 07:05:53 · 1270 words · 4 views · 0 comments

Given a dummy function as such:

public function handle()
{
  if (isset($input['data'])) {
    switch($data) {
      ...
    }
  } else {
    switch($data) {
      ...
    }
  }
}

My goal is to get the contents of that function; the problem is matching nested pairs of curly braces {...}.

I've come across recursive patterns but couldn't work out a regex that matches the function's body.

I've tried the following (no recursion):

$pattern = "/function\shandle\([a-zA-Z0-9_\$\s,]+\)?". // match "function handle(...)"
            '[\n\s]?[\t\s]*'. // regardless of the indentation preceding the {
            '{([^{}]*)}/'; // find everything within braces.

preg_match($pattern, $contents, $match);

That pattern doesn't match at all. I'm fairly sure the last part, '{([^{}]*)}/', is what's wrong, since it works when there are no other braces within the body.

By replacing it with:

'{([^}]*)}/';

It matched up to the closing } of the switch inside the if statement and stopped there (including the switch's } but excluding the if's).

This pattern gives the same result:

'{(\K[^}]*(?=)})/m';


Comments (2)

温柔嚣张 2025-02-01 07:05:53

Update #2

Based on others' comments:

^\s*[\w\s]+\(.*\)\s*\K({((?>"(?:[^"\\]*+|\\.)*"|'(?:[^'\\]*+|\\.)*'|//.*$|/\*[\s\S]*?\*/|#.*$|<<<\s*["']?(\w+)["']?[^;]+\3;$|[^{}<'"/#]++|[^{}]++|(?1))*)})

Note: A short regex such as {((?>[^{}]++|(?R))*)} is enough if you know your input does not contain { or } outside of PHP syntax (for example inside strings or comments).
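As a minimal sketch of the short recursive form applied to the question's handle() function: the sample source and the helper name handleBody are assumptions made for illustration. Because the pattern here also has a "function handle(...)" prefix, the sketch recurses into group 1 with (?1) instead of using (?R), which would re-enter the whole pattern including the prefix.

```php
<?php
// Hypothetical sample input; in the question, $contents would hold a real file.
$contents = <<<'PHP'
public function handle()
{
  if (isset($input['data'])) {
    switch ($data) {
    }
  } else {
    switch ($data) {
    }
  }
}
PHP;

// Group 1 is one balanced {...} block; (?1) re-enters group 1 for every nested block.
// Group 2 is the text between the outermost braces.
$pattern = '/function\s+handle\s*\([^)]*\)\s*(\{((?:[^{}]++|(?1))*)\})/';

// Hypothetical helper: returns the body of handle(), or null when nothing matches.
function handleBody(string $pattern, string $contents): ?string
{
    return preg_match($pattern, $contents, $match) === 1 ? $match[2] : null;
}

$body = handleBody($pattern, $contents);
echo $body, PHP_EOL;
```

This sketch assumes every brace in the input is real PHP syntax; a brace inside a string or comment would throw the balance off, which is what the longer pattern addresses.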

So which nasty cases does the long regex handle?

  1. You have [{}] in a string between quotation marks ["']
  2. You have those quotation marks escaped inside one another
  3. You have [{}] in a comment block. //... or /*...*/ or #...
  4. You have [{}] in a heredoc or nowdoc <<<STR or <<<['"]STR['"]

Otherwise it only needs a matching pair of opening and closing braces; the depth of nesting does not matter.

Is there a case where it fails?

No, unless a Martian lives inside your code.

 ^ \s* [\w\s]+ \( .* \) \s* \K               # how it matches a function definition
 (                             # (1 start)
      {                                      # opening brace
      (                             # (2 start)
           (?>                               # atomic grouping (for its non-capturing purpose only)
                "(?: [^"\\]*+ | \\ . )*"     # double quoted strings
             |  '(?: [^'\\]*+ | \\ . )*'     # single quoted strings
             |  // .* $                      # a comment block starting with //
             |  /\* [\s\S]*? \*/             # a multi line comment block /*...*/
             |  \# .* $                      # a single line comment block starting with #...
             |  <<< \s* ["']?                # heredocs and nowdocs
                ( \w+ )                      # (3) ^
                ["']? [^;]+ \3 ; $           # ^
             |  [^{}<'"/#]++                 # force engine to backtrack if it encounters special characters [<'"/#] (possessive)
             |  [^{}]++                      # default matching behaviour (possessive)
             |  (?1)                         # recurse 1st capturing group
           )*                                # zero to many times of atomic group
      )                             # (2 end)
      }                                      # closing brace
 )                             # (1 end)

Formatting is done by @sln's RegexFormatter software.
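A rough sketch of how the one-line pattern might be called from PHP. The sample source and the function name extractFirstBody are assumptions for illustration. The ~ delimiter is chosen because the pattern itself contains /, and the m flag is needed so that ^ and $ match at line boundaries.

```php
<?php
// The one-line pattern from this answer, kept in a nowdoc so no extra escaping is needed.
$pattern = <<<'REGEX'
~^\s*[\w\s]+\(.*\)\s*\K({((?>"(?:[^"\\]*+|\\.)*"|'(?:[^'\\]*+|\\.)*'|//.*$|/\*[\s\S]*?\*/|#.*$|<<<\s*["']?(\w+)["']?[^;]+\3;$|[^{}<'"/#]++|[^{}]++|(?1))*)})~m
REGEX;

// Hypothetical input with braces hidden in a string, a comment and a quoted literal.
$source = <<<'PHP'
public function handle()
{
    $s = "a } inside a string";
    // a } inside a comment
    if ($x) {
        return '{';
    }
}
PHP;

// Hypothetical helper: returns [block with braces, body without braces] or null.
function extractFirstBody(string $pattern, string $source): ?array
{
    if (preg_match($pattern, $source, $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}

$result = extractFirstBody($pattern, $source);
echo $result === null ? 'no match' : $result[1], PHP_EOL;
```

Because of \K, the overall match starts at the opening brace, so group 1 and the full match are the same text here.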

What does the live demo use as input?

A randomly chosen file, Laravel's Eloquent Model.php (~3500 lines), is given as input. Check it out:
Live demo

故事↓在人 2025-02-01 07:05:53

This works for generating header file (.h) declarations from inline function blocks (.c).

Find regular expression:

(void\s[^{};]*)\n^\{($[^}$]*)\}$

Replace with:

$1;

For input:

void bar(int var)
{
    foo(var);
    foo2();
}

will output:

void bar(int var);

To get the body of the function block, use the second captured group:

$2

will output:

    foo(var);
    foo2();
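
The find/replace above can be tried outside an editor too. This is only a sketch using PHP's preg_replace with the m flag, assuming the same C input and assuming the opening brace ends its line with no trailing whitespace; the variable names are made up for illustration.

```php
<?php
// Hypothetical C source, identical to the input shown above.
$c = "void bar(int var)\n{\n    foo(var);\n    foo2();\n}\n";

// Same expression as above; m makes ^ and $ match at line boundaries.
// Inside a character class, $ is a literal dollar sign.
$pattern = '/(void\s[^{};]*)\n^\{($[^}$]*)\}$/m';

// $1 is the signature, so "$1;" turns the definition into a declaration.
$header = preg_replace($pattern, '$1;', $c);
echo $header; // prints: void bar(int var);

// $2 is the body between the braces.
preg_match($pattern, $c, $m);
$bodyLines = $m[2];
```

Note that [^}$]* cannot cross a }, so a function whose body contains a nested block would be cut at the first } that ends a line; the expression suits flat bodies like the one shown.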