Regex to match anything but two consecutive curly braces

Posted 2024-10-09 03:18:58


What regular expression would match anything but two consecutive curly braces ({{)?
An example string:
{{some text}} string I want {{another set {{and inner}} }}
I want to get only "string I want".

Using a stack had crossed my mind, but I wanted to know whether this can be done with a regex.
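For comparison, the stack idea reduces to a depth counter, because only the current nesting level matters. This is a minimal sketch, not code from the thread: `textOutsideBraces` is a hypothetical helper name. It assumes that `{{` and `}}` are the only delimiters that affect nesting, and that a `}}` with no open group should be kept as ordinary text.

```php
<?php
// Hypothetical helper: keep only the text that lies outside every {{ ... }} group.
// A depth counter stands in for the stack, since only the nesting level matters.
function textOutsideBraces(string $s): string
{
    $out = '';
    $depth = 0;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $pair = substr($s, $i, 2);
        if ($pair === '{{') {
            $depth++;
            $i++;            // skip the second '{'
        } elseif ($pair === '}}' && $depth > 0) {
            $depth--;
            $i++;            // skip the second '}'
        } elseif ($depth === 0) {
            $out .= $s[$i];  // top-level character: keep it
        }
    }
    // Drop the spaces left around the removed groups.
    return trim($out);
}

echo textOutsideBraces('{{some text}} string I want {{another set {{and inner}} }}'), "\n";
// prints: string I want
```

A single left-to-right scan is enough here. The regex answers below, by contrast, either repeat a replacement or rely on recursion.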

I'm using PHP's PCRE

Thanks in advance


嗫嚅 2024-10-16 03:18:58

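Use a negative lookahead assertion, (?!{{|}}), to verify that the text inside your outer set of braces contains no nested set of braces. The pattern therefore matches only innermost {{...}} groups, so nested groups are removed from the inside out by applying the replacement repeatedly until nothing changes.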

{{((?!{{|}}).)*}}
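A single `preg_match_all()` call with this pattern shows what one pass matches on the question's example. Only groups whose body contains neither `{{` nor `}}` are matched, so the outer `{{another set ...}}` is skipped until its inner group has been removed. This is a minimal sketch, and `$subject` and `$m` are illustrative names.

```php
<?php
$subject = '{{some text}} string I want {{another set {{and inner}} }}';

// One pass of the lookahead pattern: each match is a {{...}} group whose
// body contains neither "{{" nor "}}", i.e. an innermost group.
preg_match_all('/{{((?!{{|}}).)*}}/', $subject, $m);

print_r($m[0]); // lists {{some text}} and {{and inner}}
```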

Test program

<?php
$string = '{{lot {{of}} characters}}';

for (;;)
{
    var_dump($string);
    $replacement = preg_replace('/{{((?!{{|}}).)*}}/', '', $string);

    if ($string == $replacement)
        break;

    $string = $replacement;
}

Output

string(25) "{{lot {{of}} characters}}"
string(19) "{{lot  characters}}"
string(0) ""

It appears to handle various edge cases reasonably, as well:

# Unbalanced braces.
string(23) "{{lot {{of}} characters"
string(17) "{{lot  characters"

string(23) "lot {{of}} characters}}"
string(17) "lot  characters}}"

# Multiple sets of braces.
string(25) "{{lot }}of{{ characters}}"
string(2) "of"

# Lone curlies.
string(41) "{{lot {{of {single curly} }} characters}}"
string(19) "{{lot  characters}}"
string(0) ""
夜灵血窟げ 2024-10-16 03:18:58

If you need to do something more complicated with the contents, such as processing the contents or the variables, you can use a recursive regexp, making use of the (?R) construct, which recurses into the whole pattern.

$data = "{{abcde{{fg{{hi}}jk}}lm}}";
$regexp = "#\{\{((?:[^(\{\{)(\}\})]+|(?R))+)\}\}#";
$count = 0;

function revMatch($matches) {
  global $regexp, $count;

  // preg_replace_callback() calls this only for a match, so $matches is always
  // an array; $matches[1] is the text between the outer braces. Recurse into it
  // so nested components are processed first. When the inner text contains no
  // further match, preg_replace_callback() returns it unchanged.
  $subData = preg_replace_callback($regexp, 'revMatch', $matches[1]);

  // This numbers each match, to demonstrate call order
  return "(" . $count++ . ":<" . $subData . ">)";
}

echo preg_replace_callback($regexp, 'revMatch', $data);

This converts: {{abcde{{fg{{hi}}jk}}lm}} to (2:<abcde(1:<fg(0:<hi>)jk>)lm>)
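The recursive pattern can also answer the original question in a single pass. Because `(?R)` lets one match span a nested group, `preg_replace()` removes each outermost `{{...}}` group in one piece. This minimal sketch uses this answer's pattern unchanged, written in a single-quoted string, which PHP does not interpolate. Trimming the leftover spaces is an assumption about the desired output.

```php
<?php
// The recursive pattern from this answer, unchanged.
$regexp = '#\{\{((?:[^(\{\{)(\}\})]+|(?R))+)\}\}#';

$subject = '{{some text}} string I want {{another set {{and inner}} }}';

// (?R) lets one match cover a whole nested group, so a single pass removes
// each outermost {{...}} group together with everything nested inside it.
$result = trim(preg_replace($regexp, '', $subject));

echo $result, "\n"; // prints: string I want
```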


A bit of explanation on the regexp: #\{\{((?:[^(\{\{)(\}\})]+|(?R))+)\}\}#

The double braces at the front and back match any target component; the contents of the braces must be one or more of the two defined options:

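  1. a run of characters containing none of ( ) { }: [^(\{\{)(\}\})]+ (inside a character class, the parentheses and repeated braces are just literal members of the set, so this is equivalent to [^(){}]+ and rules out single braces, not only double ones)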

  2. the whole regexp again, via (?R), which matches a nested {{...}} group. The (?:) bracket is a non-capturing group.
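To see how the character class in option 1 behaves, compare the pattern with a hypothetical variant that excludes only `{` and `}`. Inside `[...]`, parentheses and repeated braces are just members of the set, so the original class also refuses `(` and `)`. As a result, a group that contains parentheses does not match. The variant and the sample string are illustrative assumptions and are not part of the original answer.

```php
<?php
// Inside [...], "(", ")" and repeated braces are plain set members, so the
// class [^(\{\{)(\}\})] excludes the four characters ( ) { } one at a time.
$original = '#\{\{((?:[^(\{\{)(\}\})]+|(?R))+)\}\}#';

// Hypothetical variant that excludes only the braces.
$bracesOnly = '#\{\{((?:[^{}]+|(?R))+)\}\}#';

$subject = '{{call foo(1)}}';

var_dump(preg_match($original, $subject));   // int(0): "(" cannot be matched
var_dump(preg_match($bracesOnly, $subject)); // int(1)
```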

NB: The #s are the pattern delimiters; I thought extra slashes would decrease readability further.
