Split a string by commas that are not inside (possibly nested) parentheses

Posted on 2024-07-26 08:33:51 · 455 words · 3 views · 0 comments

Two days ago I started working on a code parser and I'm stuck.

How can I split a string by commas that are not inside brackets? Let me show you what I mean.

I have this string to parse:

one, two, three, (four, (five, six), (ten)), seven

I would like to get this result:

array(
 "one"; 
 "two"; 
 "three"; 
 "(four, (five, six), (ten))"; 
 "seven"
)

but instead I get:

array(
  "one"; 
  "two"; 
  "three"; 
  "(four"; 
  "(five"; 
  "six)"; 
  "(ten))";
  "seven"
)

How can I do this with a regular expression in PHP?

Comments (10)

糖粟与秋泊 2024-08-02 08:33:52

This can be done with a recursive pattern (?R):

$regex = "/[(](?:[^()]*(?R))*[^()]*[)]|(?=\\S)[^,()]+(?<=\\S)/";

// Example
$s = "one, two, three, (four, (five, six), (ten)), seven";
preg_match_all($regex, $s, $matches);

$matches[0] will be:

[
    "one",
    "two",
    "three",
    "(four, (five, six), (ten))",
    "seven"
]

Explanation:

The second part of the pattern defines the basic case:

(?=\\S)[^,()]+(?<=\\S)
  • [^,()]+: a sequence of characters that exclude ,, (, and ).
  • (?=\\S): asserts that the above mentioned sequence starts with a non-whitespace character
  • (?<=\\S): asserts that the above mentioned sequence ends with a non-whitespace character

The first part of the regex deals with the parentheses:

[(](?:[^()]*(?R))*[^()]*[)]
  • [(]: the match must start with an opening parenthesis
  • [)]: the match must end with a closing parenthesis
  • [^()]*: a possibly empty sequence of characters that are not parentheses. Note that commas are matched here.
  • (?R): applying the complete regex pattern recursively
  • (?:[^()]*(?R))*: zero or more repetitions of a sequence of characters that are not parentheses, followed by a nested expression.

So it matches something between parentheses, where that something can have any number of nested matches, alternated by non-parentheses sequences.
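For reuse, the pattern above could be wrapped in a small helper. This is only a minimal sketch: the function name splitTopLevel is hypothetical (it does not appear in the answer), and the pattern is written in a single-quoted string so that \S reaches PCRE without any PHP string escaping.

```php
<?php
// Hypothetical helper (the name is an assumption, not part of the answer):
// returns the top-level items of a comma-separated string, keeping balanced
// parenthesised groups intact.
function splitTopLevel(string $input): array
{
    // Same recursive pattern as above: either a balanced (...) group,
    // or a run of non-comma, non-parenthesis characters with no
    // leading or trailing whitespace.
    $regex = '/[(](?:[^()]*(?R))*[^()]*[)]|(?=\S)[^,()]+(?<=\S)/';
    preg_match_all($regex, $input, $matches);
    return $matches[0];
}

print_r(splitTopLevel('one, two, three, (four, (five, six), (ten)), seven'));
```

Because this collects matches rather than splitting at commas, an item that mixes plain text with a group, such as foo (bar), would come back as two separate entries.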

红ご颜醉 2024-08-02 08:33:51

You can do that more easily:

preg_match_all('/[^(,\s]+|\([^)]+\)/', $str, $matches);

But it would be better to use a real parser. Maybe something like this:

$str = 'one, two, three, (four, (five, six), (ten)), seven';
$buffer = '';
$stack = array();
$depth = 0;
$len = strlen($str);
for ($i=0; $i<$len; $i++) {
    $char = $str[$i];
    switch ($char) {
    case '(':
        $depth++;
        break;
    case ',':
        if (!$depth) {
            if ($buffer !== '') {
                $stack[] = $buffer;
                $buffer = '';
            }
            continue 2;
        }
        break;
    case ' ':
        if (!$depth) {
            continue 2;
        }
        break;
    case ')':
        if ($depth) {
            $depth--;
        } else {
            $stack[] = $buffer.$char;
            $buffer = '';
            continue 2;
        }
        break;
    }
    $buffer .= $char;
}
if ($buffer !== '') {
    $stack[] = $buffer;
}
var_dump($stack);
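The loop above drops every space outside parentheses, so an item like "three and a half" would lose its inner spaces. A variant of the same depth-counting idea that trims each item instead might look like this. It is a sketch only; the function name splitByDepth is hypothetical, and unbalanced parentheses are simply kept as ordinary characters.

```php
<?php
// Hypothetical function name; a depth-counting scanner in the spirit of the
// loop above, except that it trims each item rather than skipping spaces.
function splitByDepth(string $input): array
{
    $items = [];
    $buffer = '';
    $depth = 0;
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if ($char === '(') {
            $depth++;
        } elseif ($char === ')' && $depth > 0) {
            $depth--;
        } elseif ($char === ',' && $depth === 0) {
            // A comma outside all parentheses ends the current item.
            $items[] = trim($buffer);
            $buffer = '';
            continue;
        }
        $buffer .= $char;
    }
    $items[] = trim($buffer);

    // Drop empty items (for example from ", ,") and reindex from 0.
    return array_values(array_filter($items, function ($item) {
        return $item !== '';
    }));
}

print_r(splitByDepth('one, two, three, (four, (five, six), (ten)), seven'));
```

Scanning byte by byte is safe for UTF-8 input here, since the three significant characters are all single-byte ASCII.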
春风十里 2024-08-02 08:33:51

Hm... OK, this is already marked as answered, but since you asked for an easy solution, I will try nevertheless:

$test = "one, two, three, , , ,(four, five, six), seven, (eight, nine)";
$split = "/([(].*?[)])|(\w)+/";
preg_match_all($split, $test, $out);
print_r($out[0]);              

Output

Array
(
    [0] => one
    [1] => two
    [2] => three
    [3] => (four, five, six)
    [4] => seven
    [5] => (eight, nine)
)
不羁少年 2024-08-02 08:33:51

You can't, directly. You'd need, at minimum, variable-width lookbehind, and last I knew PHP's PCRE only has fixed-width lookbehind.

My first recommendation would be to first extract parenthesized expressions from the string. I don't know anything about your actual problem, though, so I don't know if that will be feasible.

把人绕傻吧 2024-08-02 08:33:51

I can't think of a way to do it using a single regex, but it's quite easy to hack together something that works:

function process($data)
{
        $entries = array();
        $filteredData = $data;
        if (preg_match_all("/\(([^)]*)\)/", $data, $matches)) {
                $entries = $matches[0];
                $filteredData = preg_replace("/\(([^)]*)\)/", "-placeholder-", $data);
        }

        $arr = array_map("trim", explode(",", $filteredData));

        if (!$entries) {
                return $arr;
        }

        $j = 0;
        foreach ($arr as $i => $entry) {
                if ($entry != "-placeholder-") {
                        continue;
                }

                $arr[$i] = $entries[$j];
                $j++;
        }

        return $arr;
}

If you invoke it like this:

$data = "one, two, three, (four, five, six), seven, (eight, nine)";
print_r(process($data));

It outputs:

Array
(
    [0] => one
    [1] => two
    [2] => three
    [3] => (four, five, six)
    [4] => seven
    [5] => (eight, nine)
)
携君以终年 2024-08-02 08:33:51

Maybe a bit late, but I've made a solution without regex that also supports nesting inside brackets. Let me know what you think:

$str = "Some text, Some other text with ((95,3%) MSC)";
$arr = explode(",",$str);

$parts = [];
$currentPart = "";
$bracketsOpened = 0;
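foreach ($arr as $part){
    // Re-insert the comma that explode() removed while we are inside brackets.
    $currentPart .= ($bracketsOpened > 0 ? ',' : '').$part;
    // Count every bracket in the piece rather than only testing for presence,
    // so pieces such as "((95" or "(a) (b" keep the depth accurate.
    $bracketsOpened += substr_count($part, "(") - substr_count($part, ")");
    if ($bracketsOpened <= 0){
        $parts[] = $currentPart;
        $currentPart = '';
        $bracketsOpened = 0;
    }
}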

Gives me the output:

Array
(
    [0] => Some text
    [1] =>  Some other text with ((95,3%) MSC)
)
苍景流年 2024-08-02 08:33:51

Clumsy, but it does the job...

<?php

function split_by_commas($string) {
  preg_match_all("/\(.+?\)/", $string, $result); 
  $problem_children = $result[0];
  $i = 0;
  $temp = array();
  foreach ($problem_children as $submatch) { 
    $marker = '__'.$i++.'__';
    $temp[$marker] = $submatch;
    $string   = str_replace($submatch, $marker, $string);  
  }
  $result = explode(",", $string);
  foreach ($result as $key => $item) {
    $item = trim($item);
    $result[$key] = isset($temp[$item])?$temp[$item]:$item;
  }
  return $result;
}


$test = "one, two, three, (four, five, six), seven, (eight, nine), ten";

print_r(split_by_commas($test));

?>
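The lazy pattern \(.+?\) above stops at the first closing parenthesis, so nested groups would be cut short. One way the same marker idea might be extended to nested input is to match balanced groups with a recursive pattern. The following is a sketch under assumptions: the function name splitWithMarkers is hypothetical, and the input is assumed never to contain the control character \x01, which is used to delimit markers.

```php
<?php
// Hypothetical helper: marker substitution as above, but with a recursive
// pattern so that a whole balanced (...) group becomes a single marker.
// Assumes the input never contains the byte \x01.
function splitWithMarkers(string $input): array
{
    $groups = [];
    // Swap each balanced top-level group for a marker like \x01 0 \x01.
    $masked = preg_replace_callback(
        '/\((?:[^()]++|(?R))*\)/',
        function (array $m) use (&$groups): string {
            $groups[] = $m[0];
            return "\x01" . (count($groups) - 1) . "\x01";
        },
        $input
    );

    $result = [];
    foreach (explode(',', $masked) as $piece) {
        // Put the original groups back in place of their markers.
        $result[] = preg_replace_callback(
            '/\x01(\d+)\x01/',
            function (array $m) use ($groups): string {
                return $groups[(int) $m[1]];
            },
            trim($piece)
        );
    }
    return $result;
}

print_r(splitWithMarkers('one, two, three, (four, (five, six), (ten)), seven'));
```

Unlike str_replace() on the matched text, the callback replaces each occurrence exactly where it was found, so two identical groups in the input do not interfere with each other.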
七颜 2024-08-02 08:33:51

I feel that it's worth noting that you should avoid regular expressions whenever you possibly can. To that end, you should know that for PHP 5.3+ you could use str_getcsv(). However, if you're working with files (or file streams), such as CSV files, then the function fgetcsv() might be what you need, and it's been available since PHP 4.

Lastly, I'm surprised nobody used preg_split(), or did it not work as needed?
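For what it's worth, preg_split() can be made to work if the pattern first consumes and discards balanced groups. The sketch below relies on the PCRE backtracking verbs (*SKIP) and (*FAIL); the function name splitOutsideParens is hypothetical and not taken from this thread.

```php
<?php
// Hypothetical helper: split on commas, skipping any balanced (...) group.
function splitOutsideParens(string $input): array
{
    // Group 1 matches a balanced parenthesised group, recursing via (?1).
    // (*SKIP)(*FAIL) then rejects that match and resumes scanning after it,
    // so only commas outside such groups remain as split points.
    $pattern = '/(\((?:[^()]++|(?1))*\))(*SKIP)(*FAIL)|\s*,\s*/';
    return preg_split($pattern, trim($input), -1, PREG_SPLIT_NO_EMPTY);
}

print_r(splitOutsideParens('one, two, three, (four, (five, six), (ten)), seven'));
```

An unbalanced opening parenthesis never completes group 1, so it is treated as an ordinary character and the commas after it still split.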

幸福不弃 2024-08-02 08:33:51

I am afraid that it could be very difficult to parse nested brackets like
one, two, (three, (four, five))
only with RegExp.

紫罗兰の梦幻 2024-08-02 08:33:51

This one is more correct, but it still does not work for nested parentheses: /[^(,]*(?:\([^)]+\))?[^),]*/

– DarkSide Mar 24, 2013 at 23:09

Your method cannot parse "one, two, three, ((five), (four(six))), seven, eight, nine". I think the correct regex would be a recursive one: /\(([^()]+|(?R))*\)/.

– Cristian Toma Jul 6, 2009 at 7:26

Yes, it's easier, but doesn't work in case of nested brackets, like so: one, two, three, (four, (five, six), (ten)), seven

– Cristian Toma Jul 6, 2009 at 7:41

Thank you very much, your help is much appreciated. But now I realize that I will also encounter nested brackets and your solution doesn't apply.

– Cristian Toma Jul 6, 2009 at 7:43

It sounds to me like we need a string-splitting algorithm that respects balanced parenthetical grouping. I'll give that a crack using a recursive regex pattern! The behavior will be to respect the lowest balanced parentheticals and let any higher-level unbalanced parentheses be treated as non-grouping characters. Please leave a comment with any input strings that are not correctly split so that I can try to make improvements (test-driven development).

Code:

$tests = [
    'one, two, three, (four, five, six), seven, (eight, nine)',
    '()',
    'one and a ),',
    '(one, two, three)',
    'one, (, two',
    'one, two, ), three',
    'one, (unbalanced, (nested, whoops ) two',
    'one, two, three and a half, ((five), (four(six))), seven, eight, nine',
    'one, (two, (three and a half, (four, (five, (six, seven), eight)))), nine, (ten, twenty twen twen)',
    'ten, four, (,), good buddy',
];

foreach ($tests as $test) {
    var_export(
        preg_split(
            '/(?>(\((?:(?>[^()]+)|(?1))*\))|[^,]+)\K,?\s*/',
            $test,
            0,
            PREG_SPLIT_NO_EMPTY
        )
    );
    echo "\n";
}

Output:

array (
  0 => 'one',
  1 => 'two',
  2 => 'three',
  3 => '(four, five, six)',
  4 => 'seven',
  5 => '(eight, nine)',
)
array (
  0 => '()',
)
array (
  0 => 'one and a )',
)
array (
  0 => '(one, two, three)',
)
array (
  0 => 'one',
  1 => '(',
  2 => 'two',
)
array (
  0 => 'one',
  1 => 'two',
  2 => ')',
  3 => 'three',
)
array (
  0 => 'one',
  1 => '(unbalanced',
  2 => '(nested, whoops )',
  3 => 'two',
)
array (
  0 => 'one',
  1 => 'two',
  2 => 'three and a half',
  3 => '((five), (four(six)))',
  4 => 'seven',
  5 => 'eight',
  6 => 'nine',
)
array (
  0 => 'one',
  1 => '(two, (three and a half, (four, (five, (six, seven), eight))))',
  2 => 'nine',
  3 => '(ten, twenty twen twen)',
)
array (
  0 => 'ten',
  1 => 'four',
  2 => '(,)',
  3 => 'good buddy',
)

Here's a related answer which recursively traverses parenthetical groups and reverses the order of comma separated values on each level: Reverse the order of parenthetically grouped text and reverse the order of parenthetical groups
