Split a string by commas that are not inside (possibly nested) parentheses
Two days ago I started working on a code parser and I'm stuck.
How can I split a string by commas that are not inside brackets? Let me show you what I mean.
I have this string to parse:
one, two, three, (four, (five, six), (ten)), seven
I would like to get this result:
array(
"one";
"two";
"three";
"(four, (five, six), (ten))";
"seven"
)
but instead I get:
array(
"one";
"two";
"three";
"(four";
"(five";
"six)";
"(ten))";
"seven"
)
How can I do this with a regex in PHP?
Comments (10)
This can be done with a recursive pattern, (?R). With it, $matches[0] holds the resulting items.
Explanation:
The second part of the pattern defines the basic case:
[^,()]+ : a sequence of characters that excludes commas, ( and ).
(?=\S) : asserts that this sequence starts with a non-whitespace character.
(?<=\S) : asserts that this sequence ends with a non-whitespace character.
The first part of the regex deals with the parentheses:
[(] : the match must start with an opening parenthesis.
[)] : the match must end with a closing parenthesis.
[^()]* : a possibly empty sequence of characters that are not parentheses. Note that commas are captured here.
(?R) : applies the complete regex pattern recursively.
(?:[^()]*(?R))* : zero or more repetitions of a sequence of non-parenthesis characters followed by a nested expression.
So it matches something between parentheses, where that something can have any number of nested matches, alternating with non-parenthesis sequences.
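The pattern itself is missing from this answer as shown here. Putting the described pieces together gives roughly the following; the exact pattern and the function name match_top_level_items are assumptions, not the author's original code.

```php
<?php
// Reconstructed from the explanation above; the exact original pattern is an assumption.
// Branch 1: a balanced parenthesised group, using (?R) for nested groups.
// Branch 2: a run without commas or parentheses, trimmed by the two lookarounds.
function match_top_level_items(string $str): array
{
    $pattern = '/[(](?:[^()]*(?R))*[^()]*[)]|(?=\S)[^,()]+(?<=\S)/';
    preg_match_all($pattern, $str, $matches);

    return $matches[0]; // full matches, one per top-level item
}

print_r(match_top_level_items('one, two, three, (four, (five, six), (ten)), seven'));
```

Because the pattern matches items rather than separators, a value such as foo(bar) would come back as two items, so this sketch only suits input shaped like the example in the question.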
You can do that more easily:
But it would be better if you use a real parser. Maybe something like this:
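The parser from this answer is not shown here. A minimal sketch of the idea, assuming a single pass that tracks parenthesis depth, could look like this (split_top_level is a hypothetical name):

```php
<?php
// Hypothetical helper: walks the string once and tracks parenthesis depth.
function split_top_level(string $str): array
{
    $items = [];
    $buffer = '';
    $depth = 0;

    foreach (str_split($str) as $char) {
        if ($char === '(') {
            $depth++;
        } elseif ($char === ')' && $depth > 0) {
            $depth--;
        } elseif ($char === ',' && $depth === 0) {
            // A comma outside all parentheses ends the current item.
            $items[] = trim($buffer);
            $buffer = '';
            continue;
        }
        $buffer .= $char;
    }

    $items[] = trim($buffer); // the last item has no trailing comma

    return $items;
}
```

Unlike a regex, this makes it easy to add more rules later, for example quoted strings or other bracket types.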
Hm... OK already marked as answered, but since you asked for an easy solution I will try nevertheless:
Output
You can't, directly. You'd need, at minimum, variable-width lookbehind, and last I knew PHP's PCRE only has fixed-width lookbehind.
My first recommendation would be to first extract parenthesized expressions from the string. I don't know anything about your actual problem, though, so I don't know if that will be feasible.
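One way to read "extract the parenthesized expressions first" is to swap each outermost group for a placeholder, split on the remaining commas, and then put the groups back. The sketch below assumes the byte \x01 never occurs in the input; split_with_placeholders is a hypothetical name.

```php
<?php
// Sketch of "extract first, split later".
// Assumption: the placeholder byte \x01 does not appear in the input.
function split_with_placeholders(string $str): array
{
    $map = [];

    // Replace each outermost balanced group with a numbered placeholder.
    $masked = preg_replace_callback(
        '/\((?:[^()]++|(?R))*\)/',
        function (array $m) use (&$map): string {
            $key = "\x01" . count($map) . "\x01";
            $map[$key] = $m[0];
            return $key;
        },
        $str
    );

    // No commas inside parentheses remain, so a plain explode() is safe.
    $items = [];
    foreach (explode(',', $masked) as $piece) {
        $items[] = trim(strtr($piece, $map)); // restore the groups, then trim
    }

    return $items;
}
```

Note that the extraction step here still relies on a recursive pattern, so it is a variation on the regex approach, not a way around it.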
I can't think of a way to do it using a single regex, but it's quite easy to hack together something that works:
If you invoke it like this:
It outputs:
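The code and output for this answer are missing here. As a stand-in, and not necessarily what the author wrote, one simple hack is to split on every comma and re-join neighbouring pieces until the parentheses balance (split_and_merge is a hypothetical name):

```php
<?php
// Hypothetical helper: split on every comma, then re-join pieces
// until opening and closing parentheses balance again.
function split_and_merge(string $str): array
{
    $items = [];
    $pending = null; // pieces collected while a parenthesis is still open

    foreach (explode(',', $str) as $piece) {
        $pending = ($pending === null) ? $piece : $pending . ',' . $piece;

        if (substr_count($pending, '(') <= substr_count($pending, ')')) {
            $items[] = trim($pending);
            $pending = null;
        }
    }

    if ($pending !== null) {
        $items[] = trim($pending); // unbalanced tail, kept as one item
    }

    return $items;
}
```

Counting parentheses over the whole pending string on every step is wasteful for long inputs, but it keeps the sketch short.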
Maybe a bit late, but I've made a solution without regex that also supports nesting inside brackets. Let me know what you think:
Gives me the output:
Clumsy, but it does the job...
I feel it's worth noting that you should avoid regular expressions whenever you can. To that end, you should know that for PHP 5.3+ you could use str_getcsv(). However, if you're working with files (or file streams), such as CSV files, then fgetcsv() might be what you need, and it's been available since PHP 4.
Lastly, I'm surprised nobody used preg_split(), or did it not work as needed?
I am afraid that it could be very difficult to parse nested brackets like "one, two, (three, (four, five))" with RegExp alone.
Sounds to me that we need to have a string splitting algorithm that respects balanced parenthetical grouping. I'll give that a crack using a recursive regex pattern! The behavior will be to respect the lowest balanced parentheticals and let any higher level un-balanced parentheticals be treated as non-grouping characters. Please leave a comment with any input strings that are not correctly split so that I can try to make improvements (test driven development).
Code: (Demo)
Output:
Here's a related answer which recursively traverses parenthetical groups and reverses the order of comma separated values on each level: Reverse the order of parenthetically grouped text and reverse the order of parenthetical groups
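The code for this answer is missing here. The behaviour it describes (balanced groups stay whole, unbalanced parentheses count as ordinary characters) can be sketched with preg_split() and the PCRE verbs (*SKIP)(*FAIL). The pattern and the name split_with_preg_split are assumptions, not the author's original.

```php
<?php
// Sketch: step over balanced groups, then split on the commas that remain.
function split_with_preg_split(string $str): array
{
    // Branch 1 matches a balanced group and discards it via (*SKIP)(*FAIL),
    // so the commas inside it are never seen as separators.
    // Branch 2 is the actual separator: a comma with optional whitespace.
    $pattern = '/(\((?:[^()]++|(?1))*\))(*SKIP)(*FAIL)|\s*,\s*/';

    return preg_split($pattern, $str);
}
```

An opening parenthesis that is never closed makes the first branch fail, so the commas after it split as usual: '(a, b' becomes '(a' and 'b'.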