Matching n parentheses in a Perl regex
I've got some data that I'm parsing in Perl, and will be adding more and more differently formatted data in the near future. What I would like to do is write an easy-to-use function, that I could pass a string and a regex to, and it would return anything in parentheses. It would work something like this (pseudocode):
sub parse {
$data = shift;
$regex = shift;
$data =~ eval ("m/$regex/")
foreach $x ($1...$n)
{
push (@ra, $x);
}
return \@ra;
}
Then, I could call it like this:
@subs = parse ($data, '^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)');
As you can see, there are a couple of issues with this code. I don't know if the eval would work, the 'foreach' definitely wouldn't work, and without knowing how many parentheses there are, I don't know how many times to loop.
This is too complicated for split, so if there's another function or possibility that I'm overlooking, let me know.
Thanks for your help!
In list context, a regular expression match returns a list of everything captured by its parenthesized groups.
So all you have to do is:
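A minimal sketch of that idea, assuming a made-up sample string and a pattern with two capturing groups (neither comes from the original post):

```perl
use strict;
use warnings;

# Hypothetical sample input; any string and pattern would do.
my $string = 'key=value';

# In list context, a successful match returns the captured groups ($1, $2, ...).
# A failed match returns an empty list.
my @matches = $string =~ /^(\w+)=(\w+)$/;

print "@matches\n";    # prints "key value"
```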
And assuming that it matched, @matches will be an array of the two capturing groups. So using your regex:
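One way the function from the question could be written along these lines. This is a sketch, not the answerer's original code; the sample record in $data is invented so that it fits the question's pattern.

```perl
use strict;
use warnings;

# Return an array reference holding everything the pattern captured,
# or a reference to an empty array if the pattern did not match.
# Caveat: a pattern with no capture groups returns (1) on success.
sub parse {
    my ($data, $regex) = @_;
    my @captures = $data =~ /$regex/;
    return \@captures;
}

# Hypothetical sample record, made up to fit the pattern from the question.
my $data = '"123",foo:--:ABC12345,ID=42';
my $subs = parse($data, '^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)');

print join('|', @$subs), "\n";    # prints "123|foo|--|ABC12345|42"
```

No eval and no loop over $1..$n are needed, because the match itself hands back however many groups the pattern contains.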
Also, when you have long regexes, Perl has the x modifier, which goes after the closing regex delimiter. The x modifier allows you to put whitespace and newlines inside the regex for increased readability.
If you are worried about capturing groups that might be zero length, you can pass the matches through @subs = grep {length} @subs to filter them out.
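Both points can be sketched together. The record below is hypothetical and deliberately has an empty second field, so the filter has something to remove.

```perl
use strict;
use warnings;

# Hypothetical record whose second field is empty.
my $data = '"123",:--:ABC12345,ID=42';

# With the x modifier, whitespace and comments inside the pattern are ignored.
my @subs = $data =~ m/
    ^"([0-9]+)",          # quoted leading number
    ([^:]*) :             # free-form field, may be empty
    (\W+) :               # run of non-word characters
    ([A-Z]{3}[0-9]{5}),   # three letters followed by five digits
    ID=([0-9]+)           # numeric ID
/x;

# Drop zero-length captures.
@subs = grep {length} @subs;

print scalar(@subs), "\n";    # prints 4
```

Note that comments inside an x-modified pattern must not contain the regex delimiter, which is why none of them use a slash.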
Instead, call it like:
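A sketch of what such a call might look like, assuming the function takes a string plus a pattern precompiled with qr// and returns an array reference. The function body and the sample record are guesses for illustration, not the answer's original code.

```perl
use strict;
use warnings;

# Assumed shape of the helper: match in list context, return the captures.
sub parse {
    my ($data, $regex) = @_;
    my @captures = $data =~ $regex;
    return \@captures;
}

# Hypothetical sample record.
my $data = '"123",foo:--:ABC12345,ID=42';

# qr// compiles the pattern once and avoids any need for string eval.
my $subs = parse($data, qr/^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)/);

print scalar(@$subs), " fields captured\n";
```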
Further, your task would be made simpler if you can use named captures (i.e. Perl 5.10 and later). Here is an example:
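A minimal sketch of named captures applied to the pattern from the question. The group names (num, label, sep, code, id) and the sample record are invented for illustration.

```perl
use strict;
use warnings;
use 5.010;    # named captures and the %+ hash need Perl 5.10 or later

# Hypothetical sample record.
my $data = '"123",foo:--:ABC12345,ID=42';

my %field;
if ($data =~ /^"(?<num>[0-9]+)",(?<label>[^:]*):(?<sep>\W+):(?<code>[A-Z]{3}[0-9]{5}),ID=(?<id>[0-9]+)/) {
    # Copy %+ right away, because the next successful match resets it.
    %field = %+;
}

say "$_ = $field{$_}" for sort keys %field;
```

With names in place, callers can ask for $field{id} instead of remembering that the ID is the fifth group.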
Output:
You are trying to parse a complex expression with a regex, which is an insufficient tool for the job. Recall that regular expressions cannot parse grammars beyond the regular languages. Intuitively, an expression that can be nested to arbitrary depth cannot be parsed with a regular expression.
When you want to find text inside of pairs of parentheses, you want to use Text::Balanced.
But, that is not what you want to do, so it will not help you.
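For reference, a minimal sketch of what Text::Balanced does with nested parentheses, using its extract_bracketed function on a made-up string:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical input with one level of nesting.
my $text = '(a (b) c) tail';

# In list context, extract_bracketed returns the balanced substring
# (delimiters included) followed by the remainder of the input.
my ($inside, $rest) = extract_bracketed($text, '()');

print "inside: $inside\n";
print "rest: $rest\n";
```

This extracts literal bracketed text from the data, which is a different task from collecting the capture groups of a regex.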