Matching n parentheses in a Perl regular expression

Posted on 2024-09-05 18:51:05

I've got some data that I'm parsing in Perl, and will be adding more and more differently formatted data in the near future. What I would like to do is write an easy-to-use function, that I could pass a string and a regex to, and it would return anything in parentheses. It would work something like this (pseudocode):

sub parse {
  $data = shift;
  $regex = shift;

  $data =~ eval ("m/$regex/")
  foreach $x ($1...$n)
  {
    push (@ra, $x); 
  }
  return \@ra;
}

Then, I could call it like this:

@subs = parse ($data, '^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)');

As you can see, there are a couple of issues with this code. I don't know if the eval would work, the 'foreach' definitely wouldn't work, and without knowing how many parentheses there are, I don't know how many times to loop.

This is too complicated for split, so if there's another function or possibility that I'm overlooking, let me know.

Thanks for your help!

Comments (4)

抱猫软卧 2024-09-12 18:51:05

In list context, a regular expression will return a list of all the parenthesized matches.

So all you have to do is:

my @matches = $string =~ /regex (with) (parens)/;

And assuming that it matched, @matches will be an array of the two capturing groups.

So using your regex:

my @subs = $data =~ /^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)/;
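To wrap this in the reusable function the question asks for, a minimal sketch might look like the following. The sub name `parse` comes from the question; the sample `$data` line is invented for illustration and is only assumed to resemble the real input.

```perl
use strict;
use warnings;

# Return every capture group of $regex as a list, or an empty list on failure.
sub parse {
    my ($data, $regex) = @_;
    # A match in list context returns the captures ($1, $2, ...).
    my @captures = $data =~ $regex;
    return @captures;
}

# Hypothetical sample line, invented to fit the regex from the question.
my $data = '"42",foo:!!:ABC12345,ID=7';
my @subs = parse($data, qr/^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)/);

print join('|', @subs), "\n";
```

One caveat: if the pattern contains no parentheses at all, a successful match in list context returns the list (1) rather than an empty list, so this sketch is only meaningful for patterns with at least one capture group.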

Also, when you have long regexes, Perl has the x modifier, which goes after the closing regex delimiter. The x modifier allows you to put whitespace and newlines inside the regex for increased readability.
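As a sketch of what that could look like for the regex in the question (the sample input line is made up, and the comments describe each group only as far as the pattern itself shows):

```perl
use strict;
use warnings;

# Same pattern as above, spread out with /x. Under /x, unescaped whitespace
# in the pattern is ignored and '#' starts a comment.
my $re = qr/
    ^ "([0-9]+)" ,          # quoted leading number
    ([^:]*) :               # anything up to the next colon
    (\W+) :                 # one or more non-word characters
    ([A-Z]{3}[0-9]{5}) ,    # three capital letters, then five digits
    ID= ([0-9]+)            # numeric ID
/x;

# Hypothetical sample line.
my @fields = '"42",foo:!!:ABC12345,ID=7' =~ $re;
print join('|', @fields), "\n";
```

Note that comments inside such a pattern must not contain the regex delimiter, and a literal space has to be escaped or written as a character class once /x is in effect.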

If you are worried about the capturing groups that might be zero length, you can pass the matches through @subs = grep {length} @subs to filter them out.
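A small sketch of that filtering step, assuming a made-up input line whose second field is empty. It spells the test out as `defined($_) && length($_)` so that groups which did not participate in the match (and are therefore undef) are dropped as well.

```perl
use strict;
use warnings;

# Hypothetical input where the second field is empty.
my $line = '"42",:!!:ABC12345,ID=7';
my @subs = $line =~ /^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)/;

# Keep only captures that are defined and non-empty.
my @non_empty = grep { defined($_) && length($_) } @subs;

print scalar(@subs), " captures, ", scalar(@non_empty), " non-empty\n";
```

Keep in mind that filtering shifts the remaining values down, so the position of a value in the filtered list no longer tells you which group it came from.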

没有伤那来痛 2024-09-12 18:51:05

Then, I could call it like this:

@subs = parse($data, 
          '^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)');

Instead, call it like:

parse($data, 
    qr/^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)/);
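The answer does not spell out why, but a plausible reading is that `qr//` hands the function an already compiled pattern, so no `eval` or string interpolation is needed inside it, and any modifiers travel with the pattern. A minimal sketch with a hypothetical pattern and input:

```perl
use strict;
use warnings;

# Hypothetical pattern; the /i flag is stored inside the compiled object.
my $re = qr/^(width)=(\d+)$/i;

print ref($re), "\n";            # a qr// value is a Regexp object

my @pair = 'Width=80' =~ $re;    # used directly on the right of =~, no eval
print "@pair\n";
```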

Further, your task would be made simpler if you can use named captures (i.e. Perl 5.10 and later). Here is an example:

#!/usr/bin/perl

use strict; use warnings;

my %re = (
    id => '(?<id> [0-9]+ )',
    name => '(?<name> \w+ )',
    value => '(?<value> [0-9]+ )',
);

my @this = (
    '123,one:12',
    '456,two:21',
);

my @that = (
    'one:[12],123',
    'two:[21],456',
);

my $this_re = qr/$re{id}   ,   $re{name}    : $re{value}/x;
my $that_re = qr/$re{name} : \[$re{value}\] , $re{id}   /x;

use YAML;

for my $d ( @this ) {
    print Dump [ parse($d, $this_re) ];
}

for my $d ( @that ) {
    print Dump [ parse($d, $that_re) ];
}

sub parse {
    my ($d, $re) = @_;
    return unless $d =~ $re;
    return my @result = @+{qw(id name value)};
}

Output:

---
- 123
- one
- 12
---
- 456
- two
- 21
---
- 123
- one
- 12
---
- 456
- two
- 21
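
If the caller should not have to know the capture names or their order in advance, one possible variation is to copy `%+` into a hash and return a reference to it. This is only a sketch; `parse_named` is a hypothetical name, not part of the answer above.

```perl
use strict;
use warnings;

# Variant of parse() that returns a hash reference keyed by capture name.
sub parse_named {
    my ($d, $re) = @_;
    return unless $d =~ $re;
    my %captures = %+;    # copy now; %+ reflects only the last successful match
    return \%captures;
}

my $re  = qr/(?<id> [0-9]+ ) , (?<name> \w+ ) : (?<value> [0-9]+ )/x;
my $rec = parse_named('123,one:12', $re);

print "$rec->{name} has id $rec->{id} and value $rec->{value}\n";
```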

花落人断肠 2024-09-12 18:51:05

You are trying to parse a complex expression with a regex, which is an insufficient tool for the job. Recall that regular expressions, in the formal sense, cannot parse grammars higher in the Chomsky hierarchy. For intuition, an expression that can be nested to arbitrary depth cannot be parsed with such a regex.

凉宸 2024-09-12 18:51:05

When you want to find text inside of pairs of parentheses, you want to use Text::Balanced.

But, that is not what you want to do, so it will not help you.
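
For readers who do need balanced parentheses, a minimal sketch using `extract_bracketed` from Text::Balanced follows. The input string is made up, and the call assumes the text starts with the opening bracket (optionally preceded by whitespace).

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical input that starts with a nested, balanced group.
my $text = '(a (b) c) rest';

# In list context: the bracketed substring first, then the remaining text.
my ($extracted, $remainder) = extract_bracketed($text, '()');

print "extracted: $extracted\n";
print "remainder: $remainder\n";
```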
