Perl: Iterate through multiline matches

Posted 2025-02-08 12:54:17


I would like to iterate through a multiline pattern in Perl, but I'm struggling with the syntax.

My input string is:


+++    STAR-WARS        2020-01-01 00:00:00+00:00
S&W    #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0  Operation success

In-universe information
-----------------------
                 Species  =  Human
                  Gender  =  Male
television series of num  =  whatever

(Number of results = 1)

Personal Details
----------------
                        First Name  =  Obi-Wan
                         Last Name  =  Kenobi
                             Alias  =  Padawan
                                    =  Jedi Knight
                                    =  Jedi General
                                    =  Jedi Master
Points to other set of information  =  whatever

(Number of results = 1)

Other attribute
---------------
Significant other  =  Satine Kryze
      Affiliation  =  Jedi Order
                   =  Galactic Republic
                   =  Rebel Alliance
       Occupation  =  Jedi

(Number of results = 1)

---    END

My desired resulting hash would be:

$VAR1 = {
          'In-universe information' => {
                                         'Gender' => 'Male',
                                         'Species' => 'Human',
                                         'results' => '1',
                                         'television series of num' => 'whatever'
                                       },
          'Other attribute' => {
                                 'Affiliation' => [
                                                    'Jedi Order',
                                                    'Galactic Republic',
                                                    'Rebel Alliance'
                                                  ],
                                 'Occupation' => 'Jedi',
                                 'Significant other' => 'Satine Kryze',
                                 'results' => '1'
                               },
          'Personal Details' => {
                                  'Alias' => [
                                               'Padawan',
                                               'Jedi Knight',
                                               'Jedi General',
                                               'Jedi Master'
                                             ],
                                  'First Name' => 'Obi-Wan',
                                  'Last Name' => 'Kenobi',
                                  'Points to other set of information' => 'whatever',
                                  'results' => '1'
                                },
          'code' => '0',
          'description' => 'Operation success'
        };

What I have come up with works well for a "single block" (e.g. Personal Details above). However, if the data contains multiple blocks, I can't figure out how to iterate through every matching block. (e.g. use while with /g)

#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';    # say() is not available without this
use Data::Dumper;

local $/;             # slurp mode: read the whole DATA section at once
my $output = <DATA>;

my %hash;
($hash{'code'}, $hash{'description'}) = $output =~ /^RETCODE = (\d+)\s+(.*)\n/m;

if ($hash{'code'} eq "0") {
    # A single match: this is the part I can't get to iterate over every block.
    my ($type, $data, $results) = $output =~ /([^\n]+)\n-+\n(.*)\n\n\(Number of results = (\d+)\)\n\n/sm;
    my $previousKey = "";
    while ($data =~ /(.+)$/mg) {
        my $line = $1;
        $line =~ s/(?:^ +)//g;    # strip the leading alignment spaces
        my ($key, $value);
        if ($line =~ /^\s*=  /) {
            # No key on this line: append the value to the previous key as an array element.
            ($value) = $line =~ /^\s*=  (.*)$/;
            $hash{$type}{$previousKey} = [ $hash{$type}{$previousKey} ] unless ref($hash{$type}{$previousKey});
            push (@{$hash{$type}{$previousKey}}, $value);
        } else {
            ($key, $value) = split(/  =  /, $line);
            $hash{$type}{$key} = $value;
            $previousKey = $key;
        }
    }
    say STDERR Dumper(\%hash);
}

__DATA__

+++    STAR-WARS        2020-01-01 00:00:00+00:00
S&W    #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0  Operation success

In-universe information
-----------------------
                 Species  =  Human
                  Gender  =  Male
television series of num  =  whatever

(Number of results = 1)

Personal Details
----------------
                        First Name  =  Obi-Wan
                         Last Name  =  Kenobi
                             Alias  =  Padawan
                                    =  Jedi Knight
                                    =  Jedi General
                                    =  Jedi Master
Points to other set of information  =  whatever

(Number of results = 1)

Other attribute
---------------
Significant other  =  Satine Kryze
      Affiliation  =  Jedi Order
                   =  Galactic Republic
                   =  Rebel Alliance
       Occupation  =  Jedi

(Number of results = 1)

---    END

A few facts:

every "block" always contains a header, followed by a newline and a line of dashes equal in length to the header.
every "block" always ends with \n, followed by (Number of results = \d+), followed by \n.
each key/value pair always has two spaces before and after the equals sign, i.e. /  =  /
when no key exists, assume it's an array and append the value to the previous key, e.g. Alias in my example above.
the string will always end with ---    END followed by a \n
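
The key/value rules above can be sketched as follows. This is only an illustration under the stated assumptions (two spaces on each side of the equals sign, a keyless line continues the previous key); `parse_pairs` is a hypothetical helper name and the sample lines are shortened.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the body lines of one block into a hash reference.
# Assumes every pair is written as "key  =  value" (two spaces on each side)
# and that a line with no key continues the previous key.
sub parse_pairs {
    my ($body) = @_;
    my (%pairs, $previous_key);
    for my $line (split /\n/, $body) {
        $line =~ s/^\s+//;                 # drop the alignment padding
        next unless length $line;          # skip blank lines
        if ($line =~ /^=\s+(.*)$/) {       # continuation: no key before the "="
            next unless defined $previous_key;
            # Promote the existing scalar to an array on the first continuation.
            $pairs{$previous_key} = [ $pairs{$previous_key} ]
                unless ref($pairs{$previous_key}) eq 'ARRAY';
            push @{ $pairs{$previous_key} }, $1;
        }
        else {
            my ($key, $value) = split /  =  /, $line, 2;
            $pairs{$key}  = $value;
            $previous_key = $key;
        }
    }
    return \%pairs;
}

my $pairs = parse_pairs(<<'END');
     Alias  =  Padawan
            =  Jedi Knight
Occupation  =  Jedi
END

print join(', ', @{ $pairs->{Alias} }), "\n";   # Padawan, Jedi Knight
print $pairs->{Occupation}, "\n";               # Jedi
```

A key that never gets a continuation line stays a plain scalar, which matches the desired hash shown earlier.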

恰似旧人归 2025-02-15 12:54:17

According to your description, each section starts with +++ ... and ends with ---    END.

Based on this information, the input can be divided with a regex into blocks of interest, which are then processed individually in a loop by a parser that builds a hash.

NOTE: the parser was slightly modified and put into a subroutine.

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my @shows;

my $data   = do { local $/; <DATA> };
my @blocks = $data =~ /^(\+\+\+ .*?^---    END)/msg;

push @shows, parse($_) for @blocks;

say Dumper(\@shows);

exit 0;

sub parse {
    my $data = shift;

    my(@sections,$re,$r);
    
#   Alternative block to extract show info section
#   $re = qr/^\+\+\+\s+(\S+)\s+(\S+)\s+(\S+)\s+\S+\s+(\S+)\s+%%[^:]+?:\s+([^;]+?);%%\s+RETCODE = (\d+)\s+([^\n]+)/;
#   $r->{info}->@{qw/show day time sw show_name code description/} = $data =~ /$re/;

    $re = qr/RETCODE = (\d+)\s+([^\n]+)/;
    $r->@{qw/code description/} = $data =~ /$re/;
    
    @sections = $data =~ /\n\n(.+?\n-+.*?\(Number of results = \d+\))/gs;

    for my $block ( @sections ) {
        my($section,@lines,$key,$value);
        @lines   = split("\n",$block);
        $section = $lines[0];
        for my $line (@lines[2..$#lines-2] ) {
            $line =~ s/^\s+//;
            if( $line =~ /^=\s+(.+)/ ) {
                $r->{$section}{$key} = [ $r->{$section}{$key} ] unless ref $r->{$section}{$key} eq 'ARRAY';
                push @{$r->{$section}{$key}}, $1;
            } else {
                ($key,$value) = split(/  =  /,$line);
                $r->{$section}{$key} = $value; 
            }
        }
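        # Capture in list context; in scalar context the match only returns success (1), not the count.
        my ($count) = $block =~ /\(Number of results = (\d+)\)/;
        $r->{$section}{results} = $count + 0;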
    }
    
    return $r;
}

__DATA__

+++    STAR-WARS        2020-01-01 00:00:00+00:00
S&W    #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0  Operation success

In-universe information
-----------------------
                 Species  =  Human
                  Gender  =  Male
television series of num  =  whatever

(Number of results = 1)

Personal Details
----------------
                        First Name  =  Obi-Wan
                         Last Name  =  Kenobi
                             Alias  =  Padawan
                                    =  Jedi Knight
                                    =  Jedi General
                                    =  Jedi Master
Points to other set of information  =  whatever

(Number of results = 1)

Other attribute
---------------
Significant other  =  Satine Kryze
      Affiliation  =  Jedi Order
                   =  Galactic Republic
                   =  Rebel Alliance
       Occupation  =  Jedi

(Number of results = 1)

---    END

+++    STAR-WARS        2020-01-01 00:00:00+00:00
S&W    #00000000
%%SHOW NAME: Q=Kenobi;%%
RETCODE = 0  Operation success

In-universe information
-----------------------
                 Species  =  Human
                  Gender  =  Male
television series of num  =  whatever

(Number of results = 1)

Personal Details
----------------
                        First Name  =  Obi-Wan
                         Last Name  =  Kenobi
                             Alias  =  Padawan
                                    =  Jedi Knight
                                    =  Jedi General
                                    =  Jedi Master
Points to other set of information  =  whatever

(Number of results = 1)

Other attribute
---------------
Significant other  =  Satine Kryze
      Affiliation  =  Jedi Order
                   =  Galactic Republic
                   =  Rebel Alliance
       Occupation  =  Jedi

(Number of results = 1)

---    END

Output

$VAR1 = [
          {
            'Other attribute' => {
                                   'Significant other' => 'Satine Kryze',
                                   'Occupation' => 'Jedi',
                                   'results' => 1,
                                   'Affiliation' => [
                                                      'Jedi Order',
                                                      'Galactic Republic',
                                                      'Rebel Alliance'
                                                    ]
                                 },
            'Personal Details' => {
                                    'results' => 1,
                                    'First Name' => 'Obi-Wan',
                                    'Alias' => [
                                                 'Padawan',
                                                 'Jedi Knight',
                                                 'Jedi General',
                                                 'Jedi Master'
                                               ],
                                    'Points to other set of information' => 'whatever',
                                    'Last Name' => 'Kenobi'
                                  },
            'code' => '0',
            'description' => 'Operation success',
            'In-universe information' => {
                                           'television series of num' => 'whatever',
                                           'Gender' => 'Male',
                                           'results' => 1,
                                           'Species' => 'Human'
                                         }
          },
          {
            'Other attribute' => {
                                   'Affiliation' => [
                                                      'Jedi Order',
                                                      'Galactic Republic',
                                                      'Rebel Alliance'
                                                    ],
                                   'results' => 1,
                                   'Significant other' => 'Satine Kryze',
                                   'Occupation' => 'Jedi'
                                 },
            'Personal Details' => {
                                    'First Name' => 'Obi-Wan',
                                    'results' => 1,
                                    'Last Name' => 'Kenobi',
                                    'Alias' => [
                                                 'Padawan',
                                                 'Jedi Knight',
                                                 'Jedi General',
                                                 'Jedi Master'
                                               ],
                                    'Points to other set of information' => 'whatever'
                                  },
            'code' => '0',
            'description' => 'Operation success',
            'In-universe information' => {
                                           'television series of num' => 'whatever',
                                           'results' => 1,
                                           'Gender' => 'Male',
                                           'Species' => 'Human'
                                         }
          }
        ];
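
Since the question asks specifically about `while` with `/g`, here is a minimal sketch of that form of iteration over the sections of one block. It assumes the layout described in the question (header line, a line of dashes, body, blank line, then the results line). The names `$text` and `%sections` are only for illustration, and the sample data is shortened and altered (the second section reports 2 results) to show that the count is captured.

```perl
use strict;
use warnings;

my $text = <<'END';
RETCODE = 0  Operation success

In-universe information
-----------------------
Species  =  Human

(Number of results = 1)

Other attribute
---------------
Affiliation  =  Jedi Order
             =  Rebel Alliance

(Number of results = 2)

---    END
END

my %sections;

# In scalar context, /g resumes after the previous match on every pass,
# so the loop visits each section once. The non-greedy (.*?) stops at the
# first results line instead of running on to the last one.
while ( $text =~ /^([^\n]+)\n-+\n(.*?)\n\n\(Number of results = (\d+)\)\n/msg ) {
    my ($header, $body, $results) = ($1, $2, $3);
    $sections{$header} = { body => $body, results => $results };
}

for my $header (sort keys %sections) {
    print "$header: $sections{$header}{results}\n";
}
```

Each `body` string can then be split into lines and parsed into key/value pairs in the same way as the inner loop of `parse` above.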
