Perl: using the flip-flop operator and extracting data from within a read block

Posted 2024-12-21 18:52:14

I have an array called @mytitles which contains a lot of titles such as, say, title1, title2 and so on. I have a file called "Superdataset" which has information pertaining to each title. However, the info related to title1 may be 6 lines long while the info for title2 may be 30 lines (it's random). Each piece of information (for a titlex) starts with "Reading titlex" and ends with "Done reading titlex".

From these lines of information for each title, I need to extract some data. Luckily, I think, the data I need is always on the 2 lines just before "Done reading titlex".

So my "Superdataset" looks like:

Reading title1  
 random info line1
 random info line2
 random info line3
 random info line4
 random info line5
 my earnings are 6000
 my expenses are 1000
Done reading title1
Reading title2
 random info line6
 random info line7
 random info line8
 random info line9
 random info line10
 random info line11
 random info line12
 random info line13
 random info line14
 my earnings are 11000
 my expenses are 9000
Done reading title2

I need the total of all expenses and the total of all earnings. Any suggestions?
PS: the titles in the array have complicated names, not something as simple as titlex.


Answers (3)

岁月流歌 2024-12-28 18:52:14

Here is a first pass at slurping the data into a usable form.

use warnings;
use strict;
use autodie;

my $input_filename = 'Superdataset'; # the file named in the question
open my $input, '<', $input_filename;
my %data;
{
  my $current_title;

  while(<$input>){
    chomp;
    if( /^Reading (.*?)\s*$/ ){ # start of section
      $current_title = $1;
    }elsif( not defined $current_title ){ # outside of any section
      # invalid data; ignore it
    }elsif( /^Done reading (.*?)\s*$/ ){ # end of section, trailing spaces ignored
      die "Section '$current_title' closed as '$1'\n" if $1 ne $current_title;
      $current_title = undef;
    }else{ # add an element of section to array
      push @{ $data{$current_title} }, $_;
    }
  }
}
close $input;

Then use the resulting data structure to compute the total earnings and expenses.

my( $earnings, $expenses ) = ( 0, 0 ); # start at zero so print never sees undef
for my $list( values %data ){
  for( @$list ){
    if( /earnings are (\d+)/ ){
      $earnings += $1;
    }elsif( /expenses are (\d+)/ ){
      $expenses += $1;
    }
  }
}

print "earnings $earnings\n";
print "expenses $expenses\n";

Or, to print the data structure in a form more useful to a computer:

use YAML 'Dump';
print Dump \%data;

Output:

---
title1:
  - ' random info line1'
  - ' random info line2'
  - ' random info line3'
  - ' random info line4'
  - ' random info line5'
  - ' my earnings are 6000'
  - ' my expenses are 1000'
title2:
  - ' random info line6'
  - ' random info line7'
  - ' random info line8'
  - ' random info line9'
  - ' random info line10'
  - ' random info line11'
  - ' random info line12'
  - ' random info line13'
  - ' random info line14'
  - ' my earnings are 11000'
  - ' my expenses are 9000'
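
The question notes that the two values are always on the two lines just before "Done reading titlex", which in %data are the last two elements of each array. Below is a minimal sketch of a variant of the summing step that relies on that and only visits titles listed in @mytitles. It assumes %data has the shape shown in the YAML dump above (hardcoded and trimmed here so the snippet runs on its own); the @mytitles contents are hypothetical.

```perl
use strict;
use warnings;

# Assumed input: %data as built by the parser above (sample trimmed).
my %data = (
    title1 => [ ' random info line1', ' my earnings are 6000',  ' my expenses are 1000' ],
    title2 => [ ' random info line6', ' my earnings are 11000', ' my expenses are 9000' ],
);

# Hypothetical list of titles; real names may be arbitrary strings.
my @mytitles = ( 'title1', 'title2' );

my ( $earnings, $expenses ) = ( 0, 0 );
for my $title (@mytitles) {
    my $lines = $data{$title};
    next unless $lines && @$lines >= 2;    # skip missing or too-short sections

    # The values sit on the last two lines of each section.
    my ($earned) = $lines->[-2] =~ /earnings are (\d+)/;
    my ($spent)  = $lines->[-1] =~ /expenses are (\d+)/;
    $earnings += $earned if defined $earned;
    $expenses += $spent  if defined $spent;
}

print "earnings $earnings\n";
print "expenses $expenses\n";
```

Looking titles up by exact string in %data means the complicated names never pass through a regex, so characters such as + or ( in them are harmless.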

她说她爱他 2024-12-28 18:52:14

Using the 'range' operator .. (which acts as a flip-flop in scalar context) you could do:

#!/usr/bin/env perl
use strict;
use warnings;
my $begin_stanza = qr/^Reading/i;
my $endof_stanza = qr/^Done reading/i;
my ( $title, @lines );
my $value;
my ( $total_earnings, $total_expenses ) = ( 0, 0 );
while (<DATA>) {
    chomp;
    # True from a "Reading" line through the matching "Done reading" line
    if ( m{$begin_stanza} .. m{$endof_stanza} ) {
        if ( m{$begin_stanza\s+(.+)} ) {     # start: remember the title
            $title = $1;
            @lines = ();
            next;
        }
        if ( m{$endof_stanza} ) {            # end: @lines holds the last two lines
            ($value) = ( $lines[0] =~ m{(\d+)} );
            $total_earnings += $value;
            ($value) = ( $lines[1] =~ m{(\d+)} );
            $total_expenses += $value;
            print join "\n", $title, @lines, "\n";
            next;
        }
        shift @lines if @lines == 2;         # keep a two-line window
        push  @lines, $_;
    }
}
printf "Total Earnings = %7d\n", $total_earnings;
printf "Total Expenses = %7d\n", $total_expenses;
__DATA__
Reading title1
 random info line1
 random info line2
 random info line3
 random info line4
 random info line5
 my earnings are 6000
 my expenses are 1000
Done reading title1
Reading title2
 random info line6
 random info line7
 random info line8
 random info line9
 random info line10
 random info line11
 random info line12
 random info line13
 random info line14
 my earnings are 11000
 my expenses are 9000
Done reading title2

...which yields:

title1
 my earnings are 6000
 my expenses are 1000

title2
 my earnings are 11000
 my expenses are 9000

Total Earnings =   17000
Total Expenses =   10000
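
The flip-flop's return value can also tell you where you are in a section, so the start and end patterns need not be matched a second time. In scalar context .. returns '' while false, a sequence number starting at 1 on the line that opened the range, and, on the line that closes it, that number with "E0" appended (see perlop). Below is a sketch along those lines, assuming the same two-line layout and reading the trimmed sample from an in-memory string rather than DATA or a file.

```perl
use strict;
use warnings;

# Sample input from the question (random lines trimmed), held in memory.
my $superdataset = <<'END';
Reading title1
 random info line1
 my earnings are 6000
 my expenses are 1000
Done reading title1
Reading title2
 random info line6
 my earnings are 11000
 my expenses are 9000
Done reading title2
END

open my $fh, '<', \$superdataset or die "Cannot open in-memory file: $!";

my ( $title, @last_two );
my ( $total_earnings, $total_expenses ) = ( 0, 0 );
while ( my $line = <$fh> ) {
    chomp $line;
    # $seq is '' outside a section, 1 on the "Reading" line,
    # and ends in "E0" on the "Done reading" line.
    my $seq = ( $line =~ /^Reading/ ) .. ( $line =~ /^Done reading/ );
    next unless $seq;

    if ( $seq == 1 ) {                       # first line of a section
        ($title) = $line =~ /^Reading\s+(.*?)\s*$/;
        @last_two = ();
    }
    elsif ( $seq =~ /E0$/ ) {                # last line of a section
        my ($earned) = ( $last_two[0] // '' ) =~ /(\d+)/;
        my ($spent)  = ( $last_two[1] // '' ) =~ /(\d+)/;
        $earned //= 0;
        $spent  //= 0;
        print "$title: earnings $earned, expenses $spent\n";
        $total_earnings += $earned;
        $total_expenses += $spent;
    }
    else {                                   # keep only the last two body lines
        push @last_two, $line;
        shift @last_two if @last_two > 2;
    }
}
close $fh;

printf "Total Earnings = %7d\n", $total_earnings;
printf "Total Expenses = %7d\n", $total_expenses;
```

Checking the "E0" suffix avoids re-running the end pattern, which matters little here but keeps the start/end logic in one place.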

喜爱纠缠 2024-12-28 18:52:14

Unless you can predict what the line before the relevant lines will be, the flip-flop operator won't do much good by way of optimization. I think it is easier to keep a small buffer of the most recent lines and just match the line that follows the earnings and expenses lines, i.e. the "Done reading" line.

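#!/usr/bin/perl
use strict;
use warnings;

my @buffer;                            # the most recent lines, current one last
my ($earnings, $expenses) = (0, 0);

while (my $line = <DATA>) {            # read line by line instead of slurping
    shift @buffer if @buffer > 2;      # after the push, @buffer holds at most 3 lines
    push @buffer, $line;

    next if $line !~ /^Done reading/;

    # @buffer is now (earnings line, expenses line, "Done reading" line)
    $earnings += $1 if $buffer[0] =~ /(\d+)\s*$/;
    $expenses += $1 if $buffer[1] =~ /(\d+)\s*$/;
}
print "Total earnings: $earnings\n";
print "Total expenses: $expenses\n";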

__DATA__
Reading title1  
 random info line1
 random info line2
 random info line3
 random info line4
 random info line5
 my earnings are 6000
 my expenses are 1000
Done reading title1
Reading title2
 random info line6
 random info line7
 random info line8
 random info line9
 random info line10
 random info line11
 random info line12
 random info line13
 random info line14
 my earnings are 11000
 my expenses are 9000
Done reading title2

Output:

Total earnings: 17000
Total expenses: 10000
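
To tie the buffer approach to @mytitles, one option is to capture the title from the "Done reading" line and look it up in a hash built from the array. Because the lookup is an exact string comparison, complicated names containing regex metacharacters need no escaping. This is a sketch, assuming each closing line is "Done reading " followed by the exact title (trailing spaces ignored); the @mytitles contents are hypothetical, and the input is the question's sample, trimmed, held in an in-memory string.

```perl
use strict;
use warnings;

# Hypothetical: only the titles listed here are summed.
my @mytitles = ('title2');
my %wanted   = map { $_ => 1 } @mytitles;    # exact-string lookup, no regex

# Sample input from the question (random lines trimmed).
my $superdataset = <<'END';
Reading title1
 random info line1
 my earnings are 6000
 my expenses are 1000
Done reading title1
Reading title2
 random info line6
 my earnings are 11000
 my expenses are 9000
Done reading title2
END
open my $fh, '<', \$superdataset or die "Cannot open in-memory file: $!";

my ( @buffer, %per_title );
while ( my $line = <$fh> ) {
    chomp $line;
    # Closing line: the previous two lines hold earnings and expenses.
    if ( my ($title) = $line =~ /^Done reading (.*?)\s*$/ ) {
        if ( $wanted{$title} && @buffer == 2 ) {
            my ($earned) = $buffer[0] =~ /(\d+)\s*$/;
            my ($spent)  = $buffer[1] =~ /(\d+)\s*$/;
            $per_title{$title} = [ $earned // 0, $spent // 0 ];
        }
        @buffer = ();    # start fresh so sections never share lines
        next;
    }
    push @buffer, $line;
    shift @buffer if @buffer > 2;    # keep only the two most recent lines
}
close $fh;

my ( $earnings, $expenses ) = ( 0, 0 );
for my $title ( sort keys %per_title ) {
    my ( $e, $x ) = @{ $per_title{$title} };
    print "$title: earnings $e, expenses $x\n";
    $earnings += $e;
    $expenses += $x;
}
print "Total earnings: $earnings\n";
print "Total expenses: $expenses\n";
```

With only title2 listed, this reports 11000 earnings and 9000 expenses; keeping %per_title also leaves the per-title figures available if they are needed later.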