使用 Perl 从文件中读取部分

发布于 2024-07-26 22:20:35 字数 1484 浏览 3 评论 0原文

我正在尝试从 Perl 中的输入文件读取值。 输入文件如下所示:

1-sampledata1 This is a sample test
              and data for this continues
2-sampledata2 This is sample test 2
              Data for this also is on second line

我想读取上述数据,以便 1-sampledata1 的数据进入 @array12-sampledata2 的数据进入@array2等等。 我将有大约 50 个这样的部分。 例如50-sampledata50

编辑:名称并不总是 X-sampledataX。 例如,我刚刚这样做了。 所以名称不能循环。 我想我必须手动输入它们,

到目前为止我有以下内容(有效)。 但我正在寻找一种更有效的方法来做到这一点。

foreach my $line(@body){
        if ($line=~ /^1-sampledata1\s/){
                $line=~ s/1-ENST0000//g;
                $line=~ s/\s+//g;
                push (@array1, $line);
          #using splitarray because i want to store data as one character each
          #for ex: i wana store 'This' as T H I S in different elements of array
                @splitarray1= split ('',$line);
        last if ($line=~ /2-sampledata2/);
        }
}
foreach my $line(@body){
        if ($line=~ /^2-sampledata2\s/){
                $line=~ s/2-ENSBTAP0//g;
                $line=~ s/\s+//g;
                @splitarray2= split ('',$line);
        last if ($line=~ /3-sampledata3/);
        }
}

正如你所看到的,我每个部分都有不同的数组,每个部分都有不同的 for 循环。 如果我采用到目前为止的方法,那么我最终将得到 50 个 for 循环和 50 个数组。

还有其他更好的方法吗? 最后我确实想得到 50 个数组,但不想编写 50 个 for 循环。 由于我稍后将在程序中循环遍历 50 个数组,也许将它们存储在一个数组中? 我是 Perl 新手,所以它有点令人难以承受......

I am trying to read values from an input file in Perl.
Input file looks like:

1-sampledata1 This is a sample test
              and data for this continues
2-sampledata2 This is sample test 2
              Data for this also is on second line

I want to read the above data so that data for 1-sampledata1 goes into @array1 and data for 2-sampledata2 goes in @array2 and so on.
I will have about 50 sections like this. like 50-sampledata50.

EDIT: The names wont always be X-sampledataX. I just did that for example. So names cant be in a loop. I think I'll have to type them manually

I so far have the following (which works). But I am looking for a more efficient way to do this..

foreach my $line(@body){
        if ($line=~ /^1-sampledata1\s/){
                $line=~ s/1-ENST0000//g;
                $line=~ s/\s+//g;
                push (@array1, $line);
          #using splitarray because i want to store data as one character each
          #for ex: i wana store 'This' as T H I S in different elements of array
                @splitarray1= split ('',$line);
        last if ($line=~ /2-sampledata2/);
        }
}
foreach my $line(@body){
        if ($line=~ /^2-sampledata2\s/){
                $line=~ s/2-ENSBTAP0//g;
                $line=~ s/\s+//g;
                @splitarray2= split ('',$line);
        last if ($line=~ /3-sampledata3/);
        }
}

As you can see I have different arrays for each section and different for loops for each section. If I go with approach I have so far then I will end up with 50 for loops and 50 arrays.

Is there another better way to do this? In the end I do want to end up with 50 arrays but do not want to write 50 for loops. And since I will be looping through the 50 arrays later on in the program, maybe store them in an array? I am new to Perl so its kinda overwhelming ...

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

说不完的你爱 2024-08-02 22:20:35

首先要注意的是,您正在尝试使用带有整数后缀的变量名称:不要这样做。 每当你发现自己想要这样做时,就使用数组。 其次,您只需阅读一次即可浏览文件内容,而不是多次。 第三,在 Perl 中通常没有充分的理由将字符串视为字符数组。

更新:此版本的代码使用前导空格的存在来决定要执行的操作。 我也保留以前的版本以供参考。

#!/usr/bin/perl

use strict;
use warnings;

my @data;

while ( my $line = <DATA> ) {
    chomp $line;
    if ( $line =~ s/^ +/ / ) {
        push @{ $data[-1] }, split //, $line;
    }
    else {
        push @data, [ split //, $line ];
    }
}

use Data::Dumper;
print Dumper \@data;

__DATA__
1-sampledata1 This is a sample test
              and data for this continues
2-sampledata2 This is sample test 2
              Data for this also is on second line

以前的版本:

#!/usr/bin/perl

use strict;
use warnings;

my @data;

while ( my $line = <DATA> ) {
    chomp $line;
    $line =~ s/\s+/ /g;
    if ( $line =~ /^[0-9]+-/ ) {
        push @data, [ split //, $line ];
    }
    else {
        push @{ $data[-1] }, split //, $line;
    }
}

use Data::Dumper;
print Dumper \@data;

__DATA__
1-sampledata1 This is a sample test
              and data for this continues
2-sampledata2 This is sample test 2
              Data for this also is on second line

The first thing to notice is that you are trying to use variable names with integer suffixes: Don't. Use an array whenever you find your self wanting to do that. Second, you only need to read to go over the file contents once, not multiple times. Third, there is usually no good reason in Perl to treat a string as an array of characters.

Update: This version of the code uses existence of leading spaces to decide what to do. I am leaving the previous version up as well for reference.

#!/usr/bin/perl

use strict;
use warnings;

my @data;

while ( my $line = <DATA> ) {
    chomp $line;
    if ( $line =~ s/^ +/ / ) {
        push @{ $data[-1] }, split //, $line;
    }
    else {
        push @data, [ split //, $line ];
    }
}

use Data::Dumper;
print Dumper \@data;

__DATA__
1-sampledata1 This is a sample test
              and data for this continues
2-sampledata2 This is sample test 2
              Data for this also is on second line

Previous version:

#!/usr/bin/perl

use strict;
use warnings;

my @data;

while ( my $line = <DATA> ) {
    chomp $line;
    $line =~ s/\s+/ /g;
    if ( $line =~ /^[0-9]+-/ ) {
        push @data, [ split //, $line ];
    }
    else {
        push @{ $data[-1] }, split //, $line;
    }
}

use Data::Dumper;
print Dumper \@data;

__DATA__
1-sampledata1 This is a sample test
              and data for this continues
2-sampledata2 This is sample test 2
              Data for this also is on second line
挽手叙旧 2024-08-02 22:20:35
#! /usr/bin/env perl
use strict;
use warnings;

my %data;
{
  my( $key, $rest );
  while( my $line = <> ){
    unless( ($rest) = $line =~ /^     \s+(.*)/x ){
      ($key, $rest) = $line =~ /^(.*?)\s+(.*)/;
    }
    push @{ $data{$key} }, $rest;
  }
}
#! /usr/bin/env perl
use strict;
use warnings;

my %data;
{
  my( $key, $rest );
  while( my $line = <> ){
    unless( ($rest) = $line =~ /^     \s+(.*)/x ){
      ($key, $rest) = $line =~ /^(.*?)\s+(.*)/;
    }
    push @{ $data{$key} }, $rest;
  }
}
溺ぐ爱和你が 2024-08-02 22:20:35

下面的代码与 @Brad Gilbert 非常相似和 @Sinan Unur 的解决方案:

#!/usr/bin/perl
use strict;
use warnings;    
use Data::Dumper;

my (%arrays, $label);
while (my $line = <DATA>) 
{
    ($label, $line) = ($1, $2) if $line =~ /^(\S+)(.*)/; # new data block

    $line =~ s/^\s+//; # strip whitespaces from the begining
    # append data for corresponding label
    push @{$arrays{$label}}, split('', $line) if defined $label;
}

print $arrays{'1-sampledata1'}[2], "\n";     # 'i'
print join '-', @{$arrays{'2-sampledata2'}}; # 'T-h-i-s- -i-s- -s-a-m-p-l
print Dumper \%arrays;

__DATA__
1-sampledata1 This is a sample test
              and data for this continues
2-sampledata2 This is sample test 2
              Data for this also is on second line

输出

i
T-h-i-s- -i-s- -s-a-m-p-l-e- -t-e-s-t- -2-D-a-t-a- -f-o-r- -t-h-i-s- -a-l-s-o- -i-s- -o-n- -s-e-c-o-n-d- -l-i-n-e-
$VAR1 = {
          '2-sampledata2' => [
                               'T',
                               'h',
                               'i',
                               's',
                               ' ',
                               'i',
                               's',
                               ' ',
                               's',
                               'a',
                               'm',
                               'p',
                               'l',
                               'e',
                               ' ',
                               't',
                               'e',
                               's',
                               't',
                               ' ',
                               '2',
                               'D',
                               'a',
                               't',
                               'a',
                               ' ',
                               'f',
                               'o',
                               'r',
                               ' ',
                               't',
                               'h',
                               'i',
                               's',
                               ' ',
                               'a',
                               'l',
                               's',
                               'o',
                               ' ',
                               'i',
                               's',
                               ' ',
                               'o',
                               'n',
                               ' ',
                               's',
                               'e',
                               'c',
                               'o',
                               'n',
                               'd',
                               ' ',
                               'l',
                               'i',
                               'n',
                               'e',
                               '
'
                             ],
          '1-sampledata1' => [
                               'T',
                               'h',
                               'i',
                               's',
                               ' ',
                               'i',
                               's',
                               ' ',
                               'a',
                               ' ',
                               's',
                               'a',
                               'm',
                               'p',
                               'l',
                               'e',
                               ' ',
                               't',
                               'e',
                               's',
                               't',
                               'a',
                               'n',
                               'd',
                               ' ',
                               'd',
                               'a',
                               't',
                               'a',
                               ' ',
                               'f',
                               'o',
                               'r',
                               ' ',
                               't',
                               'h',
                               'i',
                               's',
                               ' ',
                               'c',
                               'o',
                               'n',
                               't',
                               'i',
                               'n',
                               'u',
                               'e',
                               's',
                               '
'
                             ]
        };

The code below is very similar to @Brad Gilbert's and @Sinan Unur's solutions:

#!/usr/bin/perl
use strict;
use warnings;    
use Data::Dumper;

my (%arrays, $label);
while (my $line = <DATA>) 
{
    ($label, $line) = ($1, $2) if $line =~ /^(\S+)(.*)/; # new data block

    $line =~ s/^\s+//; # strip whitespaces from the begining
    # append data for corresponding label
    push @{$arrays{$label}}, split('', $line) if defined $label;
}

print $arrays{'1-sampledata1'}[2], "\n";     # 'i'
print join '-', @{$arrays{'2-sampledata2'}}; # 'T-h-i-s- -i-s- -s-a-m-p-l
print Dumper \%arrays;

__DATA__
1-sampledata1 This is a sample test
              and data for this continues
2-sampledata2 This is sample test 2
              Data for this also is on second line

Output

i
T-h-i-s- -i-s- -s-a-m-p-l-e- -t-e-s-t- -2-D-a-t-a- -f-o-r- -t-h-i-s- -a-l-s-o- -i-s- -o-n- -s-e-c-o-n-d- -l-i-n-e-
$VAR1 = {
          '2-sampledata2' => [
                               'T',
                               'h',
                               'i',
                               's',
                               ' ',
                               'i',
                               's',
                               ' ',
                               's',
                               'a',
                               'm',
                               'p',
                               'l',
                               'e',
                               ' ',
                               't',
                               'e',
                               's',
                               't',
                               ' ',
                               '2',
                               'D',
                               'a',
                               't',
                               'a',
                               ' ',
                               'f',
                               'o',
                               'r',
                               ' ',
                               't',
                               'h',
                               'i',
                               's',
                               ' ',
                               'a',
                               'l',
                               's',
                               'o',
                               ' ',
                               'i',
                               's',
                               ' ',
                               'o',
                               'n',
                               ' ',
                               's',
                               'e',
                               'c',
                               'o',
                               'n',
                               'd',
                               ' ',
                               'l',
                               'i',
                               'n',
                               'e',
                               '
'
                             ],
          '1-sampledata1' => [
                               'T',
                               'h',
                               'i',
                               's',
                               ' ',
                               'i',
                               's',
                               ' ',
                               'a',
                               ' ',
                               's',
                               'a',
                               'm',
                               'p',
                               'l',
                               'e',
                               ' ',
                               't',
                               'e',
                               's',
                               't',
                               'a',
                               'n',
                               'd',
                               ' ',
                               'd',
                               'a',
                               't',
                               'a',
                               ' ',
                               'f',
                               'o',
                               'r',
                               ' ',
                               't',
                               'h',
                               'i',
                               's',
                               ' ',
                               'c',
                               'o',
                               'n',
                               't',
                               'i',
                               'n',
                               'u',
                               'e',
                               's',
                               '
'
                             ]
        };
坚持沉默 2024-08-02 22:20:35

相反,您应该使用数组的哈希映射。

使用此正则表达式模式获取索引:

/^(\d+)-sampledata(\d+)/

然后,使用 my %arrays 执行:

push($arrays{$index}), $line;

然后,您可以使用 $arrays{$index} 访问数组。

You should, instead, use a hash map to arrays.

Use this regex pattern to get the index:

/^(\d+)-sampledata(\d+)/

And then, with my %arrays, do:

push($arrays{$index}), $line;

You can then access the arrays with $arrays{$index}.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文