Extracting specific data from a table

Posted on 2024-12-19 23:17:53

I have a table that looks like this (tab-separated):

Ron  Rob  rock bammy
m    f   m  f
florida  Atlanta  florida texas 

The table is 5*512 in size, and based on the data in row 3 I want to extract the corresponding values from row 1.
For example, I want the names of all the people living in Florida or Texas, in a table with 2 columns and n rows:

Florida  Ron
Florida  Rock
Texas BAmmy

and so on.

Any suggestions for a bash or Perl one-liner?

Thank you in advance.

Comments (5)

誰認得朕 2024-12-26 23:17:53
awk 'NR==1{for(i=1;i<=NF;i++)n[i]=$i}; NR==3{for(i=1;i<=NF;i++){if($i~/florida|texas/)print $i"\t"n[i];}}' yourFile

See the test below:

kent$  echo "Ron Rob rock bammy
m f m f
florida Atlanta florida texas"|awk 'NR==1{for(i=1;i<=NF;i++)n[i]=$i}; NR==3{for(i=1;i<=NF;i++){if($i~/florida|texas/)print $i"\t"n[i];}}'

Output:

florida Ron
florida rock
texas   bammy

EDIT: if the values contain regex metacharacters such as parentheses, escape them in the pattern:

kent$  echo "Ron  Rob  rock bammy
m    f   m  f
florida(8)  Atlanta  florida(8) texas(2;7)"|awk 'NR==1{for(i=1;i<=NF;i++)n[i]=$i}; NR==3{for(i=1;i<=NF;i++){if($i~/florida\(8\)|texas\(2;7\)/)print $i"\t"n[i];}}'

output:

florida(8)      Ron
florida(8)      rock
texas(2;7)      bammy
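
The regex match above is case-sensitive and matches substrings, and the default field splitting treats any whitespace as a separator. A minimal sketch of a stricter variant follows, assuming the input really is tab-separated and that an exact, case-insensitive comparison is wanted. The function name extract_by_location and the comma-separated list argument are hypothetical, not part of the original answer.

```shell
# extract_by_location is a hypothetical helper, not from the original answer.
#   $1: comma-separated list of locations to keep (an assumed convention)
#   $2: tab-separated input file (defaults to standard input)
extract_by_location() {
    awk -F'\t' -v want="$1" '
        BEGIN {
            # Build a lookup table of lower-cased locations.
            n = split(tolower(want), w, ",")
            for (i = 1; i <= n; i++) keep[w[i]] = 1
        }
        NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i }
        NR == 3 {
            for (i = 1; i <= NF; i++) {
                loc = $i
                gsub(/^ +| +$/, "", loc)      # trim stray spaces around the value
                key = tolower(loc)
                if (key in keep) print loc "\t" name[i]
            }
        }
    ' "${2:--}"
}

printf 'Ron\tRob\trock\tbammy\nm\tf\tm\tf\nflorida\tAtlanta\tflorida\ttexas\n' |
    extract_by_location 'Florida,Texas'
```

Because the comparison is a table lookup rather than a regex, values such as florida(8) need no escaping in the list.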
白云不回头 2024-12-26 23:17:53

Yet another Perl solution:

perl -ane 'push@c,@F}{print grep{/^(florida|atlanta)\t/i}map{"$c[$_+$#c/3*2+1]\t$c[$_]\n"}0..$#c/3'

Or as a script

#!/usr/bin/perl

use strict;
use warnings;

my (@data, @rows);

push @data, split/\s+/ while (<>);

for (0 .. $#data/3) {
    my $name = $data[$_];
    my $location = $data[$_+$#data/3*2+1];
    push @rows, "$location\t$name\n" if $location =~ /^(florida|atlanta)$/i;
}

print join("", @rows);

with an if condition inside the loop instead of the separate grep.

My approach is to flatten all three lines into a single array, use for (0 .. $#data/3) to loop over the indexes of the names from the first line, and fetch the location in the matching column with $data[$_+$#data/3*2+1].
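
The index arithmetic can be made visible with a small awk sketch, assuming exactly three rows of equal width. With N columns, the flattened array holds 3N values, the name sits at index i, and its location sits at i + 2N. In the Perl version the same offset comes from $#data/3*2+1, whose fractional result Perl truncates when it is used as an index. The function name flat_index_demo is made up for this illustration.

```shell
# flat_index_demo is a hypothetical name used only for this illustration.
flat_index_demo() {
    awk '
        # Flatten every field of every line into one 0-based array.
        { for (i = 1; i <= NF; i++) flat[n++] = $i }
        END {
            cols = n / 3    # assumes three rows of equal width
            # Print: name index, location index, location, name.
            for (i = 0; i < cols; i++)
                print i, i + 2 * cols, flat[i + 2 * cols], flat[i]
        }'
}

printf 'Ron Rob rock bammy\nm f m f\nflorida Atlanta florida texas\n' | flat_index_demo
# 0 8 florida Ron
# 1 9 Atlanta Rob
# 2 10 florida rock
# 3 11 texas bammy
```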

眉目亦如画i 2024-12-26 23:17:53

Here's a Perl solution that works, but it's a bit more convoluted than I'd like. You'd probably be better off putting this data into a database.

#!/usr/bin/env perl

use strict;
use warnings;
use 5.010;

my %rows = (
  name => 1,
  location => 3,
);

my %location = map { $_ => 1 } qw[florida texas];

my @names;

while (<DATA>) {
  next unless grep { $_ == $. } values %rows;

  chomp;

  if ($. == $rows{name}) {
    @names = split;
  }

  if ($. == $rows{location}) {
    my @locs = split;

    for my $x (0 .. $#locs) {
      if ($location{lc $locs[$x]}) {
        say ucfirst $locs[$x]. "\t$names[$x]";
      }
    }
    last;
  }
}

__END__
Ron     Rob     rock    bammy
m       f       m       f
florida         Atlanta florida texas
夕色琉璃 2024-12-26 23:17:53

Sounds to me like this is a job for Text::CSV_XS. It is not a good idea to split on whitespace, as many seem to be suggesting, as that will fail for anything but simplified data.

Code:

use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new( {
        sep_char    => "\t",
        binary      => 1,
    });

# get array refs to each row, with appropriate name
# For larger data sets, using an array to hold the array refs would be better
my $name       = $csv->getline(*DATA);
my $gender     = $csv->getline(*DATA);
my $city       = $csv->getline(*DATA);

for (keys @$city) {   # lists the column numbers
    if ($city->[$_] =~ /florida|texas/i) {
        print "$city->[$_]\t$name->[$_]\n";
    }
}

__DATA__
Ron Rob rock    bammy
m   f   m   f
florida Atlanta florida texas

Output:

florida Ron
florida rock
texas   bammy
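
The concern about whitespace splitting applies to awk as well. The sketch below uses made-up sample data in which one city, "New York", contains a space. It compares awk's default field splitting with explicit tab splitting via -F'\t'. It illustrates the failure mode only and is not a replacement for a real CSV parser when fields may be quoted.

```shell
# Sample data invented for illustration: "New York" contains a space.
data=$(printf 'Ron\tRob\trock\nm\tf\tm\nflorida\tNew York\ttexas\n')

# The same program is run twice, once per splitting mode.
prog='NR==1{for(i=1;i<=NF;i++)n[i]=$i} NR==3{for(i=1;i<=NF;i++)if($i~/texas/)print $i"\t"n[i]}'

# Default splitting: "New York" becomes two fields, so "texas" is field 4
# and no longer lines up with any name.
default_split=$(printf '%s\n' "$data" | awk "$prog")

# Tab splitting: "New York" stays one field and the columns stay aligned.
tab_split=$(printf '%s\n' "$data" | awk -F'\t' "$prog")

printf 'default: [%s]\ntab:     [%s]\n' "$default_split" "$tab_split"
```

With default splitting the name next to texas comes out empty, while tab splitting pairs texas with rock.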
逆光下的微笑 2024-12-26 23:17:53
#!/usr/bin/env perl
use strict;
use warnings;

my $pat = shift;

# Pair each element of @$foo with the next element of @$bar (this empties @$bar).
sub interleave {
    my ($foo,$bar) = @_;
    return map { ( $_ , shift @{$bar} ) } @{$foo};
}

my $n=0;
my(@p,%h);
while(<>){
    chomp;
    if($n%3==0){
            @p = split /\t/, $_;
    } elsif($n%3==2){
            my @l = split /\t/, $_;
            my %kv = interleave(\@p, \@l);
            foreach my $k (keys %kv){
                    push(@{$h{$kv{$k}}}, $k);
            }
    }
    $n++;
}

foreach my $loc (keys %h){
    if(!defined $pat || $loc =~ /$pat/i){
            foreach my $name (@{$h{$loc}}){
                    print ucfirst($loc), "\t", ucfirst($name), "\n";
            }
    }
}

Then call it like this:

perl extract.pl 'texas|florida' < data

"Oneliner" form:

perl -ne 'BEGIN{$p=shift||"^";}chomp;if($n++%3!=1){unless(@p){@p=split/\t/,$_;next;}push(@{$h{$_}},shift @p) for split(/\t/,$_);}END{for my $loc (grep{/$p/i}keys %h){print ucfirst($loc),"\t",ucfirst($_),"\n" for @{$h{$loc}};}}' 'florida|texas' < data