Extracting data from a dictionary

Posted on 2024-12-20 03:10:36

I have two tab-delimited files. File 1 contains identifiers, and file 2 has values related to these identifiers (think of it as a very large dictionary).

file 1

Ronny
Rubby
Suzie
Paul

file 1 has only one column.

file 2

Alistar Barm Cathy Paul Ronny Rubby Suzie Tom Uma Vai Zai
12      13    14   12     11   11   12    23 30  0.34 0.65
1       4     56   23     12   8.9  5.1   1  4    25  3

File 2 contains n rows.

What I want: for every identifier in file 1 that is present in file 2, all the values related to it should be written to another tab-delimited file.

Something like this:

Paul Ronny Rubby Suzie
12     11   11   12
23     12   8.9  5.1

Thank you in advance.


Comments (7)

浅笑轻吟梦一曲 2024-12-27 03:10:36

NOTE

Your example output is not correct: it contains "Ruby", but your file 1 example has "Rubby", and Ruby != Rubby.

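kent$  awk 'NR==FNR { wanted[$0]; next }      # file1: remember every identifier
FNR==1 {                                  # file2 header: record matching columns in order
        for (i = 1; i <= NF; i++)
                if ($i in wanted)
                        cols[++n] = i
}
{                                         # every file2 line, header included
        # a numeric loop keeps the column order ("for (x in arr)" order is unspecified)
        for (k = 1; k <= n; k++)
                printf "%s%s", $(cols[k]), (k < n ? "\t" : "\n")
}' file1 file2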

output

Paul    Ronny   Suzie
12      11      12
23      12      5.1
夜空下最亮的亮点 2024-12-27 03:10:36

You can do this using only bash and standard tools:

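# number the header columns, keep those named in f1.txt (-w -F: whole-word, literal match), join the numbers with commas
FIELDS=$(head -1 f2.txt | tr "\t" "\n" | nl -ba | grep -wFf f1.txt | cut -f1 | tr -d " " | paste -sd, -)
cut -f"$FIELDS" f2.txt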
Paul    Ronny   Ruby    Suzie
12  11  11  12
23  12  8.9 5.1
饮惑 2024-12-27 03:10:36
$ awk 'NR==FNR{a[$0];next} FNR==1{for(i=1;i<=NF;i++)if($i in a)b[++n]=i} {for(j=1;j<=n;j++)printf("%s%s",$(b[j]),(j<n?"\t":"\n"))}' file{1,2}.txt
Paul    Ronny   Suzie
12      11      12
23      12      5.1

The same command, split across multiple lines with whitespace added:

$ awk '
> NR==FNR { a[$0]; next }                                   # first file: store identifiers
> FNR==1 { for(i=1; i<=NF; i++) if($i in a) b[++n]=i }      # header: remember matching columns in order
> { for(j=1; j<=n; j++) printf("%s%s", $(b[j]), (j<n ? "\t" : "\n")) }
> ' file{1,2}.txt

Paul    Ronny   Suzie
12      11      12
23      12      5.1
疾风者 2024-12-27 03:10:36

An example in Python that processes the data as a stream (i.e. it does not need to load the whole data file before starting to output):

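# read keys
with open('file1') as fd:
    keys = fd.read().splitlines()

# read data file: header line first, then the content one line at a time
with open('file2') as fd:
    headers = fd.readline().split()
    # column index of each key that actually appears in the header, in key order
    cols = [headers.index(k) for k in keys if k in headers]
    # output only the keys that were found, so the header matches the data columns
    print('\t'.join(headers[c] for c in cols))
    for line in fd:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        print('\t'.join(fields[c] for c in cols))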

Output:

$ python test.py 
Ronny   Ruby    Suzie   Paul
11      11      12      12
12      8.9     5.1     23
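
If the columns should come out in the order they appear in file 2 (as in the question's expected output) rather than in the order of the keys, a variant along these lines might work. This is only a sketch: the function name `filter_columns` is made up for illustration, it assumes the data is strictly tab-delimited, and it uses the standard `csv` module for splitting.

```python
import csv
import io


def filter_columns(keys, data_file, out_file):
    """Copy only the columns of data_file whose header is in keys, in file order."""
    wanted = set(keys)
    reader = csv.reader(data_file, delimiter="\t")
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    header = next(reader)
    # indices of the wanted columns, in the order they appear in the header
    cols = [i for i, name in enumerate(header) if name in wanted]
    writer.writerow([header[i] for i in cols])
    for row in reader:
        if row:  # skip blank lines
            writer.writerow([row[i] for i in cols])


if __name__ == "__main__":
    # small in-memory stand-in for file2 (hypothetical sample data)
    data = io.StringIO("Cathy\tPaul\tRonny\tTom\n14\t12\t11\t23\n56\t23\t12\t1\n")
    out = io.StringIO()
    filter_columns(["Ronny", "Paul"], data, out)
    print(out.getvalue(), end="")
```

With real files, the same function could be called with two open file objects instead of the `io.StringIO` stand-ins.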
东京女 2024-12-27 03:10:36

Perl solution:

#!/usr/bin/perl
use warnings;
use strict;

open my $KEYS, '<', 'file1' or die $!;
my @keys = <$KEYS>;
close $KEYS;
chomp @keys;
my %is_key;
undef @is_key{@keys};

open my $TAB, '<', 'file2' or die $!;
$_ = <$TAB>;
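my $i = 0;    # zero-based column index; must start at 0 so a match in the first column is not stored as undef
my @columns;  # indices of the columns whose header is a key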
for (split) {
    push @columns, $i if exists $is_key{$_};
    $i++;
}
do {{
    my @values = split;
    print join("\t", @values[@columns]), "\n";
}} while <$TAB>;
寄离 2024-12-27 03:10:36

Something like this could probably work, depending on what you want.

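use strict;
use warnings;

# Assumption: the two paths are passed on the command line, names file first.
my ( $name_file_path, $data_file_path ) = @ARGV;

my %names;
open ( my $nh, '<', $name_file_path ) or die "Could not open '$name_file_path'!";
while ( <$nh> ) { 
    # store each identifier with surrounding whitespace trimmed
    m/^\s*(.*?\S)\s*$/ and $names{ $1 } = 1; 
}
close $nh;

open ( my $dh, '<', $data_file_path ) or die "Could not open '$data_file_path'!";

my ( @name_list, @col_list );
chomp( my $header = <$dh> );
my @names = split /\t/, $header;
foreach my $col ( 0..$#names ) {
    next unless exists $names{ $names[ $col ] };
    push @name_list, $names[ $col ];   # the matching column name
    push @col_list, $col;              # and its zero-based index
}
local $" = "\t";
print "@name_list\n";
while ( <$dh> ) {
    chomp;
    my @fields = split /\t/;
    print "@fields[ @col_list ]\n";
}
close $dh;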
不乱于心 2024-12-27 03:10:36

This might work for you:

 sed '1{s/\t/\n/gp};d' file2 |
 nl |
 grep -wFf file1 |
 cut -f1 |
 paste -sd, |
 sed 's/ //g;s,.*,cut -f& file2,' |
 sh

Explanation:

  1. Put each column name from the header of file2 on its own line.
  2. Number the column names.
  3. Keep the numbered names that match an identifier in file1.
  4. Drop the names, keeping only the column numbers.
  5. Join the column numbers into one line, separated by commas.
  6. Build a cut command from the comma-separated column numbers.
  7. Run the cut command against the data file.
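
For readers who find the pipeline hard to follow, the steps above can be sketched in Python. This is only an illustration of the same idea, not a replacement for the shell version: the helper names `matching_field_numbers` and `cut_fields` are invented here, and the sample rows are a shortened, hypothetical version of file2.

```python
def matching_field_numbers(header_line, identifiers):
    """Steps 1-4: return the 1-based numbers of the header columns named in identifiers."""
    wanted = set(identifiers)
    names = header_line.rstrip("\n").split("\t")
    return [n for n, name in enumerate(names, start=1) if name in wanted]


def cut_fields(line, field_numbers):
    """Step 7: keep only the given 1-based tab-separated fields, like cut -f."""
    fields = line.rstrip("\n").split("\t")
    return "\t".join(fields[n - 1] for n in field_numbers)


if __name__ == "__main__":
    identifiers = ["Ronny", "Rubby", "Suzie", "Paul"]
    rows = [
        "Alistar\tBarm\tCathy\tPaul\tRonny\tRubby\tSuzie\tTom",
        "12\t13\t14\t12\t11\t11\t12\t23",
    ]
    numbers = matching_field_numbers(rows[0], identifiers)
    # steps 5-6: the comma-separated list that would be handed to cut -f
    print(",".join(map(str, numbers)))
    for row in rows:
        print(cut_fields(row, numbers))
```

Because the column numbers are collected while walking the header from left to right, the output keeps the column order of the data file, just as cut does.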