Unix: joining more than two files

Posted on 2025-01-04 13:50:37

I have three files, each with an ID and a value.

sdt5z@fir-s:~/test$ ls
a.txt  b.txt  c.txt
sdt5z@fir-s:~/test$ cat a.txt 
id1 1
id2 2
id3 3
sdt5z@fir-s:~/test$ cat b.txt 
id1 4
id2 5
id3 6
sdt5z@fir-s:~/test$ cat c.txt 
id1 7
id2 8
id3 9

I want to create a file that looks like this...

id1 1 4 7
id2 2 5 8
id3 3 6 9

...preferably using a single command.

I'm aware of the join and paste commands. Paste will duplicate the id column each time:

sdt5z@fir-s:~/test$ paste a.txt b.txt c.txt 
id1 1   id1 4   id1 7
id2 2   id2 5   id2 8
id3 3   id3 6   id3 9

Join works well, but for only two files at a time:

sdt5z@fir-s:~/test$ join a.txt b.txt 
id1 1 4
id2 2 5
id3 3 6
sdt5z@fir-s:~/test$ join a.txt b.txt c.txt 
join: extra operand `c.txt'
Try `join --help' for more information.

I'm also aware that paste can take STDIN as one of the arguments by using "-". E.g., I can replicate the join command using:

sdt5z@fir-s:~/test$ cut -f2 b.txt | paste a.txt -
id1 1   4
id2 2   5
id3 3   6

But I'm still not sure how to modify this to accommodate three files.

Since I'm doing this inside a Perl script, I know I could put it in a foreach loop, along the lines of join file1 file2 > tmp1, join tmp1 file3 > tmp2, and so on. But that gets messy, and I would like to do this with a one-liner.

杀お生予夺 2025-01-11 13:50:37

join a.txt b.txt|join - c.txt

should be enough.
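The same pipe trick can be extended to any number of files. The following is a minimal sketch, not part of the original answer: join_all is a hypothetical helper name, and it assumes every file is already sorted on its first field (as join requires) and that your join accepts - for standard input.

```shell
# join_all FILE...
# Hypothetical helper: chains join(1) over any number of files.
# Assumes each file is sorted on its first (key) field.
join_all() {
    # Nothing to do without at least one file.
    [ "$#" -ge 1 ] || return 1
    # A single file is its own join result.
    if [ "$#" -eq 1 ]; then
        cat "$1"
        return
    fi
    first=$1
    shift
    # Join the remaining files first, then join the first file
    # against that result read from standard input.
    join_all "$@" | join "$first" -
}
```

With the three sample files, join_all a.txt b.txt c.txt prints one line per ID with the values in the order the files were given. Only IDs present in every file survive, which is join's default behavior.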

☆獨立☆ 2025-01-11 13:50:37

Since you're doing this inside a Perl script, is there any specific reason you're NOT doing the work in Perl, as opposed to spawning a shell?

Something like this (NOT TESTED! caveat emptor):

use strict;
use warnings;
use File::Slurp; # Slurp the files in if they aren't too big

my @files = qw(a.txt b.txt c.txt);
# Map each file name to an array ref holding that file's lines.
my %file_data = map { ($_ => [ read_file($_) ]) } @files;
my @id_orders; # IDs in the order they appear in the first file
my %data;      # ID => array ref of values, one per file
my $first_file = 1;
foreach my $file (@files) {
    foreach my $line (@{ $file_data{$file} }) {
        my ($id, $value) = split ' ', $line;
        push @id_orders, $id if $first_file;
        push @{ $data{$id} }, $value;
    }
    $first_file = 0;
}
# Print each ID followed by its values, in first-file order.
foreach my $id (@id_orders) {
    print "$id " . join(" ", @{ $data{$id} }) . "\n";
}
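If you would rather stay in the shell, the same idea (collect values per ID in a hash, print in first-file order) can be sketched with awk. This is an illustration, not the answer author's code: merge_by_id is a hypothetical name, and it assumes a non-empty first file and whitespace-separated ID/value pairs.

```shell
# merge_by_id FILE...
# Hypothetical helper: for each ID in the first file, print the ID
# followed by the second column of every file that contains it.
merge_by_id() {
    awk '
        # While reading the first file, remember the order of the IDs.
        FNR == NR { order[++n] = $1 }
        # For every line of every file, append the value to its ID.
        { vals[$1] = vals[$1] " " $2 }
        # Print the IDs in first-file order with their collected values.
        END { for (i = 1; i <= n; i++) print order[i] vals[order[i]] }
    ' "$@"
}
```

Unlike join, this does not need sorted input, but IDs that appear only in later files are silently dropped.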

ぃ弥猫深巷。 2025-01-11 13:50:37

perl -lanE'$h{$F[0]} .= " $F[1]"; END{say $_.$h{$_} foreach keys %h}' *.txt

Should work, but I can't test it as I'm answering from my phone. You can also sort the output if you put sort between foreach and keys.

吃素的狼 2025-01-11 13:50:37

pr -m -t -s\  file1.txt file2.txt|gawk '{print $1"\t"$2"\t"$3"\t"$4}'> finalfile.txt

This assumes file1 and file2 have 2 columns each: $1 and $2 are the columns from file1, and $3 and $4 are the columns from file2.

You can print any column from each file this way, and it takes any number of files as input. For example, if file1 has 5 columns, then $6 is the first column of file2.
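Applied to the question's files, the same merge-then-select approach can drop the repeated ID columns without naming each field. A rough sketch, using paste in place of pr: paste_values is a hypothetical name, and it assumes every file has exactly two whitespace-separated columns with the IDs in the same line order, since nothing here matches keys.

```shell
# paste_values FILE...
# Hypothetical helper: merge files line by line, then keep the first
# ID and every even-numbered field (the value from each file).
paste_values() {
    paste "$@" | awk '{
        out = $1
        for (i = 2; i <= NF; i += 2) out = out " " $i
        print out
    }'
}
```

Because lines are paired by position rather than by ID, this is only safe when all files list the same IDs in the same order; otherwise join is the better tool.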
