Merge multiple lines into a single line based on a column's value

Posted on 2024-11-15 08:12:08

I have a very large tab-delimited text file. Many lines in it have the same value in one particular column, and I want to merge those lines into a single line. For example:

a foo
a bar
a foo2
b bar
c bar2

After running the script, it should become:

a foo;bar;foo2
b bar
c bar2

How can I do this in either a shell script or Python?

Thanks.

Answers (6)

我纯我任性 2024-11-22 08:12:08

With awk you can try this

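BEGIN { FS = "\t" }                         # split on tabs only, so values may contain spaces
{ a[$1] = a[$1] ";" $2 }                    # append ";value" to this key's entry
END { for (item in a) print item, a[item] } # prints "key ;v1;v2..." (key order unspecified)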

Save this awk script in a file called awkf.awk. If your input file is ifile.txt, run:

awk -f awkf.awk ifile.txt | sed 's/ ;/ /'

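The sed command strips the leading ; that the awk script leaves in front of each merged list. Note that awk's for (item in a) loop visits keys in an unspecified order, so the output is not necessarily sorted.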

Hope this helps

北座城市 2024-11-22 08:12:08
from collections import defaultdict

items = defaultdict(list)
with open('sourcefile') as source:
    for line in source:
        # strip the trailing newline, otherwise it ends up inside the joined values
        key, val = line.rstrip('\n').split('\t', 1)
        items[key].append(val)

with open('result', 'w') as result:
    for k in sorted(items):
        result.write('%s\t%s\n' % (k, ';'.join(items[k])))

Not tested.

听,心雨的声音 2024-11-22 08:12:08

Tested with Python 2.7:

import csv

data = {}

# the input file is closed automatically when the with-block ends
with open('infile', 'r') as infile:
    reader = csv.DictReader(infile, fieldnames=['key', 'value'], delimiter='\t')
    for row in reader:
        # start a new list the first time a key is seen, then append to it
        data.setdefault(row['key'], []).append(row['value'])

with open('outfile', 'w') as writer:
    for key in data:
        writer.write(key + '\t' + ';'.join(data[key]) + '\n')
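On Python 3 the same csv-based approach needs two small adjustments: the csv module expects files opened with newline='', and passing quoting=csv.QUOTE_NONE stops a stray " in the data from being treated as a quote character. The following is a minimal sketch, assuming at least two tab-separated columns per non-blank line (extra columns are ignored). The function name merge_tsv is hypothetical, and it takes file objects so it can be demonstrated with in-memory data.

```python
import csv
import io


def merge_tsv(infile, outfile):
    """Group column 2 by column 1 and write key<TAB>v1;v2;... lines.

    infile/outfile are text file objects; open real files with newline=''
    as the csv documentation recommends. Assumes at least two columns.
    """
    data = {}  # Python 3.7+ dicts keep keys in first-seen order
    reader = csv.reader(infile, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:  # csv.reader yields [] for a blank line
            continue
        key, value = row[0], row[1]
        data.setdefault(key, []).append(value)
    for key, values in data.items():
        outfile.write(key + '\t' + ';'.join(values) + '\n')


# Demo with in-memory files instead of real paths
sample = io.StringIO('a\tfoo\na\tbar\na\tfoo2\nb\tbar\nc\tbar2\n')
result = io.StringIO()
merge_tsv(sample, result)
print(result.getvalue(), end='')
```

With real files, open the input with newline='' and the output for writing, then pass both file objects to merge_tsv.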
儭儭莪哋寶赑 2024-11-22 08:12:08

A Perl way to do it (note that this prints the grouped hash with Data::Dumper rather than the merged lines):

#!/usr/bin/perl 
use strict;
use warnings;
use Data::Dumper;

open my $fh, '<', 'path/to/file' or die "unable to open file:$!";
my %res;
while(<$fh>) {
    my ($k, $v) = split;
    push @{$res{$k}}, $v;
}
print Dumper \%res;

Output:

$VAR1 = {
      'c' => [
               'bar2'
             ],
      'a' => [
               'foo',
               'bar',
               'foo2'
             ],
      'b' => [
               'bar'
             ]
    };
情感失落者 2024-11-22 08:12:08
#! /usr/bin/env perl

use strict;
use warnings;

# for demo only
*ARGV = *DATA;

my %record;
my @order;
while (<>) {
  chomp;
  my($key,$combine) = split;

  push @order, $key unless exists $record{$key};
  push @{ $record{$key} }, $combine;
}

print $_, "\t", join(";", @{ $record{$_} }), "\n" for @order;

__DATA__
a foo
a bar
a foo2
b bar
c bar2

Output (with tabs converted to spaces because Stack Overflow breaks the output):

a       foo;bar;foo2
b       bar
c       bar2
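The @order array in the Perl answer above exists because Perl hashes are unordered. In Python 3.7 and later, a plain dict already remembers insertion order, so the same first-seen ordering needs no separate list. The following is a minimal sketch under that assumption. Unlike the Perl version, it splits each line on the first tab only, so values may contain spaces. The name merge_lines is hypothetical, and a non-blank line without a tab would raise ValueError.

```python
def merge_lines(lines):
    """Merge lines sharing the first tab-separated column, keeping first-seen key order."""
    record = {}  # key -> list of values; insertion-ordered on Python 3.7+
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        key, combine = line.split('\t', 1)  # split on the first tab only
        record.setdefault(key, []).append(combine)
    return [key + '\t' + ';'.join(values) for key, values in record.items()]


data = ['a\tfoo\n', 'b\tbar\n', 'a\tbar\n', 'a\tfoo2\n', 'c\tbar2\n']
for merged in merge_lines(data):
    print(merged)
```

Because a file object iterates over its lines, an open file can be passed straight to merge_lines. Like the Perl version, this keeps every value in memory until the end.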
冰之心 2024-11-22 08:12:08
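def compress(infilepath, outfilepath):
    # Merges *consecutive* lines with the same key, so the input must already
    # be grouped (e.g. sorted) by the first column.
    with open(infilepath) as infile, open(outfilepath, 'w') as outfile:
        prev_index = None
        for line in infile:
            # strip the newline so it does not end up inside the merged line
            index, val = line.rstrip('\n').split('\t', 1)
            if index == prev_index:
                outfile.write(";%s" % val)
            else:
                if prev_index is not None:
                    outfile.write("\n")  # finish the previous key's line
                outfile.write("%s\t%s" % (index, val))
                prev_index = index  # remember the key of the current run
        if prev_index is not None:
            outfile.write("\n")  # terminate the last line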

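Untested, but it should work. It only merges adjacent lines, so the input must already be grouped by the first column (for example, sorted). Please leave a comment if you have any concerns.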
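Merging only adjacent lines is exactly what itertools.groupby does. If the file is already grouped by its first column (for example after sort -t "$(printf '\t')" -k1,1), the following sketch streams the input and holds only one group in memory at a time, which suits a very large file. It assumes every non-blank line contains a tab, and the name merge_grouped is hypothetical.

```python
from itertools import groupby


def merge_grouped(lines):
    """Yield one merged line per run of consecutive lines with the same key.

    Assumes the input is already grouped (e.g. sorted) by the first column;
    a key that reappears later starts a new output line.
    """
    # Split each non-blank line once: ['a', 'foo'], ['a', 'bar'], ...
    pairs = (line.rstrip('\n').split('\t', 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        yield key + '\t' + ';'.join(value for _, value in group)


sample = ['a\tfoo\n', 'a\tbar\n', 'a\tfoo2\n', 'b\tbar\n', 'c\tbar2\n']
for merged in merge_grouped(sample):
    print(merged)
```

Since merge_grouped is a generator, it can consume an open file line by line and write each merged line as soon as its group ends.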
