Swap key and array-value pairs

Posted 2024-07-15 07:04:09

I have a text file laid out like this:

1   a, b, c
2   c, b, c
2.5 a, c

I would like to swap the keys (the numbers) and the values (the comma-separated lists), which are separated by a tab character, to produce this:

a   1, 2.5
b   1, 2
c   1, 2, 2.5

(Notice how 2 isn't duplicated for c.)

I do not need this exact output. The numbers in the input are ordered, while the values are not. The output's keys must be ordered, as well as the values.

How can I do this? I have access to standard shell utilities (awk, sed, grep...) and GCC. I can probably grab a compiler/interpreter for other languages if needed.

野味少女 2024-07-22 07:04:09

If you have Python (on Linux you probably already do), I'd use a short Python script for this. Note that we use sets to filter out duplicate items.

Edited to be closer to requester's requirements:

import csv
from decimal import Decimal

maindict = {}
with open('test.csv', newline='') as infile:
    for row in csv.reader(infile, delimiter='\t'):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        number = Decimal(row[0])
        # the second column holds the comma-separated values
        for key in row[1].split(','):
            maindict.setdefault(key.strip(), set()).add(number)

with open('out.csv', 'w', newline='') as outfile:
    csv_writer = csv.writer(outfile, delimiter='\t')
    # sort keys case-insensitively; sort each key's numbers numerically
    for key in sorted(maindict, key=lambda k: (k.lower(), k)):
        numbers = ', '.join(str(n) for n in sorted(maindict[key]))
        csv_writer.writerow([key, numbers])


明天过后 2024-07-22 07:04:09

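I'd try Perl if you can. Loop through the input one line at a time. Split the line on the tab, then split the right-hand part on commas. Push the values into an associative array keyed by letter, whose values are themselves associative arrays. The second associative array acts as a set, which eliminates duplicates.

Once the input file has been read, sort on the keys of the associative array, then loop through and output the results.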
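The approach described above (split on the tab, split the right-hand part on commas, collect the numbers per letter in a set, then sort) can be sketched in Python rather than Perl. This is only an illustration under assumptions: the function name `invert` and the inline sample data are hypothetical, and the numbers are parsed with `Decimal` so that they sort numerically.

```python
from collections import defaultdict
from decimal import Decimal


def invert(lines):
    """Map each letter to the sorted, de-duplicated numbers it appears with."""
    groups = defaultdict(set)  # letter -> set of numbers (the set removes duplicates)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab: number on the left, CSV values on the right
        number, _, values = line.partition("\t")
        for value in values.split(","):
            groups[value.strip()].add(Decimal(number))
    # Sort the letters, and sort each letter's numbers numerically
    return {key: sorted(groups[key]) for key in sorted(groups)}


# Hypothetical sample mirroring the question's input (tab-separated)
sample = ["1\ta, b, c\n", "2\tc, b, c\n", "2.5\ta, c\n"]
for key, numbers in invert(sample).items():
    print(key + "\t" + ", ".join(str(n) for n in numbers))
```

Using a set as the inner container plays the same role as the second associative array: adding the same number twice has no effect.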


若水微香 2024-07-22 07:04:09

Here's a small utility in PHP:

// load and parse the input file
$lines = file("path/to/file/", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$newData = array();
foreach ($lines as $line) {
    list($num, $values) = explode("\t", $line, 2);
    $newData[$num] = array_map('trim', explode(",", $values));
}

// reverse the index/value association
$data = array();
foreach ($newData as $index => $values) {
    foreach ($values as $value) {
        if (!isset($data[$value]))
            $data[$value] = array();
        // skip numbers already recorded for this value
        if (!in_array($index, $data[$value]))
            array_push($data[$value], $index);
    }
}
// sort the output keys; the numbers keep their input order, which is already sorted
ksort($data);

// print the result
foreach ($data as $index => $values) {
    echo "$index\t" . implode(", ", $values) . "\n";
}

Not really optimized or pretty, but it works.


朱染 2024-07-22 07:04:09

# use Modern::Perl;
use strict;
use warnings;
use feature qw'say';

my %data;

while(<>){
  chomp;
  next unless /\t/;  # skip blank or malformed lines
  my($number,$csv) = split /\t/;
  my @csv = split m"\s*,\s*", $csv;
  # collect every number seen for each value
  push @{$data{$_}}, $number for @csv;
}

for my $key (sort keys %data){
  # a throwaway hash removes duplicates; <=> sorts the numbers numerically
  my @unique = sort { $a <=> $b } keys %{{ map { ($_,undef) } @{$data{$key}} }};
  say $key, "\t", join ', ', @unique;
}


祁梦 2024-07-22 07:04:09

Here is an example using CPAN's Text::CSV module rather than manual parsing of CSV fields:

use strict;
use warnings;
use Text::CSV;

my %hash;
my $csv = Text::CSV->new({ allow_whitespace => 1 });

open my $file, "<", "file/to/read.txt"
  or die "Cannot open file/to/read.txt: $!";

while(<$file>) {
  chomp;
  my ($first, $rest) = split /\t/, $_, 2;
  my @values;

  if($csv->parse($rest)) {
    @values = $csv->fields()
  } else {
    warn "Error: invalid CSV: $rest";
    next;
  }

  # record the number under every value on this line
  foreach(@values) {
    push @{ $hash{$_} }, $first;
  }
}

# sort() defaults to string comparison (cmp); both comparisons
# are spelled out here so the numbers are sorted numerically
foreach(sort { $a cmp $b } keys %hash) {
  print "$_\t", join(",", sort { $a <=> $b } @{ $hash{$_} }), "\n";
}

Note that it will print to standard output. I recommend just redirecting standard output, and if you expand this program at all, make sure to use warn() to print any errors, rather than just print()ing them. Also, it won't check for duplicate entries, but I don't want to make my code look like Brad Gilbert's, which looks a bit wack even to a Perlite.


攒眉千度 2024-07-22 07:04:09

Here's an awk(1) and sort(1) answer:

Your data is basically a many-to-many data set so the first step is to normalise the data with one key and value per line. We'll also swap the keys and values to indicate the new primary field, but this isn't strictly necessary as the parts lower down do not depend on order. We use a tab or [spaces],[spaces] as the field separator so we split on the tab between the key and values, and between the values. This will leave spaces embedded in the values, but trim them from before and after:

awk -F '\t| *, *' '{ for (i=2; i<=NF; ++i) { print $i"\t"$1 } }'

Then we want to apply your sort order and eliminate duplicates. We use a bash feature to specify a tab char as the separator (-t $'\t'). If you are using Bourne/POSIX shell, you will need to use '[tab]', where [tab] is a literal tab:

sort -t $'\t' -u -k 1f,1 -k 2n

Then, put it back in the form you want:

awk -F '\t' '{
    if (key != $1) {
        if (NR > 1) printf "\n";
        key=$1;
        printf "%s\t%s", $1, $2
    } else {
        printf ", %s", $2
    }
  }
  END {printf "\n"}'

Pipe them all together and you should get your desired output. I tested with the GNU tools.
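For readers who want to see the three stages side by side, here is a rough Python sketch of the same normalise, sort-and-deduplicate, regroup pipeline. It is an illustration rather than an exact equivalent of the awk and sort commands: the function names and sample data are hypothetical, and duplicates are dropped by exact pair match instead of by sort's key comparison.

```python
from decimal import Decimal
from itertools import groupby


def normalise(lines):
    """Step 1: emit one (value, number) pair per input value."""
    for line in lines:
        number, _, values = line.rstrip("\n").partition("\t")
        for value in values.split(","):
            if value.strip():
                yield value.strip(), number.strip()


def sort_unique(pairs):
    """Step 2: drop duplicate pairs, then order by value and numerically by number."""
    return sorted(set(pairs), key=lambda p: (p[0].lower(), p[0], Decimal(p[1])))


def regroup(pairs):
    """Step 3: collapse consecutive pairs that share a value into one line."""
    for value, group in groupby(pairs, key=lambda p: p[0]):
        yield value + "\t" + ", ".join(number for _, number in group)


# Hypothetical sample mirroring the question's input (tab-separated)
sample = ["1\ta, b, c\n", "2\tc, b, c\n", "2.5\ta, c\n"]
for line in regroup(sort_unique(normalise(sample))):
    print(line)
```

The regrouping step relies on the pairs already being sorted, just as the final awk stage relies on the output of sort.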
