Create multiple csv files from data within a csv file

Posted on 2024-08-28 17:02:46

System OSX or Linux

I'm trying to automate my workflow at work. Each week I receive an Excel file, which I convert to CSV.

An example is:

,,L1,,,L2,,,L3,,,L4,,,L5,,,L6,,,L7,,,L8,,,L9,,,L10,,,L11,
Title,r/t,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst
EXAMPLEfoo,60,6,6,6,0,0,0,0,0,0,6,6,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
EXAMPLEbar,30,6,6,12,6,7,14,6,6,12,6,6,12,6,8,16,6,7,14,6,7.5,15,6,6,12,6,8,16,6,0,0,6,7,14
EXAMPLE1,60,3,3,3,3,5,5,3,4,4,3,3,3,3,6,6,3,4,4,3,3,3,3,4,4,3,8,8,3,0,0,3,4,4
EXAMPLE2,120,6,6,3,0,0,0,6,8,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
EXAMPLE3,60,6,6,6,6,8,8,6,6,6,6,6,6,0,0,0,0,0,0,6,8,8,6,6,6,0,0,0,0,0,0,0,10,10
EXAMPLE4,30,6,6,12,6,7,14,6,6,12,6,6,12,3,5.5,11,6,7.5,15,6,6,12,6,0,0,6,9,18,6,0,0,6,6.5,13

And so you can get a picture of how it looks in Excel: (screenshot not preserved)

What I need to do is create one CSV file for each label in row 1 (L1, L2, L3, L4, ...).

Each CSV file needs to contain the Title, r/t, and needed columns.

So for L1, example output would look like:

EXAMPLEfoo,60,6
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,6
EXAMPLE3,60,6
EXAMPLE4,30,6

And for L2:

EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,6
EXAMPLE4,30,6

And so on.

I have tried playing around with sed and awk and searched Google, but I have found nothing that really solves the issue.

I'd imagine Perl would be particularly suited to this, or maybe Python, so I would be more than happy to accept suggestions.

So, any suggestions?

Thanks in advance.

故笙诉离歌 2024-09-04 17:02:46

Perl "one-liner"

perl -MText::CSV_XS -e'$c=Text::CSV_XS->new({binary=>1,eol=>"\n"});%a=map{$i++;/^L\d+$/?($_=>$i-1):()}@{$c->getline(*ARGV)};$c->getline(*ARGV);open$b{$_},">$_"for keys%a;while($f=$c->getline(*ARGV)){$c->print($b{$_},[@$f[0,1,$a{$_}]])for keys%a}'

For those who have trouble reading it:

$ echo '$c=Te...' | perltidy
$c = Text::CSV_XS->new( { binary => 1, eol => "\n" } );
# Map each label (L1, L2, ...) to the zero-based index of its column.
%a = map { $i++; /^L\d+$/ ? ( $_ => $i - 1 ) : () } @{ $c->getline(*ARGV) };
$c->getline(*ARGV);    # skip the Title,r/t,needed,... header row
open $b{$_}, ">$_" for keys %a;
while ( $f = $c->getline(*ARGV) ) {
    $c->print( $b{$_}, [ @$f[ 0, 1, $a{$_} ] ] )
      for keys %a;
}
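
For readers less familiar with Perl, the core of this approach is a map from each label in the first row to the zero-based index of its column. Here is a minimal Python sketch of just that step. The function name `label_columns` and the shortened sample row are illustrative assumptions, not part of the answer:

```python
import re

def label_columns(first_row):
    """Map each label such as 'L1' to the zero-based index of its column."""
    return {
        cell: i
        for i, cell in enumerate(first_row)
        if re.fullmatch(r"L\d+", cell)
    }

# A shortened version of the first row from the question.
first_row = ",,L1,,,L2,,,L3,".split(",")
columns = label_columns(first_row)
print(columns)  # {'L1': 2, 'L2': 5, 'L3': 8}
```

Because the label sits directly above the "needed" column, the same index can be used to pick that column out of every data row.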

初见你 2024-09-04 17:02:46

Using only AWK:

awk -F, -v OFS=, -v c=1 '
    # Row 1: remember the field number (g) and label (f) of every non-empty cell.
    NR == 1 {
        for (i = 1; i <= NF; i++) {
            if ($i != "") {
                g[c] = i
                f[c++] = $i
            }
        }
    }
    # Row 2 is the column header; data starts at row 3.
    NR > 2 {
        for (i = 1; i < c; i++) {
            print $1, $2, $g[i] > ("file_" f[i] ".csv")
        }
    }' data.csv

As a one-liner:

awk -F, -v OFS=, -v c=1 'NR == 1 {for (i=1; i<=NF; i++) {if ($i != "") {g[c]=i; f[c++]=$i}}} NR>2 {for (i=1; i<c; i++) {print $1,$2,$g[i] > ("file_" f[i] ".csv")}}' data.csv

Example output:

$ cat file_L1.csv
EXAMPLEfoo,60,6
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,6
EXAMPLE3,60,6
EXAMPLE4,30,6
$ cat file_L2.csv
EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,6
EXAMPLE4,30,6
$ cat file_L11.csv
EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,0
EXAMPLE4,30,6

浅蓝的眸勾画不出的柔情 2024-09-04 17:02:46

use strict;
use warnings;

use Text::CSV;
my $csv = Text::CSV->new;

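sub parse_line {
    my $line = shift;
    chomp $line;
    # parse() does not set $!, so report the parser's own diagnostic.
    $csv->parse($line) or die "CSV parse error: " . $csv->error_diag . "\n";
    return $csv->fields;
}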

my @metadata;
my @files  = parse_line(scalar <>);
my @header = parse_line(scalar <>); # Ignore.
for my $i (0 .. $#files){
    next unless length $files[$i];
    open(my $h, '>', "$files[$i].csv") or die $!;
    push @metadata, {column => $i, handle => $h};
}

while (my $line = <>){
    my @fields = parse_line($line);
    for my $m (@metadata){
        $csv->print($m->{handle}, [ @fields[0, 1, $m->{column}] ]);
        print {$m->{handle}} "\n";
    }
}

肩上的翅膀 2024-09-04 17:02:46

Take a look at the Perl module Text::CSV_XS (comma-separated values manipulation routines). I have found this module very useful when working with CSV files.
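
To illustrate why a real CSV parser helps, here is a small Python sketch, with Python's standard `csv` module standing in for Text::CSV_XS. The sample line is made up for illustration. A title containing a quoted comma breaks a naive split on commas but parses correctly:

```python
import csv
import io

# Hypothetical row whose title contains a comma inside quotes.
line = '"EXAMPLE, foo",60,6\n'

# Naive approach: split on every comma, ignoring quoting.
naive = line.rstrip("\n").split(",")

# CSV-aware approach: the parser honours the quotes.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['"EXAMPLE', ' foo"', '60', '6']
print(parsed)  # ['EXAMPLE, foo', '60', '6']
```

The sample data in the question has no quoted fields, so plain field splitting works there, but spreadsheets exported to CSV often do contain them.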

倾城泪 2024-09-04 17:02:46

Try this:

#!/bin/bash
awk 'BEGIN{ OFS=FS="," }
NR==1{
 # remember the label of every non-empty cell in row 1, keyed by field number
 for(i=1;i<=NF;i++){
   if($i != ""){ f[i]=$i }
 }
}
NR>2{ for(o in f){ print $1,$2, $o > ("file_" f[o] ".csv") } } ' file

Output:

$ cat file_L1.csv
EXAMPLEfoo,60,6
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,6
EXAMPLE3,60,6
EXAMPLE4,30,6

$ cat file_L2.csv
EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,6
EXAMPLE4,30,6

把昨日还给我 2024-09-04 17:02:46

In Python, slightly hacky and untested, but should do the job:

import csv

with open('file.csv', newline='') as src:
    r = csv.reader(src, dialect='excel')
    topline = next(r)     # labels: '', '', 'L1', '', '', 'L2', ...
    headerline = next(r)  # Title, r/t, needed, actual, Inst, ...
    data = list(r)

# The top line can be shorter than the header line, so pad it first.
topline += [''] * (len(headerline) - len(topline))

# Copy cells forwards in the top line, so L1 for example goes across all of its cells.
lastcell = ''
for i, cell in enumerate(topline):
    if cell == '':
        topline[i] = lastcell
    else:
        lastcell = cell

# Prefix each header cell with its label so the header cells are unique.
# Title and r/t have no label above them and keep their plain names.
for i in range(len(headerline)):
    if topline[i] != '':
        headerline[i] = '-'.join((topline[i], headerline[i]))

rows = [dict(zip(headerline, line)) for line in data]

# Rows should now consist of dicts of the form {'Title': 'EXAMPLEfoo', 'r/t': '60', 'L1-needed': '6' ...}

for lval in sorted(set(topline) - {''}):  # unique labels, skipping the blank value
    with open('%s.csv' % lval, 'w', newline='') as out:
        w = csv.writer(out, dialect='excel')
        for row in rows:
            w.writerow([row['Title'], row['r/t'], row['-'.join((lval, 'needed'))]])
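
The answer above loads every row into memory before writing. As an alternative, here is a streaming sketch that keeps one writer open per label and copies each row as it is read. The function name `split_by_label`, its parameters, and the choice of `"\n"` as line terminator are assumptions for illustration. It also assumes that each label sits directly above its "needed" column, as in the sample data:

```python
import csv
import os
import re
from contextlib import ExitStack

def split_by_label(src_path, out_dir):
    """Write out_dir/<label>.csv with Title, r/t and the label's 'needed' column."""
    with ExitStack() as stack:
        src = stack.enter_context(open(src_path, newline=""))
        reader = csv.reader(src)
        labels = next(reader)  # first row: '', '', 'L1', '', '', 'L2', ...
        next(reader)           # second row: Title, r/t, needed, ... (skipped)

        # One writer per label, keyed by the label's column index.
        writers = {}
        for i, cell in enumerate(labels):
            if re.fullmatch(r"L\d+", cell):
                path = os.path.join(out_dir, cell + ".csv")
                out = stack.enter_context(open(path, "w", newline=""))
                writers[i] = csv.writer(out, lineterminator="\n")

        # Stream the data rows; ExitStack closes every file on exit.
        for row in reader:
            for i, writer in writers.items():
                writer.writerow([row[0], row[1], row[i]])
```

Calling `split_by_label("data.csv", ".")` would then write `L1.csv`, `L2.csv` and so on into the current directory.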
