Combine multiple CSV files and add a column during concatenation

Posted on 2024-10-18 14:38:59

Comments (5)

梦途 2024-10-25 14:38:59

Using Text::CSV:

Program

#!/usr/bin/env perl

use strict;
use warnings;

use File::Find;
use Text::CSV;

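# binary => 1 lets the parser accept non-ASCII and embedded newlines
my $semi_colon_csv = Text::CSV->new( { 'sep_char' => ';', 'binary' => 1 } )
    or die 'Cannot create CSV parser: ' . Text::CSV->error_diag;
my $comma_csv = Text::CSV->new( {
    'sep_char' => ',',
    'eol'      => "\n",
    'binary'   => 1,
} ) or die 'Cannot create CSV writer: ' . Text::CSV->error_diag;

open my $fh_output, '>', 'output.csv' or die "Cannot open output.csv: $!";

sub convert {
    my $file_name = shift;

    open my $fh_input, '<', $file_name or die "Cannot open $file_name: $!";

    # header: the full file name becomes the heading of the new column
    my $row = $semi_colon_csv->getline($fh_input);
    return unless $row;    # skip empty files
    $comma_csv->print( $fh_output, [ @$row, $file_name ] );

    # first token: everything before the first underscore in the file name
    my ($token) = ( $file_name =~ /^([^_]+)/ );

    while ( $row = $semi_colon_csv->getline($fh_input) ) {
        # remove the empty field produced by a trailing semi-colon,
        # but keep a last field that holds a literal 0
        pop @$row if @$row && ( $row->[-1] // '' ) eq '';
        $comma_csv->print( $fh_output, [ @$row, $token ] );
    }

    close $fh_input;
}

sub wanted {
    return unless -f;
    convert($_);    # File::Find has already changed into the file's directory
}

my $path = 'csv';  # assuming that all your CSVs are in ./csv/
find( \&wanted, $path );

close $fh_output or die "Cannot close output.csv: $!";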

Output (output.csv)

Header1,Header2,Header3,Header4,Header5,Token1_Token2_Token3.csv
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1

静水深流 2024-10-25 14:38:59

Believe it or not, it may be as simple as:

awk 'BEGIN{OFS = FS = ";"} {print $0, FILENAME}' *.csv > newfile.csv

If you want to change the field separator from semicolons to commas:

awk 'BEGIN{OFS = ","; FS = ";"} {$1 = $1; print $0, FILENAME}' *.csv > newfile.csv
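The `$1 = $1` in that command is what actually swaps the separators: assigning to any field makes awk rebuild `$0` with `OFS`. A minimal sketch on a made-up one-line input (the sample text `a;b;c` and the variable names are only for illustration):

```shell
# Without the assignment, $0 is printed untouched and the semicolons survive.
before=$(printf 'a;b;c\n' | awk 'BEGIN{OFS = ","; FS = ";"} {print $0}')

# Assigning a field to itself forces awk to re-join the fields with OFS.
after=$(printf 'a;b;c\n' | awk 'BEGIN{OFS = ","; FS = ";"} {$1 = $1; print $0}')

echo "$before"   # a;b;c
echo "$after"    # a,b,c
```
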

To include only the first token:

awk 'BEGIN{OFS = ","; FS = ";"} {$1 = $1; split(FILENAME, a, "_"); print $0, a[1]}' *.csv > newfile.csv
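The one-liners above append a value to every line, header lines included, and leave any trailing semicolon in place as an empty field. A possible variant is sketched below, assuming each file starts with one header line, that only the first header should be kept, and that no field contains a quoted semicolon (plain awk does not parse CSV quoting; Text::CSV does). The sample files, the output name `combined.csv` and the column heading `Token` are made up for the demonstration.

```shell
#!/bin/sh
# Create two small sample files in a scratch directory (hypothetical data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Header1;Header2\nData1;Data2;\nData3;Data4;\n' > Token1_Token2_Token3.csv
printf 'Header1;Header2\nData5;Data6;\n' > Other1_Other2_Other3.csv

# The glob *_*_*.csv matches only the input naming scheme, so the output
# file combined.csv is never read back in on a later run.
awk 'BEGIN { FS = ";"; OFS = "," }
     # Every line: drop a trailing semicolon, then rebuild $0 with commas.
     { sub(/;$/, ""); $1 = $1 }
     # First line of each file is a header: print it once, then skip it.
     FNR == 1 { if (NR == 1) print $0, "Token"; next }
     # Data lines: append the part of the file name before the first "_".
     { split(FILENAME, a, "_"); print $0, a[1] }' *_*_*.csv > combined.csv

cat combined.csv
```

Because awk reads one line at a time, this keeps memory use flat no matter how large the input files are.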

谁的年少不轻狂 2024-10-25 14:38:59

You might want to try this quick & dirty Perl hack to convert the data:

#!/usr/bin/perl
use strict;
use warnings;

# Open input file (three-argument open with a lexical filehandle)
my $inputfile = shift or die("Usage: $0 <filename>\n\n");
open my $fh, '<', $inputfile or die("Could not open input file ($!)\n\n");

# Split filename into an array
my @tokens = split(/_/, $inputfile);

my $isFirstline = 1;

# Read the file one line at a time instead of loading it all into memory
while (my $line = <$fh>) {
    my $addition;

    chomp($line);    # Remove newline

    # Add the complete filename to the line at first line
    if ($isFirstline) {
        $isFirstline = 0;
        $addition    = ",$inputfile";
    } else {         # Add first token for the rest of the lines
        $addition = ",$tokens[0]";
    }

    # Split the data into @elements array; split drops trailing empty
    # fields, which also removes a trailing semicolon
    my @elements = split(/;/, $line);

    # Join it using comma and add filename/token & a new line
    print join(",", @elements) . $addition . "\n";
}

close($fh);

爱给你人给你 2024-10-25 14:38:59

Perl's DBI module can cope with CSV files (DBD::CSV module required) and MySQL. Just put all your csv files in the same dir, and query them like this:

use DBI;
my $dbh = DBI->connect ("dbi:CSV:", "", "", { f_dir => "$DATABASEDIR", f_ext => ".csv", csv_sep_char => ";",});
my $sth = $dbh->prepare ("SELECT * FROM Token1_Token2_Token3");
$sth->execute;
while (my $hr = $sth->fetchrow_hashref) {

 [...]
}
$sth->finish ();

You can query CSV files (including JOIN statements!) and insert data directly into MySQL.

ヅ她的身影、若隐若现 2024-10-25 14:38:59

This is one way to do it in PowerShell:

$res = 'result.csv'
'Header1,Header2,Header3,Header4,Header5,FileName' > $res

foreach ($file in dir *.csv)
{
  if ($file -notmatch '(\w+)_\w+_\w+\.csv') { continue }

  $csv = Import-Csv $file -Delimiter ';'
  $csv | Foreach {"{0},{1},{2},{3},{4},{5}" -f `
    $_.Header1,$_.Header2,$_.Header3,$_.Header4,$_.Header5,$matches[1]} >> $res
}

If the files were not potentially so large, I would suggest going this route instead:

$csvAll = @()
foreach ($file in dir *.csv)
{
  if ($file -notmatch '(\w+)_\w+_\w+\.csv') { continue }

  $csv = Import-Csv $file -Delimiter ';'
  $csv | Add-Member NoteProperty FileName $matches[1]
  $csvAll += $csv
}

$csvAll | Export-Csv result.csv -NoTypeInformation

However, this holds the complete contents of all CSV files in memory until it is ready to export at the end. Not feasible unless you have 64-bit Windows with lots of memory. :-)
