Combining multiple CSV files and adding a column during concatenation

Posted on 2024-10-18 14:38:59

梦途 2024-10-25 14:38:59

Using Text::CSV:

Program

#!/usr/bin/env perl

use strict;
use warnings;

use File::Find;
use Text::CSV;

my $semi_colon_csv = Text::CSV->new( { 'sep_char' => ';', } );
my $comma_csv = Text::CSV->new( {
    'sep_char' => ',',
    'eol'      => "\n",
} );

open my $fh_output, '>', 'output.csv' or die $!;

sub convert {
    my $file_name = shift;

    open my $fh_input, '<', $file_name or die $!;

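    # header: append the full file name as the heading of the new column
    my $row = $semi_colon_csv->getline($fh_input);
    return unless $row;    # nothing to do for an empty file
    $comma_csv->print( $fh_output, [ @$row, $file_name ] );

    # first underscore-delimited token of the file name
    my ($token) = ( $file_name =~ /^([^_]+)/ );

    while ( $row = $semi_colon_csv->getline($fh_input) ) {
        # drop the empty last field left by a trailing semi-colon;
        # test for emptiness so a legitimate value of 0 is kept
        pop @$row if @$row && !length( $row->[-1] // '' );
        $comma_csv->print( $fh_output, [ @$row, $token ] );
    }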
}

sub wanted {
    return unless -f;
    convert($_);
}

my $path = 'csv';  # assuming that all your CSVs are in ./csv/
find( \&wanted, $path );

Output (output.csv)

Header1,Header2,Header3,Header4,Header5,Token1_Token2_Token3.csv
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
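
A quick way to check a merged file like this is to confirm that every line has the same number of fields. The sketch below is only an illustration: it recreates a small stand-in for output.csv from the lines shown above in a temporary directory, and it assumes none of the values contain quoted commas, which a plain `-F','` split would miscount.

```shell
#!/bin/sh
# Sanity-check sketch: all lines of the merged file should have the same
# number of comma-separated fields. Assumes no quoted commas in the data.
workdir=$(mktemp -d)

# Stand-in for output.csv, copied from the sample output in this answer.
cat > "$workdir/output.csv" <<'EOF'
Header1,Header2,Header3,Header4,Header5,Token1_Token2_Token3.csv
Data1,Data2,Data3,Data4,Data5,Token1
Data1,Data2,Data3,Data4,Data5,Token1
EOF

# Collect the distinct field counts; a single value means the rows line up.
counts=$(awk -F',' '{ print NF }' "$workdir/output.csv" | sort -u)
echo "distinct field counts: $counts"
```

If more than one number is printed, some input rows probably kept (or lost) a trailing separator.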


静水深流 2024-10-25 14:38:59

Believe it or not, it may be as simple as:

awk 'BEGIN{OFS = FS = ";"} {print $0, FILENAME}' *.csv > newfile.csv

If you want to change the field separator from semicolons to commas:

awk 'BEGIN{OFS = ","; FS = ";"} {$1 = $1; print $0, FILENAME}' *.csv > newfile.csv
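
The `$1 = $1` in that command is what actually swaps the separators: awk only rebuilds `$0` with `OFS` after a field has been assigned. A minimal sketch of the difference, using a made-up sample record rather than the real files:

```shell
#!/bin/sh
# Hypothetical sample record; not taken from the original files.
line='Data1;Data2;Data3'

# Without touching a field, $0 is printed exactly as read: semicolons survive.
unchanged=$(printf '%s\n' "$line" | awk 'BEGIN{OFS = ","; FS = ";"} {print $0}')

# Assigning $1 to itself makes awk rebuild $0, joining the fields with OFS.
rebuilt=$(printf '%s\n' "$line" | awk 'BEGIN{OFS = ","; FS = ";"} {$1 = $1; print $0}')

printf '%s\n%s\n' "$unchanged" "$rebuilt"
```

The first line printed still uses semicolons; the second uses commas.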

To include only the first token:

awk 'BEGIN{OFS = ","; FS = ";"} {$1 = $1; split(FILENAME, a, "_"); print $0, a[1]}' *.csv > newfile.csv
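
The one-liners above treat every line the same way, so each file's header line gets repeated in the result. If a single header is wanted, something along these lines may work. It is a sketch under assumptions: the sample file names and contents, the `Source` heading, and the `merged.out` name are all made up for illustration, and it assumes every file starts with one header line. The output name deliberately does not end in `.csv`, so that a rerun with `*.csv` would not pick up the previous result as input.

```shell
#!/bin/sh
# Sketch: merge semicolon-separated files, print the header once, and add
# the first file-name token as a new column. Sample data is hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Header1;Header2\nData1;Data2;\n' > Token1_Token2_Token3.csv
printf 'Header1;Header2\nData3;Data4;\n' > TokenA_TokenB_TokenC.csv

awk 'BEGIN { OFS = ","; FS = ";" }
{
    sub(/;$/, "")                      # drop a trailing semicolon, if any
    $1 = $1                            # rebuild the record with OFS
}
FNR == 1 {                             # first line of each file is its header
    if (NR == 1) print $0, "Source"    # keep only the very first header
    next
}
{
    split(FILENAME, a, "_")            # a[1] is the text before the first "_"
    print $0, a[1]
}' Token1_Token2_Token3.csv TokenA_TokenB_TokenC.csv > merged.out

cat merged.out
```

`FNR` restarts at 1 for each input file while `NR` keeps counting, which is how the script tells the first header apart from the later ones.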


谁的年少不轻狂 2024-10-25 14:38:59

You might want to try this quick & dirty Perl hack to convert the data:

#!/usr/bin/perl
use strict;
use warnings;

# Open input file (three-argument open with a lexical filehandle)
my $inputfile = shift or die("Usage: $0 <filename>\n\n");
open my $fh, '<', $inputfile or die("Could not open input file ($!)\n\n");

# Split filename into an array
my @tokens = split("_", $inputfile);

my $isFirstline = 1;

# Read the file line by line instead of loading it all into memory
while (my $line = <$fh>) {
    my $addition;

    chomp($line);    # Remove newline

    # Add the complete filename to the line at first line
    if ($isFirstline) {
        $isFirstline = 0;
        $addition    = ",$inputfile";
    } else {         # Add first token for the rest of the lines
        $addition = ",$tokens[0]";
    }

    # Split the data into @elements array
    # (split drops trailing empty fields, so a trailing ";" is ignored)
    my @elements = split(";", $line);

    # Join it using comma and add filename/token & a new line
    print join(",", @elements) . $addition . "\n";
}

close($fh);


爱给你人给你 2024-10-25 14:38:59

Perl's DBI module can cope with CSV files (DBD::CSV module required) and MySQL. Just put all your csv files in the same dir, and query them like this:

use DBI;
my $dbh = DBI->connect ("dbi:CSV:", "", "", { f_dir => "$DATABASEDIR", f_ext => ".csv", csv_sep_char => ";",});
my $sth = $dbh->prepare ("SELECT * FROM Token1_Token2_Token3");
$sth->execute;
while (my $hr = $sth->fetchrow_hashref) {

 [...]
}
$sth->finish ();

You can query CSV files (including JOIN statements!) and insert data directly into MySQL.


ヅ她的身影、若隐若现 2024-10-25 14:38:59

This is one way to do it in PowerShell:

$res = 'result.csv'
'Header1,Header2,Header3,Header4,Header5,FileName' > $res

foreach ($file in dir *.csv)
{
  if ($file -notmatch '(\w+)_\w+_\w+\.csv') { continue }

  $csv = Import-Csv $file -Delimiter ';'
  $csv | Foreach {"{0},{1},{2},{3},{4},{5}" -f `
    $_.Header1,$_.Header2,$_.Header3,$_.Header4,$_.Header5,$matches[1]} >> $res
}

If the size of the files weren't so potentially large I would suggest going this route:

$csvAll = @()
foreach ($file in dir *.csv)
{
  if ($file -notmatch '(\w+)_\w+_\w+\.csv') { continue }

  $csv = Import-Csv $file -Delimiter ';'
  $csv | Add-Member NoteProperty FileName $matches[1]
  $csvAll += $csv
}

$csvAll | Export-Csv result.csv -NoTypeInformation

However, this holds the complete contents of all CSV files in memory until it is ready to export at the end. Not feasible unless you have 64-bit Windows with lots of memory. :-)
