Need a shell script to change a comma delimiter to a pipe delimiter

Posted on 2024-10-05 15:39:11

My input looks like "$130.00","$2,200.00","$1,230.63" and so on.
My question is how I can change the comma delimiter to a | delimiter without removing the commas inside the actual values.
To clarify, this input is in a CSV file with 40 columns and 9500 rows.
I want my output to look like

"$130.00"|"$2,200.00"|"$1,230.63"


故事和酒 2024-10-12 15:39:11

To do this reliably, you have to keep track of state, i.e. whether you are currently inside a quoted string or not. The following Perl script should work:

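#!/usr/bin/perl
use strict;
use warnings;

# Two states: are we currently inside a double-quoted field or not?
my $state_outside_string = 0;
my $state_inside_string  = 1;

# Start outside. The state deliberately carries over between lines, so a
# quoted field that spans a line break is still handled correctly.
my $state = $state_outside_string;

while (my $line = <>) {
    my @chars = split(//, $line);
    foreach my $char (@chars) {
        if ($char eq '"') {
            # Every quote toggles the state; an escaped "" toggles twice
            # and therefore leaves it unchanged.
            if ($state == $state_outside_string) {
                $state = $state_inside_string;
            } else {
                $state = $state_outside_string;
            }
        } elsif ($char eq ',') {
            # Only commas outside a quoted field are delimiters.
            if ($state == $state_outside_string) {
                print '|';
                next;
            }
        }
        print $char;
    }
}
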
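
For comparison, here is a minimal Python sketch of the same quote-tracking state machine, run on the line from the question. The function name commas_to_pipes is hypothetical; like the Perl script, it assumes fields are wrapped in double quotes and that a literal quote inside a field is escaped by doubling it.

```python
def commas_to_pipes(text: str) -> str:
    """Replace commas that sit outside double-quoted fields with '|'.

    Same idea as the Perl script: each '"' toggles between the outside-string
    and inside-string states, and only commas seen while outside are rewritten.
    A doubled quote ("") inside a field toggles twice, so the state is unchanged.
    """
    inside_string = False
    out = []
    for ch in text:
        if ch == '"':
            inside_string = not inside_string
        elif ch == ',' and not inside_string:
            out.append('|')  # a real delimiter
            continue
        out.append(ch)  # everything else is copied through unchanged
    return ''.join(out)


if __name__ == "__main__":
    line = '"$130.00","$2,200.00","$1,230.63"'
    print(commas_to_pipes(line))  # "$130.00"|"$2,200.00"|"$1,230.63"
```

Because the state lives outside the loop body, the same function works on a whole file's contents at once, including quoted fields that contain newlines.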
嘿咻 2024-10-12 15:39:11

Does 'having shell run a Perl script' count?

If so, I'd look at Perl's Text::CSV module. You'd have two CSV handles, one for reading the file with the sep_char attribute set as comma (the standard, default), the other for writing the file with the sep_char attribute set as pipe.

Working script

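#!/usr/bin/env perl

use strict;
use warnings;
use Text::CSV;

die "Usage: $0 in_file out_file\n" unless scalar @ARGV == 2;

# Reader: default comma separator; binary => 1 allows embedded newlines and
# other special characters inside quoted fields.
my $in  = Text::CSV->new({ binary => 1, blank_is_undef => 1 })
    or die "Cannot create CSV reader: " . Text::CSV->error_diag();
# Writer: pipe separator, quote every defined field (empty, i.e. undef,
# fields stay bare), one record per line.
my $out = Text::CSV->new({ binary => 1, sep_char => '|',
                           always_quote => 1, eol => "\n" })
    or die "Cannot create CSV writer: " . Text::CSV->error_diag();
open my $fh_in,  '<', $ARGV[0]
    or die "Failed to open $ARGV[0] for reading ($!)";
open my $fh_out, '>', $ARGV[1]
    or die "Failed to open $ARGV[1] for writing ($!)";

while (my $fields = $in->getline($fh_in))
{
    $out->print($fh_out, $fields);
}
# getline returns undef both at end of file and on a parse error; report the latter.
$in->eof or $in->error_diag();

close $fh_in  or die "Failed to close input ($!)";
close $fh_out or die "Failed to close output ($!)";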

Sample input

"$130.00","$2,200.00","$1,230.63"
"EUR1.300,00",,
"GBP1,300.00","$2,200.00",

Sample output

"$130.00"|"$2,200.00"|"$1,230.63"
"EUR1.300,00"||
"GBP1,300.00"|"$2,200.00"|
时光礼记 2024-10-12 15:39:11

If you have no other commas in your file, you can use:

sed "s/,/|/g" filename > outputfilename

If every field is quoted, so that each delimiter appears as "," (quote, comma, quote), then:

sed 's/","/"|"/g' filename > outputfilename

Works like this:

sh-3.1$ echo '"123,456","123,454"' |sed 's/","/"|"/g'
"123,456"|"123,454"

If a quoted field can itself contain the sequence "," and you don't want to change that, then it gets a bit more complicated, I think :)
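
To make that caveat concrete, here is a small Python sketch. The record below is hypothetical (not from the question): its second field holds the text a","b, which CSV encodes as "a"",""b". The sed substitution is emulated with a plain str.replace, and the csv module stands in for a real parser.

```python
import csv
import io

# Hypothetical record: the second field's value is  a","b  (CSV-encoded as "a"",""b").
line = '"x","a"",""b"'

# sed 's/","/"|"/g' is a plain substring replacement; str.replace behaves the same way.
naive = line.replace('","', '"|"')

# A CSV parser knows that "" inside a quoted field is an escaped quote character.
fields = next(csv.reader([line]))
buf = io.StringIO()
csv.writer(buf, delimiter='|', quoting=csv.QUOTE_ALL, lineterminator='\n').writerow(fields)
parsed = buf.getvalue()

print(naive)            # "x"|"a""|""b"   (second field corrupted)
print(parsed, end='')   # "x"|"a"",""b"   (second field intact)
```

The substring approach cannot tell a delimiter from the same three characters inside a field, which is why a quote-aware parser is the safer choice here.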


Another solution, in Python with the dedicated csv module; it is probably the best option in terms of both safety and the amount of code needed:

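import csv

inFilename = 'input.csv'
outFilename = 'output.csv'

# newline='' is what the csv module expects for files it reads or writes;
# the with-block also closes both files when done.
with open(inFilename, newline='') as fin, open(outFilename, 'w', newline='') as fout:
    r = csv.reader(fin)
    # csv.reader yields every field as a str, so QUOTE_NONNUMERIC quotes all of them.
    w = csv.writer(fout, delimiter='|', quotechar='"', quoting=csv.QUOTE_NONNUMERIC)
    w.writerows(r)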

Safe and simple. You can tweak this for other formats easily, the parameters are fairly straightforward.
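
To check those writer settings without touching any files, the same reader/writer pair can run on in-memory text via io.StringIO. This is a sketch: the helper name to_pipe_delimited is hypothetical, and lineterminator='\n' is an added assumption (the csv writer's default terminator is '\r\n').

```python
import csv
import io


def to_pipe_delimited(csv_text: str) -> str:
    """Re-emit comma-separated text as pipe-separated text.

    Uses the same writer settings as the answer above (delimiter='|',
    QUOTE_NONNUMERIC); lineterminator='\n' is an extra choice made here.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    # Fields from csv.reader are always str, so QUOTE_NONNUMERIC quotes every one.
    writer = csv.writer(out, delimiter='|', quotechar='"',
                        quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n')
    writer.writerows(reader)
    return out.getvalue()


print(to_pipe_delimited('"$130.00","$2,200.00","$1,230.63"\n'), end='')
# "$130.00"|"$2,200.00"|"$1,230.63"
```

Note that input fields which were not quoted originally (for example 130) come out quoted as well, because the reader hands them to the writer as strings.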

你与清晨阳光 2024-10-12 15:39:11

Ruby's CSV library was replaced with FasterCSV in 1.9; in earlier versions you can use the fastercsv gem.

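#!/usr/bin/env ruby

require "csv"

# Parse the whole file, then re-serialise each row with "|" as the column separator.
# force_quotes: true quotes every field, matching the desired "$130.00"|... output;
# without it, "$2,200.00" would be written unquoted because "," is no longer special.
output = CSV.read("test.csv").map do |row|
  row.to_csv(:col_sep => "|", :force_quotes => true)
end
puts output
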
南…巷孤猫 2024-10-12 15:39:11

I had the same issue and didn't find a perfect solution, so I tried the following:

#!/bin/sh
# Convert every *.csv under the directory given as $1 into a pipe-delimited
# .txt file in the current directory. Assumes the data never contains "~".
find "$1" -name '*.csv' | while IFS= read -r file
do
    fileName=$(basename "$file" .csv)
    # Splitting on " puts quoted text in the even-numbered fields; hide their
    # commas as ~. OFS='"' restores the quotes when awk rebuilds the record.
    awk -F'"' -v OFS='"' '{ for (i=2; i<=NF; i+=2) gsub(",", "~", $i) } 1' "$file" |
        sed 's/,/|/g; s/~/,/g' > "$fileName.txt"
    echo "File conversion is completed for $file"
done

This creates a pipe-delimited file for every .csv file under the directory passed to the shell script. It also handles columns that contain extra commas inside quotes.
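
The placeholder trick (hide quoted commas, convert the remaining commas, then restore) can be sketched in Python as follows. The function name pipe_via_placeholder is hypothetical, and the check that the placeholder does not already occur in the data is an addition the shell version does not perform.

```python
def pipe_via_placeholder(line: str, placeholder: str = '~') -> str:
    """Python rendering of the awk/sed placeholder approach.

    Splitting on '"' puts quoted text at odd indices (awk's $2, $4, ...).
    Their commas are hidden as the placeholder, the remaining commas become
    '|', and the placeholder is turned back into ','.
    """
    if placeholder in line:
        # Otherwise a literal placeholder in the data would become a comma.
        raise ValueError(f"placeholder {placeholder!r} already occurs in the data")
    parts = line.split('"')
    for i in range(1, len(parts), 2):
        parts[i] = parts[i].replace(',', placeholder)
    hidden = '"'.join(parts)  # same role as OFS='"' in the awk version
    return hidden.replace(',', '|').replace(placeholder, ',')


print(pipe_via_placeholder('"$130.00","$2,200.00","$1,230.63"'))
# "$130.00"|"$2,200.00"|"$1,230.63"
```

Escaped quotes ("") split into an empty piece between them, so the odd/even alternation still lines up with what is inside and outside a quoted field.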
