How do I remove newline characters until each line has a specific number of instances of a specific character?

Posted on 2024-10-13 11:33:27

I have a real mess of a pipe-delimited file that I need to load into a database. Each record has 35 fields, and thus 34 pipes. One of the fields consists of HTML code which, for some records, includes multiple line breaks. Unfortunately, there's no pattern as to where the line breaks fall.

The solution I've come up with is to count the number of pipes in each line and, until that count reaches 34, remove the newline character from that line. I'm not incredibly well-versed in Perl, but I think I'm close to achieving what I'm looking to do. Any suggestions?

#!/usr/local/bin/perl

use strict;

open (FILE, 'test.txt');

while (<FILE>) {
    chomp;
    my $line = $_;
    #remove null characters that are included in file
    $line =~ tr/\x00//;
    #count number of pipes
    my $count = ($line =~ tr/|//);
    #each line should have 34 pipes
    while ($count < 34) {
        #remove new lines until line has 34 pipes
        $line =~ tr/\r\n//;
        $count = ($line =~ tr/|//);
        print "$line\n";
    }
}
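
A side note on the `tr` calls in the script above: with an empty replacement list and no modifier, `tr` only counts the matching characters and leaves the string unchanged, which is why `tr/|//` works as a pipe counter. To actually delete characters, `tr` needs the `/d` modifier. The snippet below is a minimal sketch of that difference; the sample string is made up for illustration and is not taken from the real file.

```perl
use strict;
use warnings;

# Hypothetical sample line: two NUL bytes, three pipes, and a stray CR.
my $line = "a\x00|b|c\x00|d\r";

# With an empty replacement list and no modifier, tr only counts matches.
my $pipes = ($line =~ tr/|//);
my $nuls  = ($line =~ tr/\x00//);
my $len_before = length $line;

# The /d modifier deletes the matched characters and still returns the count.
my $deleted = ($line =~ tr/\x00\r\n//d);
my $len_after = length $line;

print "pipes=$pipes nuls=$nuls deleted=$deleted\n";
print "length before=$len_before after=$len_after\n";
```

Here the two counting calls leave `$line` at 10 characters, and only the `/d` call shortens it to 7 (`a|b|c|d`).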

Comments (2)

溺ぐ爱和你が 2024-10-20 11:33:27

This should work, I guess.

#!/usr/bin/perl

use strict;
use warnings;

open(my $fh, '<', 'test.txt') or die "Cannot open test.txt: $!";

my $num_pipes = 0;
my $line_num  = 0;
my $tmp = "";
while (<$fh>)
{
    $line_num++;
    my $line = $_;
    $line =~ tr/\x00\r\n//d; # remove line breaks and the null characters that are included in the file
    $num_pipes += ($line =~ tr/|//); # count the pipes seen so far in this record
    if ($num_pipes == 34)
    {
            # The record is complete: print it together with any buffered partial lines.
            print "$tmp$line\n";
            # Reset values.
            $tmp = "";
            $num_pipes = 0;
    }
    elsif ($num_pipes < 34)
    {
            # Incomplete record: buffer this piece and keep reading.
            $tmp .= $line;
    }
    else
    {
            print STDERR "Error before line $line_num. Too many pipes ($num_pipes)\n";
            $num_pipes = 0;
            $tmp = "";
    }
}

close $fh;

所谓喜欢 2024-10-20 11:33:27

Twiddle with $/, the input record separator?

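while (!eof(FILE)) {

    # assemble a row of data: 35 pipe-separated fields, possibly spread over many lines
    my @fields = ();
    {
        # read the first 34 fields from FILE; each one ends at a pipe
        local $/ = '|';
        for (1..34) {
            my $field = <FILE>;
            last unless defined $field;  # file ended in the middle of a record
            chomp $field;                # chomp strips the current $/, i.e. the trailing pipe
            push @fields, $field;
        }
    }   # $/ is set back to its previous value ("\n" by default) at the end of this block

    my $last = <FILE>;  # read the last field, which ends with a newline
    push @fields, $last if defined $last;
    tr/\x00\r\n//d for @fields;  # drop null characters and embedded line breaks
    my $line = join '|', @fields;
    # ... now you can process $line, and you already have @fields ...
}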
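
To try the `$/` idea without touching the real file, here is a self-contained sketch that runs against an in-memory filehandle. The helper name `read_records`, the sample data, and the 3-field record width are all made up for the demo (the real file would use 35). It also assumes that no field contains a literal pipe and that the multi-line field is not the last one in the record, since the last field is read up to the first newline.

```perl
use strict;
use warnings;

# Hypothetical helper: read records of $n_fields pipe-separated fields from a
# filehandle, removing NULs and line breaks embedded inside fields.
sub read_records {
    my ($fh, $n_fields) = @_;
    my @records;
    while (!eof($fh)) {
        my @fields;
        {
            local $/ = '|';                 # every field but the last ends at a pipe
            for (1 .. $n_fields - 1) {
                my $field = <$fh>;
                last unless defined $field; # input ended in the middle of a record
                chomp $field;               # chomp strips the current $/, i.e. the pipe
                push @fields, $field;
            }
        }                                   # $/ reverts to its previous value here
        my $last = <$fh>;                   # the last field ends at the newline
        push @fields, $last if defined $last;
        tr/\x00\r\n//d for @fields;         # delete NULs and line breaks in place
        push @records, join('|', @fields);
    }
    return @records;
}

# Demo on an in-memory file with 3 fields per record (the real file has 35).
my $data = "a|<p>x\n</p>|c\n1|2|3\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @records = read_records($fh, 3);
close $fh;
print "$_\n" for @records;
```

For the real file, the call would be something like `read_records($fh, 35)` on a handle opened on `test.txt`. A record that ends early at end of file comes back with fewer fields, so it may be worth checking the field count before loading anything into the database.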