Perl script to update one row of a file with another row

Posted 2024-10-16 04:10:53

I have data files in text format, each with several rows. Some rows contain wrong data, which I need to update from the rows that contain the correct data. For example,

Col1  Col2  Col3  Col4 .......
A1?%     A     foo  fooo .......
B€(2     B     .................  
C&6     Z     .................
A?04     Y     .................
B++3     Q     .................
C!5     C     .................
D*9     D     .................

The actual data is different, but this is a simplified version of it. As you can see, Col2 is wrong for some Col1 values: the A1 row has A in Col2, but the A4 row has Y, and so on. The rest of the columns (Col3, Col4, ...) depend on Col2. So I need to check whether Col2 is A whenever there is an A in Col1 (A1, A2, A3, etc.). If it is not, I have to update Col2, Col3, ... based on the row where it is A.

How can this be accomplished in Perl? I know this kind of operation can be done in a database with an UPDATE statement, but I don't have that luxury here and have to do it programmatically.

Edit: The files are tab-delimited, and the data are strings that can contain any alphanumeric or ASCII character.

温柔女人霸气范 2024-10-23 04:10:54

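Populate a hash whose keys are the Col2 values (A, B, C, etc.) and whose values are the remaining columns (Col3, Col4, etc.). Only add Col2 as a key when Col1 and Col2 match as required.

Then, when writing the file out, if Col1 and Col2 do not match, look up the first character of Col1 in the hash. That gives you the Col3, Col4, ... values to insert.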
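The two steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation. It assumes that a row is "correct" when Col2 equals the first character of Col1, that the first line is a header, and that lines are already chomped. The helper name `fix_rows` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: takes chomped, tab-delimited lines (header first)
# and returns the corrected lines in the same order.
sub fix_rows {
    my @lines  = @_;
    my $header = shift @lines;

    # Pass 1: for every row whose Col2 equals the first character of Col1
    # (the assumed "correct" rows), remember Col2 and the remaining columns.
    my %good;
    for my $line (@lines) {
        my ( $col1, $col2, @rest ) = split /\t/, $line, -1;
        next unless defined $col2;
        my $key = substr $col1, 0, 1;
        $good{$key} //= [ $col2, @rest ] if $col2 eq $key;
    }

    # Pass 2: rewrite mismatched rows from the lookup table. Rows with no
    # known-good counterpart are left unchanged.
    my @out = ($header);
    for my $line (@lines) {
        my ( $col1, $col2 ) = split /\t/, $line, -1;
        if ( defined $col2 ) {
            my $key = substr $col1, 0, 1;
            $line = join "\t", $col1, @{ $good{$key} }
                if $col2 ne $key && $good{$key};
        }
        push @out, $line;
    }
    return @out;
}
```

Two passes are used because a correct row may appear after the incorrect row that needs it. To use it on a file, read and chomp all lines, pass them to `fix_rows`, and print each returned line followed by a newline.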


风启觞 2024-10-23 04:10:54

Use a CSV processor!

At the very least Text::CSV, or a similar module such as Text::CSV_XS (faster) or Text::CSV::Encoded (e.g. for UTF-8).

DBD::CSV provides SQL access to the file.
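As an illustration of this suggestion, here is a minimal sketch of reading tab-delimited data with Text::CSV. It assumes the module is installed from CPAN and that fields are never quoted, which is why quoting is disabled. The helper name `read_tsv` is hypothetical, and an in-memory filehandle stands in for the real data file.

```perl
use strict;
use warnings;
use Text::CSV;

# Hypothetical helper: parse tab-delimited text into an array of rows,
# each row being an array reference of fields.
sub read_tsv {
    my ($text) = @_;

    my $csv = Text::CSV->new({
        binary      => 1,
        sep_char    => "\t",     # tab-delimited input
        quote_char  => undef,    # assumption: fields are never quoted
        escape_char => undef,
        auto_diag   => 1,        # report parse errors
    });

    # Read from an in-memory filehandle; a real script would open the file.
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @rows;
    while ( my $row = $csv->getline($fh) ) {
        push @rows, $row;
    }
    close $fh;

    return \@rows;
}
```

Each element of the returned list is an array reference, so Col1 of the second line is `$rows->[1][0]`. Compared with a plain `split`, the parser keeps empty fields in place and reports malformed lines.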


北风几吹夏 2024-10-23 04:10:54

Below is a skeleton of a basic program structure to allow you to do this. If I knew what you wanted to do I could be a lot more helpful.

I made the easiest guess possible and treated your input files as fixed-column with widths=7,6,*. Since you have now told me that they are tab-delimited, I have changed the code that breaks the data up into fields.

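use autodie;    # open/close die on failure, so no explicit error checks
use strict;
use warnings;
use English qw<$INPUT_LINE_NUMBER>;

my %data;
my $line_no;
open ( my $h, '<', 'good_file.dat' );

while ( <$h> ) {
    chomp;    # strip the newline so it is not stored in the last field
    # /\t+/ treats consecutive tabs as one separator; use /\t/ instead
    # if fields can be empty.
    my ( $col1, $col2, $data ) = split( /\t+/, $_, 3 );
    # next unless index( $col1, 'A' ) == 0;
    $line_no = $INPUT_LINE_NUMBER;
    my $rec 
        = { col1 => $col1
          , col2 => $col2
          , data => $data
          , line => $line_no
          };
    # Index each record two ways in the same hash:
    # by "col1-col2" (a list of records) and by line number.
    push( @{ $data{"$col1-$col2"} }, $rec );
    $data{ $line_no } = $rec;
}
close $h;

open ( $h, '<', 'old_file.dat' );

while ( <$h> ) { 
    chomp;
    my ( $col1, $col2, $data ) = split( /\t+/, $_, 3 );
    ...    # the comparison and update logic goes here
}
close $h;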

The following is just a way you could print your values back into the file.

open ( $h, '>', 'old_file.dat' );
# Write the records back in their original line order.
foreach my $rec ( grep {; defined } @data{ 1..$line_no } ) { 
    printf $h "%s\t%s\t%s\n", @$rec{qw<col1 col2 data>};
}
close $h;

But you really haven't given anyone enough information to help you.
