Perl script to update one row of a file with another row

Posted on 2024-10-16 04:10:53

I have data files in text format which have several rows. Now there are certain rows that have wrong data which I need to update with those that have the correct data. For example,

Col1  Col2  Col3  Col4 .......
A1?%     A     foo  fooo .......
B€(2     B     .................  
C&6     Z     .................
A?04     Y     .................
B++3     Q     .................
C!5     C     .................
D*9     D     .................

The actual data is different, but this is a simplified version of it. As you can see, for some Col1 values the A1 row has A in Col2 but the A4 row has Y, and so on. The rest of the columns (Col3, Col4, ...) depend on Col2. So I need to check whether Col2 is A whenever Col1 starts with A (A1, A2, A3, etc.). If it is not, I have to update Col2, Col3, ... based on the row where Col2 is A.

How can this be accomplished in Perl? I know this kind of operation can be done in a database with an update statement, but I don't have that luxury here and have to do it programmatically.

Edit: The files are tab delimited and the data are strings that can contain any alphanumeric or ascii character.

Comments (4)

温柔女人霸气范 2024-10-23 04:10:54

Populate a hash whose keys are the Col2 values (A, B, C, etc.) and whose values are the rest of the columns (Col3, Col4, etc.). Only add a row to the hash if its Col1 and Col2 match the way you want.

Then, when writing out the file, if Col1 and Col2 do not match, look up the first character of Col1 in the hash. This will get you the Col3, Col4, ... values to insert.
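
A minimal sketch of that two-pass idea, under some assumptions of mine: a row counts as "correct" when Col2 equals the first character of Col1, fields are separated by single tabs, and the header line is handled separately. The names `build_lookup` and `fix_rows` and the sample values after Col2 are hypothetical, not taken from the question.

```perl
use strict;
use warnings;

# Pass 1: remember the trailing columns (Col3, Col4, ...) of each row whose
# Col2 equals the first character of Col1 (assumed meaning of "match").
sub build_lookup {
    my @rows = @_;
    my %rest_for;
    for my $row (@rows) {
        my ( $col1, $col2, $rest ) = split /\t/, $row, 3;
        next unless defined $col2 && length $col1;
        my $key = substr $col1, 0, 1;
        next unless $col2 eq $key;
        # Keep the first good row seen for each key.
        $rest_for{$key} //= defined $rest ? $rest : '';
    }
    return \%rest_for;
}

# Pass 2: rewrite rows whose Col2 disagrees with the first character of Col1,
# copying Col2 and the trailing columns from the good row for that key.
sub fix_rows {
    my ( $rest_for, @rows ) = @_;
    my @fixed;
    for my $row (@rows) {
        my ( $col1, $col2, $rest ) = split /\t/, $row, 3;
        my $key = defined $col1 ? substr( $col1, 0, 1 ) : '';
        if ( defined $col2 && $col2 ne $key && exists $rest_for->{$key} ) {
            push @fixed, join "\t", $col1, $key, $rest_for->{$key};
        }
        else {
            push @fixed, $row;    # already correct, or no good row to copy from
        }
    }
    return @fixed;
}

# Made-up sample rows (header line deliberately left out).
my @rows = (
    "A1?%\tA\tfoo\tfooo",
    "C&6\tZ\tzzz\tzzzz",
    "A?04\tY\tbar\tbaar",
    "C!5\tC\tbaz\tbazz",
);
my @fixed = fix_rows( build_lookup(@rows), @rows );
print "$_\n" for @fixed;
```

Rows with no good counterpart (such as a key that never appears with a matching Col2) are passed through unchanged here; whether that is the right behaviour depends on the real data.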

风启觞 2024-10-23 04:10:54

Use a CSV processor!

At the very least, use Text::CSV or one of its relatives, such as Text::CSV_XS (faster) or Text::CSV::Encoded (e.g. for UTF-8).

DBD::CSV lets you query the files with SQL through DBI.

北风几吹夏 2024-10-23 04:10:54

Below is a skeleton of a basic program structure that would allow you to do this. If I knew exactly what you wanted to do, I could be a lot more helpful.

I initially made the simplest guess possible and treated your input files as fixed-column with widths 7, 6, *. Since you have now told me that they are tab-delimited, I have changed the code that breaks the data up into fields.

use autodie;
use strict;
use warnings;
use English qw<$INPUT_LINE_NUMBER>;

my %data;
my $line_no;
open ( my $h, '<', 'good_file.dat' );

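while ( <$h> ) {
    chomp;    # drop the newline so it is not written back twice later
    # split on single tabs so that empty fields are not collapsed
    my ( $col1, $col2, $data ) = split( /\t/, $_, 3 );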
    # next unless index( $col1, 'A' ) == 0;
    $line_no = $INPUT_LINE_NUMBER;
    my $rec 
        = { col1 => $col1
          , col2 => $col2
          , data => $data
          , line => $line_no
          };
    push( @{ $data{"$col1-$col2"} }, $rec );
    $data{ $line_no } = $rec;
}
close $h;

open ( $h, '<', 'old_file.dat' );

while ( <$h> ) { 
    chomp;
    my ( $col1, $col2, $data ) = split( /\t/, $_, 3 );
    ... 
}

The following is just a way you could print your values back into the file.

open ( $h, '>', 'old_file.dat' );
# the per-line records live under numeric keys; skip any that were never set
foreach my $rec ( grep {; defined } @data{ 1..$line_no } ) { 
    printf $h "%s\t%s\t%s\n", @$rec{qw<col1 col2 data>};
}
close $h;

But you really haven't given anyone enough information to help you.

无尽的现实 2024-10-23 04:10:53

The way I would do this is to open an input file handle and an output file handle, then go through the file line by line, checking column one. If the line is fine, just plop it into the output as it is.

If it does need to change, I would make a new line with the necessary changes and put that into the output file instead.

This is a simple approach that, while not the greatest or most elegant, would give you what you need quickly.
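
A rough sketch of that read-one-handle, write-another pattern. The helper name `rewrite_file` and its callback argument are hypothetical; the callback stands in for whatever correction rule applies to a row, which this answer does not spell out.

```perl
use strict;
use warnings;
use autodie;    # open/close die with a message on failure

# Copy $in_path to $out_path line by line. $fix is a code ref (a hypothetical
# hook) that receives the tab-separated fields of one row and returns the
# fields to write; returning them unchanged passes the row through as it is.
sub rewrite_file {
    my ( $in_path, $out_path, $fix ) = @_;
    open my $in,  '<', $in_path;
    open my $out, '>', $out_path;
    while ( my $line = <$in> ) {
        chomp $line;
        my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        print {$out} join( "\t", $fix->(@fields) ), "\n";
    }
    close $in;
    close $out;
    return;
}
```

It could be called as `rewrite_file( 'old_file.dat', 'new_file.dat', \&fix_fields )`, where `fix_fields` is your own sub and `new_file.dat` is an assumed output name. Writing to a separate file and renaming it afterwards avoids truncating the input before it has been fully read.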
