Perl transformation logic - file processing or DB

Posted 2024-10-17 21:55:57

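I'm building transformation logic that applies certain transformation rules to the fields of a file. Examples of such rules:

Setting default values for some fields if they are empty (if column 5 is empty, set it to "empty")
Summarizing the file on some columns (if the file has col1, col2 and col3, the summarized file aggregates col3 for each col1)
Replacing strings in some fields (replace every "ax" in col1 with "ay")
etc.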
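A minimal line-by-line sketch of the first and third rules above, assuming pipe-delimited input like the sample rows in this question. The subroutine names apply_rules and transform_line are hypothetical. The summarizing rule needs state carried across lines (a hash), so it is left out of this per-line sketch.

```perl
use strict;
use warnings;

# Hypothetical per-row rules taken from the examples in the question.
# Column numbers are 1-based in the prose and 0-based in the array.
sub apply_rules {
    my @fields = @_;

    # Column 5 present but empty: give it the default value "empty".
    $fields[4] = 'empty' if defined $fields[4] && $fields[4] eq '';

    # Column 1: replace every "ax" with "ay".
    $fields[0] =~ s/ax/ay/g if defined $fields[0];

    return @fields;
}

# Streaming wrapper: one line in, one line out, so memory use does not
# grow with the file size.
sub transform_line {
    my ($line) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields instead of discarding them.
    my @fields = split /\|/, $line, -1;
    return join('|', apply_rules(@fields)) . "\n";
}

print transform_line("axbax|B|C|1||3\n");    # prints "aybay|B|C|1|empty|3"
```

To run it over a real file, a loop such as print transform_line($_) while <$fh>; (with $fh opened on the input) should be enough, since each line is handled independently.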

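From a performance standpoint, when running these transformations on a large file, is it better to use pure file processing (read the file line by line, use hashes for summarizing, regular expressions for the other transformations, etc.) or to load the data into a database table, summarize and apply all the transformation logic there, and then export it back to a file?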

Summarization example:

The original file has:

A|B|C|100|200|300
A|B|C|200|100|0
A|X|C|100|100|100

The transformed file has:

A|B|300|300|300
A|X|100|100|100


安人多梦 2024-10-24 21:55:57


Assuming the data you have given, this problem is well within Perl's grasp without a database:

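```perl
use strict;
use warnings;

my %data;    # "col1|col2" => [running sums of the numeric columns]
while (my $line = <DATA>) {
    chomp $line;
    # Keep columns 1 and 2 as the key, drop column 3, collect the rest.
    # The -1 limit preserves trailing empty fields.
    my ($c1, $c2, undef, @cols) = split /\|/, $line, -1;

    # Add each numeric column into the matching slot for this key.
    $data{"$c1|$c2"}[$_] += $cols[$_] for 0 .. $#cols;
}

print join('|' => $_, @{ $data{$_} }), "\n" for sort keys %data;

__DATA__
A|B|C|100|200|300
A|B|C|200|100|0
A|X|C|100|100|100
```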

which prints:

A|B|300|300|300
A|X|100|100|100

You will of course need to code the remaining transforms, but this should give you a start. Even if it turns out you need to access the raw rows more than once, you could, assuming your data is not gigantic, load it into a two-dimensional array and then run your passes over it. Or you could use Tie::File, a core module that presents a file as an array of lines, to access a very large file without reading it all in.
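A minimal sketch of both suggestions, using the sample rows from the question written to a temporary file with File::Temp so it runs on its own. The two passes (a column-4 total, then each row's percentage share of it) and the rewrite of X to Y in column 2 are illustrative stand-ins, not rules from the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Illustrative input: the question's sample rows in a temporary file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "A|B|C|100|200|300\nA|B|C|200|100|0\nA|X|C|100|100|100\n";
close $tmp or die "Cannot close $path: $!";

# Option 1: load every row into a two-dimensional array (an array of
# array refs) and run as many passes over it as needed.
open my $in, '<', $path or die "Cannot open $path: $!";
my @rows = map { chomp; [ split /\|/, $_, -1 ] } <$in>;
close $in;

# Pass 1 must see every row before pass 2 can start: total of column 4.
my $total = 0;
$total += $_->[3] for @rows;

# Pass 2: append each row's share of that total, as a whole percentage.
push @$_, int(100 * $_->[3] / $total) for @rows;

print join('|', @$_), "\n" for @rows;

# Option 2: Tie::File exposes the file as an array of lines without
# reading it all in; changing an element rewrites that line on disk.
tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";
s/^A\|X\|/A|Y|/ for @lines;    # stand-in for a real per-line rule
untie @lines;
```

The array version is simpler and fast while the data fits in memory; Tie::File trades speed for a small memory footprint, so it suits occasional random access or in-place edits on files too large to slurp.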
