Perl script to update a line of a file with another line
I have data files in text format with several rows. Some of those rows contain wrong data, which I need to update using the rows that contain the correct data. For example,
Col1 Col2 Col3 Col4 .......
A1?% A foo fooo .......
B€(2 B .................
C&6 Z .................
A?04 Y .................
B++3 Q .................
C!5 C .................
D*9 D .................
The actual data is different, but this is a simplified version of it. As you can see, for some Col1 values Col2 is correct (A1 has A) while for others it is not (A4 has Y), and so on. The remaining columns (Col3, Col4, ...) depend on Col2. So, whenever Col1 contains an A (A1, A2, A3, etc.), I need to check whether Col2 is A. If it is not, I have to update Col2, Col3, ... based on the row where it is A.
How can this be accomplished in Perl? I know this kind of operation can be done in a database with an UPDATE statement, but I don't have that luxury here and have to do it programmatically.
Edit: The files are tab-delimited, and the data are strings that can contain any alphanumeric or ASCII character.
Populate a hash where the key is Col2 (A, B, C, etc.) and the value is the rest of the columns (Col3, Col4, etc.). Only add a row to the hash when its Col1 and Col2 match the way you want.
Then, when writing out the file, if Col1 and Col2 do not match, look up the first character of Col1 in the hash. This gives you the Col3, Col4, ... values to insert.
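A minimal sketch of this lookup-table idea, written in Python with a dict standing in for the Perl hash. It assumes a row's group is the first character of Col1 and that the first correct row seen for each letter is the one to copy from; apart from `foo`/`fooo`, the Col3/Col4 values are made-up placeholders.

```python
def build_reference(rows):
    """Map group letter -> columns from Col2 onward, taken from correct rows."""
    reference = {}
    for row in rows:
        col1, col2 = row[0], row[1]
        # Only rows whose Col2 equals the first character of Col1 become keys;
        # setdefault keeps the first correct row seen for each letter.
        if col1 and col1[0] == col2:
            reference.setdefault(col2, row[1:])
    return reference


def fix_rows(rows):
    """Return new rows where a mismatched Col2, Col3, ... is copied from the reference."""
    reference = build_reference(rows)
    fixed = []
    for row in rows:
        key = row[0][:1]
        if row[1] != key and key in reference:
            # Keep Col1, replace Col2 onward with the correct values.
            fixed.append([row[0]] + list(reference[key]))
        else:
            # Correct rows, and rows with no known-good counterpart, stay as they are.
            fixed.append(list(row))
    return fixed


rows = [
    ["A1?%", "A", "foo", "fooo"],
    ["C&6", "Z", "baz", "bazz"],   # placeholder values
    ["A?04", "Y", "bar", "barr"],  # placeholder values
    ["C!5", "C", "qux", "quxx"],   # placeholder values
]
for row in fix_rows(rows):
    print("\t".join(row))
```

Building the whole reference before fixing anything means a correct row may appear after the wrong ones that depend on it.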
Use a CSV processor!
At the very least Text::CSV, or relatives like Text::CSV_XS (faster) or Text::CSV::Encoded (e.g. for UTF-8). DBD::CSV provides SQL over such files.
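The same advice applies outside Perl. As an illustration, Python's standard-library `csv` module can read these files when given `delimiter="\t"`. Using `quoting=csv.QUOTE_NONE` assumes the files follow no quoting convention, so a `"` inside a field is kept literally. The helper name `read_tab_rows` is hypothetical.

```python
import csv
import io


def read_tab_rows(text_stream):
    """Read tab-delimited rows, treating quote characters as ordinary data."""
    # Assumption: fields never contain tabs or newlines, so no quoting is needed.
    reader = csv.reader(text_stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


sample = 'Col1\tCol2\tCol3\nA1?%\tA\tfoo\nC"6\tZ\tbar\n'
rows = read_tab_rows(io.StringIO(sample, newline=""))
print(rows)
```

When reading real files, open them with `newline=""` as the `csv` documentation recommends.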
Below is a skeleton of a basic program structure that lets you do this. If I knew exactly what you wanted to do, I could be a lot more helpful.
I originally made the simplest possible guess and treated your input files as fixed-width columns (widths 7, 6, and the rest of the line). Since you have told me they are tab-delimited, I have changed the code that splits the data into fields.
The following is just a way you could print your values back into the file.
But you really haven't given anyone enough help to help you.
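One hedged way to write corrected rows back into the file is sketched below in Python. The rows go to a temporary file in the same directory, which is then swapped into place with `os.replace`, so a crash mid-write does not leave a half-written data file. The function name `write_rows_atomically` and the UTF-8 encoding are assumptions made for illustration.

```python
import os
import tempfile


def write_rows_atomically(path, rows):
    """Write tab-delimited rows to a temp file, then move it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so os.replace is a rename within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as out:
            for row in rows:
                out.write("\t".join(row) + "\n")
        # On POSIX the rename is atomic: readers see the old file or the new one.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temp file, then re-raise
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "data.tsv")
    write_rows_atomically(target, [["A1?%", "A", "foo"], ["A?04", "A", "foo"]])
    with open(target, encoding="utf-8") as f:
        print(f.read())
```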
The way I would do this is to open an input file handle and an output file handle, then go through the file line by line checking column one; if it's fine, just write the line to the output as it is.
If it does need to change, I would build a new line with the necessary changes and write that to the output file instead.
This is a simple approach that, while not the greatest or most elegant, would give you what you need quickly.
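A sketch of that input-handle/output-handle loop in Python. The correct row for a letter may come after a wrong one, so it makes two passes over a seekable input. The first pass collects the correct Col2, Col3, ... values, and the second copies good lines unchanged and rebuilds bad ones. The header handling, the "first character of Col1" grouping rule, and the name `fix_stream` are assumptions.

```python
import io


def fix_stream(infile, outfile):
    """Copy a tab-delimited file line by line, rebuilding rows whose Col2 is wrong.

    Assumes `infile` is seekable (two passes) and that its first line is a header.
    """
    header = infile.readline()
    # Pass 1: remember Col2 onward of the first correct row for each group letter.
    reference = {}
    for line in infile:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2 and fields[1] and fields[0][:1] == fields[1]:
            reference.setdefault(fields[1], fields[1:])
    # Pass 2: rewind, copy the header and good lines unchanged, fix the rest.
    infile.seek(0)
    infile.readline()
    outfile.write(header)
    for line in infile:
        fields = line.rstrip("\r\n").split("\t")
        key = fields[0][:1]
        if len(fields) >= 2 and fields[1] != key and key in reference:
            outfile.write("\t".join([fields[0]] + reference[key]) + "\n")
        else:
            outfile.write(line)


src = io.StringIO("Col1\tCol2\tCol3\nA1?%\tA\tfoo\nA?04\tY\tbar\nB++3\tQ\tbaz\n")
dst = io.StringIO()
fix_stream(src, dst)
print(dst.getvalue())
```

Here `B++3` is copied unchanged because no correct B row exists to copy from. With real files, pass `open(path, encoding="utf-8")` as the input and a second handle opened for writing as the output.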