How to merge two files using AWK?
File 1 has 5 fields, A B C D E; field A is integer-valued
File 2 has 3 fields, A F G
File 1 has far more rows than File 2 (20^6 versus 5000)
All the entries of A in File 1 appear in field A of File 2
I would like to merge the two files on field A and carry over F and G
The desired output is A B C D E F G
Example
File 1
A B C D E
4050 S00001 31228 3286 0
4050 S00012 31227 4251 0
4049 S00001 28342 3021 1
4048 S00001 46578 4210 0
4048 S00113 31221 4250 0
4047 S00122 31225 4249 0
4046 S00344 31322 4000 1
File 2
A F G
4050 12.1 23.6
4049 14.4 47.8
4048 23.2 43.9
4047 45.5 21.6
Desired output
A B C D E F G
4050 S00001 31228 3286 0 12.1 23.6
4050 S00012 31227 4251 0 12.1 23.6
4049 S00001 28342 3021 1 14.4 47.8
4048 S00001 46578 4210 0 23.2 43.9
4048 S00113 31221 4250 0 23.2 43.9
4047 S00122 31225 4249 0 45.5 21.6
Explanation: (Partly based on another question. A bit late though.)
`FNR` is the record number (typically the line number) within the current file, and `NR` is the total record number across all files.
`==` is the comparison operator; it is true when its two operands are equal. So `FNR==NR{commands}` means the commands inside the braces run only while the first file (`file2` here) is being processed.
`FS` is the field separator, and `$1`, `$2`, etc. are the 1st, 2nd, etc. fields of a line.
`a[$1]=$2 FS $3` fills a dictionary (associative array) named `a`, using `$1` as the key and `$2 FS $3` as the value.
`;` separates commands.
`next` skips all remaining commands for the current line; processing continues with the next line.
`$0` is the whole line.
`{print $0, a[$1]}` prints the whole line followed by the value of `a[$1]`. If `$1` is not in the dictionary, `a[$1]` is empty, so only `$0` (plus a trailing separator) is printed. Because of `FNR==NR{...;next}`, this block runs only for the second file (`file1` here).
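The command this explanation refers to did not survive on this page. Judging only from the fragments quoted in the explanation, it was probably close to the first one-liner below; treat it as a reconstruction, not the answer's verbatim code. The file names `file1` and `file2`, the output names, and the scratch directory are assumptions for illustration. The second command, with the `$1 in a` test, is an added variant that drops rows whose key is missing from `file2`, which is what the question's desired output shows for key 4046.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# file1 and file2 are assumed names; the contents are the question's sample data.
cat > file1 <<'EOF'
A B C D E
4050 S00001 31228 3286 0
4050 S00012 31227 4251 0
4049 S00001 28342 3021 1
4048 S00001 46578 4210 0
4048 S00113 31221 4250 0
4047 S00122 31225 4249 0
4046 S00344 31322 4000 1
EOF

cat > file2 <<'EOF'
A F G
4050 12.1 23.6
4049 14.4 47.8
4048 23.2 43.9
4047 45.5 21.6
EOF

# Reconstructed command. While reading file2 (FNR==NR), store "F G" under key A.
# While reading file1, print each line followed by the stored value (empty if absent).
awk 'FNR==NR { a[$1] = $2 FS $3; next } { print $0, a[$1] }' file2 file1 > merged.txt

# Variant: print a file1 line only when its key was seen in file2.
awk 'FNR==NR { a[$1] = $2 FS $3; next } ($1 in a) { print $0, a[$1] }' file2 file1 > matched.txt

cat matched.txt
```

The small file is listed first on purpose: only its 5000 or so keys are held in memory, while the large file is streamed line by line.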
Thankfully, you don't need to write this at all. Unix has a join command to do this for you.
Here it is "in action":
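The demonstration that followed this answer is missing from the page. A minimal sketch of how `join` could be used here is below, under some assumptions that are not in the original: the files are named `file1` and `file2`, the header lines have already been removed, and only a few of the question's rows are used. `join` expects both inputs to be sorted on the join field, so the sketch sorts them first under a fixed locale.

```shell
# Work in a scratch directory; file names are assumed for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Use the same collation for sort and join so their orderings agree.
export LC_ALL=C

# A few header-less rows taken from the question's sample data.
printf '%s\n' \
    '4050 S00001 31228 3286 0' \
    '4050 S00012 31227 4251 0' \
    '4049 S00001 28342 3021 1' \
    '4046 S00344 31322 4000 1' > file1
printf '%s\n' \
    '4050 12.1 23.6' \
    '4049 14.4 47.8' > file2

# join needs both inputs sorted on the join field (field 1 by default).
sort -k1,1 file1 > file1.sorted
sort -k1,1 file2 > file2.sorted

# Lines whose key appears in only one file (4046 here) are dropped by default.
join file1.sorted file2.sorted > joined.txt
cat joined.txt
```

Note that the output comes back in sorted key order, not in the original order of `file1`, and that sorting a file with millions of rows has a cost the awk approach avoids.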
You need to read the entries from File 2 into a pair of associative arrays in the BEGIN block. Assuming GNU Awk:
In the main processing block, you read each line from File 1 and print it with the corresponding data from the arrays created in the BEGIN block:
Supply File 1 as the filename argument to the program.
The quotes around the filename argument are needed because of the spaces in the file name. You need the quotes around the `getline` filename even if it contained no spaces, as it would otherwise be treated as a variable name.
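The code blocks that accompanied this answer are missing from the page. The sketch below is one way the described approach could look; it is an illustration, not the answer's original code. The array names `f` and `g`, the output file name, and the scratch directory are assumptions, and the sample uses a few rows from the question. The file names `File 1` and `File 2` follow the answer's remark about spaces in the names.

```shell
# Work in a scratch directory; the file names contain spaces on purpose.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' \
    'A B C D E' \
    '4050 S00001 31228 3286 0' \
    '4049 S00001 28342 3021 1' \
    '4048 S00001 46578 4210 0' > "File 1"
printf '%s\n' \
    'A F G' \
    '4050 12.1 23.6' \
    '4049 14.4 47.8' \
    '4048 23.2 43.9' > "File 2"

awk '
BEGIN {
    # (getline < file) returns 1 per record read, 0 at end of file, -1 on error.
    # Testing "> 0" avoids an endless loop if the file cannot be opened.
    while ((getline < "File 2") > 0) {
        f[$1] = $2    # F value keyed by A (f is an assumed array name)
        g[$1] = $3    # G value keyed by A (g is an assumed array name)
    }
    close("File 2")
}
# Main block: runs for every line of the file named on the command line.
# Keys missing from File 2 would print as empty trailing fields.
{ print $0, f[$1], g[$1] }
' "File 1" > merged.txt

cat merged.txt
```

Without the quotes inside the awk program, `getline < File 2` would not name a file at all, since `File` would be read as an (empty) variable.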