How to compare two files based on certain columns using a Unix shell script
I have two files, file1 and file2, which share some rows and differ in others. I want to extract the rows of file1 whose combination of column 1 and column 2 does not appear in file2. The sample files look like this.
File 1:
COL1 COL2 COL3
fruits apple 50
fruits mango 60
fruits kiwi 35
vegetable tomato 20
vegetable brinjal 30
File 2:
COL1 COL2 COL3
fruits apple 50
fruits orange 25
vegetable tomato 20
vegetable potato 25
sauce chilly 78
The output should look like this:
COL1 COL2 COL3
fruits mango 60
fruits kiwi 35
vegetable brinjal 30
Thanks in advance!!
Comments (1)
You want what set arithmetic calls a difference: the rows of file1.txt whose key has no match in file2.txt. I would use GNU AWK for this task, with file1.txt holding the File 1 data and file2.txt holding the File 2 data from the question.
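The command and its output are missing from this copy of the answer. The sketch below is a reconstruction from the explanation that follows, not the answer's original text. The multi-part subscript `arr[$1, $2]` is my assumption (it joins the two columns with awk's `SUBSEP`, so `a` + `bc` and `ab` + `c` stay distinct keys), and `diff_unordered.txt` is a hypothetical output file name. Everything used here is standard awk, so it should not depend on gawk specifically.

```shell
# Recreate the sample data from the question (file names as used in the answer).
cat > file1.txt <<'EOF'
COL1 COL2 COL3
fruits apple 50
fruits mango 60
fruits kiwi 35
vegetable tomato 20
vegetable brinjal 30
EOF

cat > file2.txt <<'EOF'
COL1 COL2 COL3
fruits apple 50
fruits orange 25
vegetable tomato 20
vegetable potato 25
sauce chilly 78
EOF

# Reconstructed from the explanation (an assumption, not the original command):
#   first file     -> store each line under its (column 1, column 2) key
#   later file(s)  -> delete that key; deleting a missing key is harmless
#   END            -> print what is left, in unspecified order
awk '
    FNR == NR { arr[$1, $2] = $0 }
    FNR != NR { delete arr[$1, $2] }
    END       { for (key in arr) print arr[key] }
' file1.txt file2.txt > diff_unordered.txt

cat diff_unordered.txt
```

With the sample data this leaves the mango, kiwi, and brinjal rows, in whatever order awk's `for (key in arr)` happens to visit them. The header row is not printed, because file2.txt contains the same `COL1 COL2` key and so removes it.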
Explanation: I use an array arr. While going through the first file (FNR==NR), I set arr's value, for the key formed by concatenating the 1st and 2nd columns, to the whole line ($0). In every file after the first (FNR!=NR), I remove the key built the same way from arr using delete; note that there is nothing wrong with using delete on a key that does not exist. When all files are processed, I print the values from arr using a for loop. This can be used with more than 2 files, in which case the effect is the same as concatenating the 2nd and following files before feeding them in. DISCLAIMER: this solution assumes that you do not care about row order in the output (tested in gawk 4.2.1).
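If row order and the header line do matter, as in the output the question asks for, one possible variation is to read file2.txt first and then stream file1.txt. This is a different technique from the one described above, offered only as a sketch: the array name `seen` and the output file `diff_ordered.txt` are hypothetical, and it assumes file2.txt is not empty (otherwise `FNR == NR` would also be true for file1.txt).

```shell
# Recreate the sample data from the question.
cat > file1.txt <<'EOF'
COL1 COL2 COL3
fruits apple 50
fruits mango 60
fruits kiwi 35
vegetable tomato 20
vegetable brinjal 30
EOF

cat > file2.txt <<'EOF'
COL1 COL2 COL3
fruits apple 50
fruits orange 25
vegetable tomato 20
vegetable potato 25
sauce chilly 78
EOF

# Note the argument order: file2.txt is read first, file1.txt second.
awk '
    # First argument (file2.txt): record every (column 1, column 2) key.
    FNR == NR { seen[$1, $2]; next }
    # Second argument (file1.txt): print the header line, plus every
    # line whose key was not recorded from file2.txt.
    FNR == 1 || !(($1, $2) in seen)
' file2.txt file1.txt > diff_ordered.txt

cat diff_ordered.txt
```

Because file1.txt is printed as it is read, the surviving rows keep their original order, and only the keys of file2.txt are held in memory. For the sample data the result is the header followed by the mango, kiwi, and brinjal rows, which matches the output requested in the question.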