How to compare two files based on certain columns using a Unix shell script

Posted on 2025-01-09 05:33:35

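I have two files, file1 and file2, which contain some matching rows and some differing rows. I want to extract the rows of file1 whose combination of column 1 and column 2 does not appear in file2. The sample files look like this.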

File 1:

COL1        COL2     COL3
fruits      apple    50
fruits      mango    60
fruits      kiwi     35
vegetable   tomato   20
vegetable   brinjal  30

File 2:

COL1        COL2     COL3
fruits      apple    50
fruits      orange   25
vegetable   tomato   20
vegetable   potato   25
sauce       chilly   78

The output should look like this:

COL1        COL2     COL3
fruits      mango    60
fruits      kiwi     35
vegetable   brinjal  30

Thanks in advance!



遥远的绿洲 2025-01-16 05:33:35

What you want is called a difference in set-arithmetic parlance. I would use GNU AWK for this task as follows. Let the content of file1.txt be

fruits      apple    50
fruits      mango    60
fruits      kiwi     35
vegetable   tomato   20
vegetable   brinjal  30

and the content of file2.txt be

fruits      apple    50
fruits      orange   25
vegetable   tomato   20
vegetable   potato   25
sauce       chilly   78

then

awk '(FNR==NR){arr[$1,$2]=$0}(FNR!=NR){delete arr[$1,$2]}END{for(i in arr){print arr[i]}}' file1.txt file2.txt

output

fruits      kiwi     35
fruits      mango    60
vegetable   brinjal  30

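Explanation: I use an array arr. While reading the first file (FNR==NR), I set the value of arr at the key formed by joining the 1st and 2nd columns to the whole line ($0). For every file after the first (FNR!=NR), I delete the key built the same way from arr; note that using delete on a key that does not exist is harmless. Once all files are processed, I print the values left in arr with a for loop. This can be used with more than 2 files, in which case the effect is the same as concatenating the 2nd and following files before feeding them in. Disclaimer: this solution assumes that you do not care about the order of rows in the output.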

(tested in gawk 4.2.1)
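
Since the solution above does not preserve row order, an order-preserving variant may be useful. The following is a minimal sketch and not the answer's own method: it reads file2.txt first to collect the (COL1, COL2) keys, then prints the rows of file1.txt whose key was not seen, in their original order. It assumes that both files start with a header line, as in the question (the answer's sample files have none); the array name seen and the scratch directory workdir are made up for illustration.

```shell
#!/bin/sh
# Hypothetical scratch directory; the file names mirror the question.
workdir=$(mktemp -d)

cat > "$workdir/file1.txt" <<'EOF'
COL1        COL2     COL3
fruits      apple    50
fruits      mango    60
fruits      kiwi     35
vegetable   tomato   20
vegetable   brinjal  30
EOF

cat > "$workdir/file2.txt" <<'EOF'
COL1        COL2     COL3
fruits      apple    50
fruits      orange   25
vegetable   tomato   20
vegetable   potato   25
sauce       chilly   78
EOF

# Note the argument order: file2.txt comes first so that its keys are
# known before any row of file1.txt is examined.
result=$(awk '
    NR == FNR { seen[$1, $2]; next }  # first file (file2): remember each key
    FNR == 1  { print; next }         # header line of file1: always keep it
    !(($1, $2) in seen)               # file1 row whose key is absent from file2
' "$workdir/file2.txt" "$workdir/file1.txt")

printf '%s\n' "$result"
rm -rf "$workdir"
```

Because rows are printed as file1.txt is read, no END block or array traversal is needed, so the output order matches the input. One known limitation of this sketch: if file2.txt is empty, NR == FNR stays true while file1.txt is read, so nothing is printed.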
