awk: merge two files on a common field and print what they share and where they differ
I have two files that I would like to merge into a third, but I need to see both where they share a common field and where they differ. Since there are minor differences in the other fields, I cannot use a diff tool, and I thought this could be done with awk.
File 1:
aWonderfulMachine 1 mlqsjflk
AnotherWonderfulMachine 2 mlksjf
YetAnother WonderfulMachine 3 sdg
TrashWeWon'tBuy 4 jhfgjh
MoreTrash 5 qsfqf
MiscelleneousStuff 6 qfsdf
MoreMiscelleneousStuff 7 qsfwsf
File2:
aWonderfulMachine 22 dfhdhg
aWonderfulMachine 23 dfhh
aWonderfulMachine 24 qdgfqf
AnotherWonderfulMachine 25 qsfsq
AnotherWonderfulMachine 26 qfwdsf
MoreDifferentStuff 27 qsfsdf
StrangeStuffBought 28 qsfsdf
Desired output:
aWonderfulMachine 1 mlqsjflk aWonderfulMachine 22 dfhdhg
aWonderfulMachine 23 dfhh
aWonderfulMachine 24 qdgfqf
AnotherWonderfulMachine 2 mlksjf AnotherWonderfulMachine 25 qsfsq
AnotherWonderfulMachine 26 qfwdsf
File1
YetAnother WonderfulMachine 3 sdg
TrashWeWon'tBuy 4 jhfgjh
MoreTrash 5 qsfqf
MiscelleneousStuff 6 qfsdf
MoreMiscelleneousStuff 7 qsfwsf
File2
MoreDifferentStuff 27 qsfsdf
StrangeStuffBought 28 qsfsdf
I have tried a few awk scripts found here and there, but either they are based on only two fields and I don't know how to modify the output, or they delete duplicates based on only two fields, etc. (I am new to this and awk syntax is tough.)
Thank you very much in advance for your help.
Comments (2)
You can come very close using these three commands:
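One plausible form of three such commands is sketched below, using the standard join utility on the first field. This is an illustration and may not be exactly what was originally posted. The scratch directory, the LC_ALL=C setting and the sort -k1,1 key are assumptions added so the sketch runs on its own with the sample data from the question.

```shell
#!/usr/bin/env bash
# Sketch only: sample data copied from the question, written to a scratch directory.
export LC_ALL=C                 # keep sort and join on the same collation order
cd "$(mktemp -d)" || exit 1

cat > file1 <<'EOF'
aWonderfulMachine 1 mlqsjflk
AnotherWonderfulMachine 2 mlksjf
YetAnother WonderfulMachine 3 sdg
TrashWeWon'tBuy 4 jhfgjh
MoreTrash 5 qsfqf
MiscelleneousStuff 6 qfsdf
MoreMiscelleneousStuff 7 qsfwsf
EOF

cat > file2 <<'EOF'
aWonderfulMachine 22 dfhdhg
aWonderfulMachine 23 dfhh
aWonderfulMachine 24 qdgfqf
AnotherWonderfulMachine 25 qsfsq
AnotherWonderfulMachine 26 qfwdsf
MoreDifferentStuff 27 qsfsdf
StrangeStuffBought 28 qsfsdf
EOF

# 1. Keys present in both files (one output line per file1/file2 pairing).
join <(sort -k1,1 file1) <(sort -k1,1 file2)

# 2. Lines of file1 whose key has no partner in file2.
join -v 1 <(sort -k1,1 file1) <(sort -k1,1 file2)

# 3. Lines of file2 whose key has no partner in file1.
join -v 2 <(sort -k1,1 file1) <(sort -k1,1 file2)
```

Note that join prints the key once, followed by the remaining fields from each file, so the paired lines come out in a slightly different layout from the desired output.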
This assumes a shell, such as Bash, that supports process substitution (<()). If you're using a shell that doesn't, the files would need to be pre-sorted.
To do this in AWK:
To run it:
The lines won't be output in the same order that they appear in the input files. The second input file (file2) needs to be sorted since the script assumes that similar lines are adjacent. You will probably want to adjust the tabs or other spacing in the script. I haven't done much in that regard.
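The caveats above suggest a script that loads file1 into an array and then streams a sorted file2. A minimal sketch along those lines follows; it is an illustration, not necessarily the script originally posted. The script name merge.awk, the array names, and the assumption that each key occurs only once in file1 are all hypothetical. The sorted file2 is piped in on standard input, which also works in shells without process substitution.

```shell
#!/bin/sh
# Sketch only: sample data copied from the question, written to a scratch directory.
export LC_ALL=C                 # deterministic sort order
cd "$(mktemp -d)" || exit 1

cat > file1 <<'EOF'
aWonderfulMachine 1 mlqsjflk
AnotherWonderfulMachine 2 mlksjf
YetAnother WonderfulMachine 3 sdg
TrashWeWon'tBuy 4 jhfgjh
MoreTrash 5 qsfqf
MiscelleneousStuff 6 qfsdf
MoreMiscelleneousStuff 7 qsfwsf
EOF

cat > file2 <<'EOF'
aWonderfulMachine 22 dfhdhg
aWonderfulMachine 23 dfhh
aWonderfulMachine 24 qdgfqf
AnotherWonderfulMachine 25 qsfsq
AnotherWonderfulMachine 26 qfwdsf
MoreDifferentStuff 27 qsfsdf
StrangeStuffBought 28 qsfsdf
EOF

cat > merge.awk <<'EOF'
# First input (file1): remember each line under its first field.
# Assumes a key appears only once in file1; a repeat would overwrite it.
NR == FNR { line1[$1] = $0; next }

# Second input (sorted file2), key also known from file1.
$1 in line1 {
    if ($1 != prev)
        printf "%s\t%s\n", line1[$1], $0   # first match: show the file1 line too
    else
        printf "\t%s\n", $0                # adjacent repeat: file2 line only
    prev = $1
    matched[$1] = 1
    next
}

# Second input, key not in file1: keep for the File2 section.
{ only2[++n2] = $0; prev = $1 }

END {
    print "File1"
    for (k in line1)                       # traversal order is unspecified
        if (!(k in matched))
            print line1[k]
    print "File2"
    for (i = 1; i <= n2; i++)
        print only2[i]
}
EOF

# file1 is read first; "-" makes awk read the sorted file2 from the pipe.
sort -k1,1 file2 | awk -f merge.awk file1 -
```

The unmatched file1 lines are printed in whatever order awk traverses the array, which is one reason the output order differs from the input order.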
One way to do it (albeit with hardcoded file names):
Not particularly elegant, but it handles many-to-many cases.
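A rough sketch of what such an approach could look like is given below. It is an illustration, not necessarily the code originally posted. The script name merge_hardcoded.awk, the array names, and the choice to pair rows by position within a key are assumptions. The input names file1 and file2 are hardcoded inside the awk program, so it takes no file arguments.

```shell
#!/bin/sh
# Sketch only: sample data copied from the question, written to a scratch directory.
cd "$(mktemp -d)" || exit 1

cat > file1 <<'EOF'
aWonderfulMachine 1 mlqsjflk
AnotherWonderfulMachine 2 mlksjf
YetAnother WonderfulMachine 3 sdg
TrashWeWon'tBuy 4 jhfgjh
MoreTrash 5 qsfqf
MiscelleneousStuff 6 qfsdf
MoreMiscelleneousStuff 7 qsfwsf
EOF

cat > file2 <<'EOF'
aWonderfulMachine 22 dfhdhg
aWonderfulMachine 23 dfhh
aWonderfulMachine 24 qdgfqf
AnotherWonderfulMachine 25 qsfsq
AnotherWonderfulMachine 26 qfwdsf
MoreDifferentStuff 27 qsfsdf
StrangeStuffBought 28 qsfsdf
EOF

cat > merge_hardcoded.awk <<'EOF'
BEGIN {
    f1 = "file1"; f2 = "file2"             # hardcoded input names

    # Group the lines of each file by first field, keeping first-seen key order.
    while ((getline line < f1) > 0) {
        split(line, fld, " "); k = fld[1]
        if (!(k in n1)) order1[++nk1] = k
        rows1[k, ++n1[k]] = line
    }
    close(f1)
    while ((getline line < f2) > 0) {
        split(line, fld, " "); k = fld[1]
        if (!(k in n2)) order2[++nk2] = k
        rows2[k, ++n2[k]] = line
    }
    close(f2)

    # Keys in both files: pair rows by position, padding the shorter side.
    for (i = 1; i <= nk1; i++) {
        k = order1[i]
        if (!(k in n2)) continue
        m = (n1[k] > n2[k]) ? n1[k] : n2[k]
        for (j = 1; j <= m; j++)
            printf "%s\t%s\n", (j <= n1[k] ? rows1[k, j] : ""), (j <= n2[k] ? rows2[k, j] : "")
    }

    print "File1"                          # keys found only in file1
    for (i = 1; i <= nk1; i++) {
        k = order1[i]
        if (!(k in n2))
            for (j = 1; j <= n1[k]; j++) print rows1[k, j]
    }

    print "File2"                          # keys found only in file2
    for (i = 1; i <= nk2; i++) {
        k = order2[i]
        if (!(k in n1))
            for (j = 1; j <= n2[k]; j++) print rows2[k, j]
    }
}
EOF

# BEGIN-only program: awk reads the two files itself and needs no arguments.
awk -f merge_hardcoded.awk
```

Rows sharing a key are paired by position and the shorter side is padded, so a key with several lines in both files is still printed in full. Pairing every file1 row with every file2 row would be a different, equally defensible reading of "many-to-many".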