Compare every other line, print the following line but remove duplicates
I have a file in the format of:
id-of-item
description of item
id-of-item
description of item
id-of-item
description of item
id-of-item
description of item
id-of-item
description of item
(There is only a single line break between each line; the big gaps here are just formatting.)
I need to compare the descriptions of the items and, if they match, remove the repeated description but keep the id (I need to make a table that references the ids as groups).
I have no idea how to do this. I have tried a couple of awk commands with NR%2, uniq, etc., but they all matched only one line and not the other =/
Comments (4)
This might be close. The rule of thumb with awk is:
put whatever you want to deduplicate into the index of an array:
Feel the power of associative arrays.
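As a minimal sketch of that idea (not necessarily the answerer's exact command), assuming the file strictly alternates id and description lines with no blank lines in between, the description can serve as the array index so that each description prints only the first time it appears while every id is kept. The names `drop_repeated_descriptions` and `seen` are illustrative only:

```shell
# Sketch only: assumes odd-numbered lines are ids and even-numbered lines are descriptions.
# drop_repeated_descriptions and seen are illustrative names, not from the original answer.
drop_repeated_descriptions() {
  awk '
    NR % 2 == 1 { print; next }   # id line: always keep it
    !seen[$0]++                   # description line: print only on its first occurrence
  ' "$@"
}
```

It could be run as `drop_repeated_descriptions input.txt`, or fed from a pipe. Because the array is indexed by the full description text, descriptions only count as duplicates when they match exactly, including whitespace.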
I'm going to make two simplifying assumptions:
Neither assumption is very strong, so it shouldn't be hard to adapt the following if needed.
With those assumptions, I'll produce sample data with
printf "1\n\nitem 1\n\n2\n\nitem 2\n\n3\n\nitem 2\n\n4\n\nitem 1\n"
It looks like this:
To process this data, I'll:
Here's a pipeline that does it:
Pipe the sample data through it, and you get
This might help you(?):
If you want to remove the description:
Explanation:
Read input.txt 2 lines at a time, replacing the newline \n with a delimiter (here it is !!!). Sort and remove duplicates. Replace the delimiter !!! by a newline \n. Or remove the description altogether.
EDIT:
This might work for you(?):
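A rough sketch of the steps in the explanation above could look like the following. It is an assumption about how those steps fit together, not the answerer's exact commands. It assumes the input strictly alternates id and description lines and that the string `!!!` never occurs in the data; `dedupe_pairs` is an illustrative name:

```shell
# Sketch only: assumes alternating id/description lines and that "!!!" never occurs in the data.
# dedupe_pairs is an illustrative name, not from the original answer.
dedupe_pairs() {
  sed 'N;s/\n/!!!/' "$@" |                 # join each id line with the description that follows it
    sort -u |                              # sort the joined pairs and drop exact duplicate pairs
    awk -F'!!!' '{ print $1; print $2 }'   # split each pair back into two lines
}
```

For example, `dedupe_pairs input.txt`. Note that `sort -u` reorders the pairs and only drops a pair when both the id and the description repeat. To treat pairs as duplicates when only the description matches, indexing an awk array by the description is probably the simpler route.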
Would this work?
Your File:
Execution: