sed/awk + regex: delete duplicate lines where the first field matches (IP address)
I need a solution to delete duplicate lines where the first field is an IPv4 address. For example, I have the following lines in a file:
192.168.0.1/text1/text2
192.168.0.18/text03/text7
192.168.0.15/sometext/sometext
192.168.0.1/text100/ntext
192.168.0.23/othertext/sometext
So in the scenario above, the only thing to match on is the IP address. All I know is that the regex for an IP address is:
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b
It would be nice if the solution were a one-liner and as fast as possible.
If the file contains only lines in the format you show, i.e. the first field is always an IP address, you can get away with one line of awk:
EDIT: This removes duplicates based only on the IP address. I'm not sure this is what the OP wanted when I wrote this answer.
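A minimal sketch of such a one-liner, assuming `/` is the field separator as in the question's sample lines. The helper name `dedupe_by_ip` and the array name `seen` are hypothetical, and this is not necessarily the exact command from the original answer:

```shell
# dedupe_by_ip is a hypothetical wrapper; the awk program is the whole trick.
# -F/ splits each line on "/", so $1 is the IP address.
# seen[$1]++ evaluates to 0 the first time an IP appears, so !seen[$1]++
# is true only for the first line with that IP, and awk prints that line.
dedupe_by_ip() {
  awk -F/ '!seen[$1]++' "$@"
}

# Usage with the sample lines from the question (reads stdin or file arguments).
printf '%s\n' \
  '192.168.0.1/text1/text2' \
  '192.168.0.18/text03/text7' \
  '192.168.0.15/sometext/sometext' \
  '192.168.0.1/text100/ntext' \
  '192.168.0.23/othertext/sometext' | dedupe_by_ip
# prints:
# 192.168.0.1/text1/text2
# 192.168.0.18/text03/text7
# 192.168.0.15/sometext/sometext
# 192.168.0.23/othertext/sometext
```

This keeps the first line seen for each IP, preserves the input order, and makes a single pass over the input without sorting.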
If you don't need to preserve the original ordering, one way to do this is using sort:
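A rough sketch of the sort-based approach, assuming `/` as the delimiter. The helper name `dedupe_sorted` is hypothetical, and the exact command in the original answer may have differed:

```shell
# dedupe_sorted is a hypothetical wrapper around sort.
# -t/ sets "/" as the field delimiter, -k1,1 restricts the key to field 1,
# and -u keeps only one line per distinct key.
# LC_ALL=C makes the comparison plain byte order (not numeric IP order).
dedupe_sorted() {
  LC_ALL=C sort -t/ -k1,1 -u "$@"
}

# Usage with the sample lines from the question.
printf '%s\n' \
  '192.168.0.1/text1/text2' \
  '192.168.0.18/text03/text7' \
  '192.168.0.15/sometext/sometext' \
  '192.168.0.1/text100/ntext' \
  '192.168.0.23/othertext/sometext' | dedupe_sorted
```

The output is ordered by the IP string rather than by input position, and POSIX does not specify which of several lines with the same key survives, so this fits only when either duplicate is acceptable.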
The awk that ArjunShankar posted worked wonders for me.
I had a huge list of items with multiple copies in field 1 and a special sequential number in field 2. I needed the "newest", i.e. highest, sequential number for each unique field 1.
I had to use sort -rn to push those entries up to the "first entry" position, because the awk prints the first line it sees for each key and then skips later ones, rather than keeping the last/most recent entry in the list.
Thanks, ArjunShankar!
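The workflow described in this comment can be sketched as follows. The commenter's actual data format is not shown, so this assumes whitespace-separated fields with the key in field 1 and the sequence number in field 2; `latest_per_key` is a hypothetical name:

```shell
# latest_per_key is a hypothetical helper.
# Step 1: sort by field 2, numerically and in reverse, so the highest
#         sequence number for every key comes first.
# Step 2: keep only the first line seen for each value of field 1.
latest_per_key() {
  sort -k2,2nr "$@" | awk '!seen[$1]++'
}

# Usage with made-up sample data.
printf '%s\n' 'a 1' 'b 5' 'a 3' 'b 2' | latest_per_key
# prints:
# b 5
# a 3
```

Sorting first is what turns "keep the first occurrence" into "keep the highest sequence number"; the result comes out in descending sequence order rather than in the original input order.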