AWK/BASH: How to remove duplicate rows from a file with a known field range?

Posted on 2024-09-28 12:24:42 · 630 characters · 5 views · 0 comments

I was wondering if there was a way to use bash/awk to remove duplicate rows based on a known field range. For example:

Easy Going                  USA:22 May 1926
Easy Going Gordon               USA:6 August 1925   
Easy Life                   USA:20 May 1944
Easy Listening                  USA:14 January 2002 
Easy Listening                  USA:10 October 2002 
Easy Listening                  USA:27 January 2004 
Easy Living                     USA:7 July 1937 
Easy Living                     USA:16 July 1937
Easy Living                     USA:4 September 2009

I would like to remove rows with duplicate movie titles. The movie title always spans fields $1 through $(NF-3). Ideally I would like to keep the first occurrence (the earliest date), but if that's not possible then it doesn't matter which one is kept.

Thanks,

Tomek

Comments (3)

拥抱没勇气 2024-10-05 12:24:42
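#!/bin/bash
# Requires gawk: mktime() is a gawk extension.

awk 'BEGIN{
   # Map month names to two-digit month numbers.
   m=split("January|February|March|April|May|June|July|August|September|October|November|December",d,"|")
   for(o=1;o<=m;o++){
      months[d[o]]=sprintf("%02d",o)
   }
}
{
   sub(/.*:/,"",$(NF-2))                                 # "USA:22" -> "22"
   t=mktime($(NF)" "months[$(NF-1)]" "$(NF-2)" 0 0 0")   # datespec is "YYYY MM DD HH MM SS"
   time[t]=$(NF-2) FS $(NF-1) FS $(NF)                   # remember the printable date
   $(NF-2)=$(NF-1)=$(NF)=""                              # blank the date, leaving the title
   gsub(/ +$/,"")
   if (!($0 in array)){array[$0]=99999999999999}         # sentinel larger than any timestamp
   if ( t <= array[$0] ){ array[$0]=t }                  # keep the earliest timestamp per title
}
END{
  # for-in order is unspecified, so titles print in arbitrary order.
  for(i in array){ print "->",i,time[array[i]]  }
} ' file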

output

$ ./shell.sh
-> Easy Living 7 July 1937
-> Easy Going Gordon 6 August 1925
-> Easy Listening 14 January 2002
-> Easy Going 22 May 1926
-> Easy Life 20 May 1944
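
The same "earliest date per title" idea can also be sketched without mktime(), which is a gawk extension. The version below is only an illustration under some assumptions: it builds a numeric YYYYMMDD key by hand, prints the original line unchanged, and keeps titles in the order they first appear. The temporary sample file and the variable names (sample, earliest, best, keep, order) are made up for the example, and the two 1937 "Easy Living" rows are deliberately swapped relative to the question so that the earliest date is not the first occurrence.

```shell
#!/bin/bash
# Hypothetical sample data based on the question; the two 1937 rows are swapped on purpose.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Easy Going                  USA:22 May 1926
Easy Going Gordon               USA:6 August 1925
Easy Life                   USA:20 May 1944
Easy Listening                  USA:14 January 2002
Easy Listening                  USA:10 October 2002
Easy Listening                  USA:27 January 2004
Easy Living                     USA:16 July 1937
Easy Living                     USA:7 July 1937
Easy Living                     USA:4 September 2009
EOF

earliest=$(awk '
BEGIN {
    # Map month names to month numbers 1..12.
    n = split("January February March April May June July August September October November December", name, " ")
    for (i = 1; i <= n; i++) month[name[i]] = i
}
NF >= 4 {
    line = $0
    day = $(NF-2)
    sub(/.*:/, "", day)                               # "USA:22" -> "22"
    key = $NF * 10000 + month[$(NF-1)] * 100 + day    # e.g. 19260522
    # Rebuild the title from fields 1 .. NF-3.
    title = $1
    for (i = 2; i <= NF - 3; i++) title = title " " $i
    if (!(title in best)) order[++count] = title      # remember first-seen order
    if (!(title in best) || key < best[title]) {
        best[title] = key                             # earliest date key so far
        keep[title] = line                            # original line for that date
    }
}
END {
    for (i = 1; i <= count; i++) print keep[order[i]]
}
' "$sample")
rm -f "$sample"

printf '%s\n' "$earliest"
```

Because the key is a plain number, this sketch does not depend on how the platform handles timestamps before 1970, and it should run on any POSIX awk rather than gawk only.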
一江春梦 2024-10-05 12:24:42
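awk '
    {
        line = $0                        # save the untouched line
        $(NF-2) = $(NF-1) = $NF = ""     # blank the date fields; $0 becomes the title key
        if ( ! ($0 in movies))           # first time this title is seen
            movies[$0] = line
    }
    END {
        for (m in movies) print movies[m]
    }
' movies.txt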

That does not preserve the original line ordering. You might want to sort the output.
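
If the original ordering matters, one possible variation is to print each line the first time its title is seen instead of collecting everything for the END block. This is a sketch, not part of the answer above: the temporary sample file and the names sample, deduped and seen are assumptions made for the example.

```shell
#!/bin/bash
# Sample data copied from the question into a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Easy Going                  USA:22 May 1926
Easy Going Gordon               USA:6 August 1925
Easy Life                   USA:20 May 1944
Easy Listening                  USA:14 January 2002
Easy Listening                  USA:10 October 2002
Easy Listening                  USA:27 January 2004
Easy Living                     USA:7 July 1937
Easy Living                     USA:16 July 1937
Easy Living                     USA:4 September 2009
EOF

deduped=$(awk '
NF >= 4 {
    # Rebuild the title from fields 1 .. NF-3 without modifying $0.
    title = $1
    for (i = 2; i <= NF - 3; i++) title = title " " $i
    # seen[title]++ is 0 (false) only on the first occurrence, so print then.
    if (!seen[title]++) print
}
' "$sample")
rm -f "$sample"

printf '%s\n' "$deduped"
```

This keeps the first occurrence of each title, which is the earliest date only when the input is already sorted by date within each title, as it is in the question's sample.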

陌若浮生 2024-10-05 12:24:42

This could be a quick answer

sort -t':' -k1,1 -u your-file
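
A small sketch of how that command behaves on the question's data, under a few assumptions worth stating. The sort key is everything before the first ':', that is, the title, its padding, and the country code, so it relies on titles containing no colon and on rows of the same title being padded identically. POSIX does not say which line of an equal-key group survives -u, so the example only checks that each title appears once. The temporary file, the variable names and the LC_ALL=C setting are choices made for the example.

```shell
#!/bin/bash
# Sample data copied from the question into a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Easy Going                  USA:22 May 1926
Easy Going Gordon               USA:6 August 1925
Easy Life                   USA:20 May 1944
Easy Listening                  USA:14 January 2002
Easy Listening                  USA:10 October 2002
Easy Listening                  USA:27 January 2004
Easy Living                     USA:7 July 1937
Easy Living                     USA:16 July 1937
Easy Living                     USA:4 September 2009
EOF

# -t':' splits on the colon, -k1,1 compares only the part before it,
# and -u keeps one line per distinct key.
unique=$(LC_ALL=C sort -t':' -k1,1 -u "$sample")
rm -f "$sample"

printf '%s\n' "$unique"
```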