awk - delete the oldest duplicate lines, keeping the latest copy, and delete the line above each deleted line

Posted 2025-02-08 18:37:32 · 497 words · 4 views · 0 comments

I have input in the following format:

#1655636921
cd
#1655636926
history
#1655637510
history
#1655637934
ls
#1655637934
ls
#1655638524
cd
#1655638927
ls
#1655638928
history

I would like to find duplicates (only among lines that do not start with '#', or equivalently only among the even-numbered lines), delete every earlier duplicate so that only the latest one is kept, and for each deleted duplicate also delete the line immediately before it, so the output would look like this:

#1655638524
cd
#1655638927
ls
#1655638928
history

I am new to awk and couldn't find a solution even for just preserving the latest duplicates. The only related solution I have found is:

awk '!visited[$0]++'

That deletes only the later duplicates and preserves the oldest one.
Thank you very much in advance for any kind of help.
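
For reference, a minimal sketch of why that one-liner keeps the oldest copy rather than the newest. The wrapper name `keep_first` and the inline sample (the commands from the input above, without their timestamp lines) are assumptions made for illustration only.

```shell
# keep_first: hypothetical wrapper around the one-liner quoted above
keep_first() {
    # visited[$0]++ evaluates to 0 the first time a line is seen, so
    # !visited[$0]++ is true (and the line is printed) only on first sight
    awk '!visited[$0]++' "$@"
}

# Commands from the sample input, oldest first
printf '%s\n' cd history history ls ls cd ls history | keep_first
```

This prints `cd`, `history`, `ls` in that order: each command appears at the position of its oldest occurrence, which is the opposite of the desired result.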


Comments (3)

少跟Wǒ拽 2025-02-15 18:37:32
$ tac file | awk '!/^#/{f = !seen[$0]++} f' | tac
#1655638524
cd
#1655638927
ls
#1655638928
history
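
The pipeline above can be unpacked as follows. This is only a commented sketch of the same three steps; the function name `dedupe_keep_latest` is hypothetical, and it assumes a `tac` command is available and that every command line is preceded by exactly one `#timestamp` line.

```shell
# dedupe_keep_latest: hypothetical wrapper around the pipeline shown above
dedupe_keep_latest() {
    tac "$@" |   # reverse the input so the newest command comes first
    awk '
        # Command line (no leading "#"): f becomes 1 only the first time
        # this command is seen, which is its newest occurrence.
        !/^#/ { f = !seen[$0]++ }
        # Print while f is set. In the reversed stream the timestamp line
        # follows its command, so it inherits the flag of that command.
        f
    ' |
    tac          # reverse again to restore the original order
}

printf '%s\n' '#1' cd '#2' ls '#3' cd | dedupe_keep_latest
```

Because the input is reversed first, "keep the first occurrence" becomes "keep the latest occurrence", and the timestamp belonging to a dropped command is dropped with it.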

If you don't have the tac command on your system you can create a tac function to do the same thing the command does, i.e. reverse the order of input lines, using just the mandatory POSIX tools awk, sort, and cut:

tac() { awk -v OFS='\t' '{print NR, $0}' "${@:--}" | sort -k1,1rn | cut -f2-; }

or if you have nl (POSIX but not mandatory; -ba is needed so that empty lines are numbered too) or your cat has a -n argument (non-POSIX):

tac() { nl -ba "${@:--}" | sort -k1,1rn | cut -f2-; }
tac() { cat -n "${@:--}" | sort -k1,1rn | cut -f2-; }
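
A short sketch of how the awk/sort/cut variant reverses its input. It is defined here under the hypothetical name `rev_lines` so that it does not shadow a real `tac`, and it assumes a shell such as bash where `"${@:--}"` expands to `-` (standard input) when no file arguments are given.

```shell
# rev_lines: hypothetical name for the awk/sort/cut fallback described above
rev_lines() {
    awk -v OFS='\t' '{ print NR, $0 }' "${@:--}" |  # prefix each line with its line number and a tab
    sort -k1,1rn |                                   # order by that number, highest first
    cut -f2-                                         # strip the number, keeping the original text
}

printf '%s\n' one two three | rev_lines
```

This prints `three`, `two`, `one`. Since `cut -f2-` keeps everything after the first tab, lines that themselves contain tabs pass through unchanged.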
傲鸠 2025-02-15 18:37:32

Somehow a stray duplicate kept lingering, so I had to trim it out by brute force:

    {m,g}awk '
    BEGIN {
            RS = "(\r?\n)?[#]"
            FS = (_="[ \t]*")"\n+"(_)
           OFS =  _=""
           ___ = "\21#"
    }    {
           ____[+__[$NF]]++
                 __[$NF] = NR ___ $+_
    } END {
               FS = "[0-9]+\21"
              OFS = ORS
                _ = ""
              $+_ = _
           delete  ____[_]
           delete ____[+_]

           for(_ in __) { if(!(+(___=__[_]) in ____)) {
               $+___=___
               sub("^[^\021]+\21[#]?","#",$+___)
           } }
           sub("^.+\n\n", ""); print }'

=

#1655638524
cd
#1655638927
ls
#1655638928
history
残疾 2025-02-15 18:37:32

Assumptions:

  • OP mentions processing by 'line' so this means ls and ls *.txt are to be treated as two distinct commands (ie, both will show up in the final output)
  • OP mentions detecting duplicates only in 'even lines' which implies we do not need to worry about nested linefeeds (in either the #comment or the command), nor multi-line #comments

One awk idea that eliminates the need for any other programs:

awk '
/^#/ { comment=$0; next }
     { comments[$0]=comment                   # associate previous line/comment with current command

       delete lineno2cmd[cmd2lineno[$0]]      # delete previous line number associated with this command
       lineno2cmd[FNR]=$0                     # associate the current line number with this command; this array used to generate output in line number order (ie, maintain ordering of lines)
       cmd2lineno[$0]=FNR                     # maintain reverse link from command to line number; this array used solely to make sure only one entry in lineno2cmd[] is associated with the current command
     }
END  { for (i=1;i<=FNR;i++)                   # loop through list of line numbers and ...
           if (i in lineno2cmd) {             # if line number is an index in the lineno2cmd[] array then ...
              printf "%s\n%s\n", comments[lineno2cmd[i]], lineno2cmd[i]
           }
     }
' history.dat

If OP has access to GNU awk (v 4.0+) (for PROCINFO["sorted_in"] support) we can streamline this a bit:

awk '
/^#/ { comment=$0; next }
     { comments[$0]=comment
       cmd2lineno[$0]=FNR
     }
END  { PROCINFO["sorted_in"]="@val_num_asc"     # sort array by the numerical values (ascending)
       for (i in cmd2lineno) {
           printf "%s\n%s\n", comments[i], i
           }
     }
' history.dat

These both generate:

#1655638524
cd
#1655638927
ls
#1655638928
history