AWK script - what does this script do?

Posted on 2024-11-14 06:37:28

I need to duplicate the processing done by this AWK script but cannot figure out what it is doing. Can anyone please advise what the basic function of this script is?

It takes an input file and creates an output file, but I do not have access to either file to see what it is doing. It has something to do with the pipe delimiter that separates columns in the input file.

{
  if (NR == 1) {
    line = $0
    len = length(line)
    newlen = len
    while ( substr(line,newlen-1,1) == "|" )
    {
      newlen = newlen - 1
    }
    line = substr(line,1,newlen-1)
  }
  else {
    print line
    line = $0
  }
}
END {
  len = length(line)
  newlen = len
  while ( substr(line,newlen-1,1) == "|" ) {
    newlen = newlen - 1
  }
  line = substr(line,1,newlen-1)
  print line
}

Comments (3)

喜你已久 2024-11-21 06:37:28

It looks like it's trimming all trailing pipe characters on the first and last lines only.

西瓜 2024-11-21 06:37:28

Wow, whoever wrote this must have been paid by the line.

The block of code that occurs twice, from len = length(line) to line = substr(line,1,newlen-1), is doing a string transformation that could be expressed more simply (and more clearly) as a regular expression replacement. It counts the | characters at the end of line and strips them. When the line ends with a character other than |, one character is stripped (this may be accidental). This could be written as gsub(/(\|+|.)$/, "", line), or as gsub(/\|+$/, "", line) if the behavior with no final | doesn't matter.
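
A quick way to check how closely the suggested regex matches the original loop is to run both on a few sample lines. The sketch below wraps each variant in a shell function. The function names (trim_original, trim_regex, trim_exact) and the sample strings are made up for illustration, and trim_exact is an assumed alternative, not something from the original script.

```sh
#!/bin/sh
# The trimming loop from the question, applied to every input line.
trim_original() {
  awk '{
    line = $0
    newlen = length(line)
    while (substr(line, newlen - 1, 1) == "|") newlen = newlen - 1
    print substr(line, 1, newlen - 1)
  }'
}

# The regex replacement suggested in the answer.
trim_regex() {
  awk '{ line = $0; sub(/(\|+|.)$/, "", line); print line }'
}

# Assumed alternative: drop the final character plus any | run just before it.
trim_exact() {
  awk '{ line = $0; sub(/\|*.$/, "", line); print line }'
}

# Compare the three variants on a few hypothetical sample lines.
for sample in 'a|b|||' 'a|b' 'abc'; do
  printf '%s -> original=%s regex=%s exact=%s\n' "$sample" \
    "$(printf '%s\n' "$sample" | trim_original)" \
    "$(printf '%s\n' "$sample" | trim_regex)" \
    "$(printf '%s\n' "$sample" | trim_exact)"
done
```

On these samples the variants agree for a|b||| and abc but differ for a|b. The loop tests the character before the current end, so it always drops the final character and then any | immediately before it, giving a, while the suggested regex leaves a|. If that case matters, sub(/\|*.$/, "", line) appears to reproduce the loop on these inputs.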

As for the overall structure, the code has three parts: what's done for the first line (if (NR == 1) {…}), what's done for the other lines (else {…}), and what's done after the last line (END {…}). On the first line, the variable line is set to the transformed $0. On subsequent lines, the saved line is printed, then line is set to the current line. Finally, the last line is printed, transformed. This print-previous-then-save-current pattern is a common trick for treating the last line differently: when you read a line, you can't know whether it's the last one, so you save it, print the previous line and move on; in the END block you do that different thing for the last line.
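
The print-previous-then-save-current pattern can be seen in isolation in a minimal sketch like the one below. The function name mark_last and the " <- last" marker are invented for this example. The marker only stands in for whatever special treatment the last line needs.

```sh
#!/bin/sh
# Print every line unchanged, except the last one, which gets a marker.
mark_last() {
  awk '
    NR > 1 { print prev }                     # prev is now known not to be the last line
           { prev = $0 }                      # hold the current line back for one step
    END    { if (NR > 0) print prev " <- last" }  # only here do we know prev was last
  '
}

printf 'one\ntwo\nthree\n' | mark_last
```

The NR > 0 guard in END is an addition here so that empty input prints nothing.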

Here's how I'd write it. The data flow is similarly nontrivial (but hardly contrived either), but at least it's not drowned in a messy text transformation.

function cleanup (line) { gsub(/(\|+|.)$/, "", line); return line }
NR != 1 { print prev }
{ prev = (NR == 1 ? cleanup($0) : $0) }
END { print cleanup(prev) }
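
To see what the rewritten script does, it can be run on a small made-up input. In the sketch below, the wrapper name trim_first_last and the three sample lines are assumptions for illustration. The awk program is the one above, with the space after the function name removed.

```sh
#!/bin/sh
# Wrap the rewritten awk program so it can be used as a filter.
trim_first_last() {
  awk '
    function cleanup(line) { gsub(/(\|+|.)$/, "", line); return line }
    NR != 1 { print prev }                       # emit the line held back earlier
    { prev = (NR == 1 ? cleanup($0) : $0) }      # trim only the first line on the way in
    END { print cleanup(prev) }                  # trim the last line on the way out
  '
}

# Hypothetical pipe-delimited input with trailing empty columns.
printf 'h1|h2||\na|b||\nc|d||\n' | trim_first_last
```

With this input the first and last lines come out as h1|h2 and c|d, while the middle line a|b|| passes through untouched. Note that for a one-line input both the NR == 1 branch and the END block apply cleanup, so that line is trimmed twice, as it is in the original script.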

一曲琵琶半遮面シ 2024-11-21 06:37:28

I may be wrong, but at a quick glance it seems to filter out the | character in a file.
