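AWK script - what does this script do?
I need to duplicate the processing of this AWK script but cannot figure out what it is doing. Can anyone advise what the basic function of this script is?
It takes an input file and creates an output file, but I do not have access to either file to see what it is doing. It has something to do with the pipe delimiter that separates columns in the input file.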
{
    if (NR == 1) {
        line = $0
        len = length(line)
        newlen = len
        while ( substr(line,newlen-1,1) == "|" )
        {
            newlen = newlen - 1
        }
        line = substr(line,1,newlen-1)
    }
    else {
        print line
        line = $0
    }
}
END{
    len = length(line)
    newlen = len
    while ( substr(line,newlen-1,1) == "|" ) {
        newlen = newlen - 1
    }
    line = substr(line,1,newlen-1)
    print line
}
Comments (3)
It looks like it trims all trailing pipe characters, on the first and last lines only.
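That reading can be checked by running the script on a small sample. The sketch below stores the question's script unchanged in a shell variable and feeds it four made-up rows. The names `trim_script` and `run_original` and the sample data are assumptions for illustration only, since the real input file is not available.

```shell
# The script from the question, stored verbatim (only indentation added).
trim_script='
{
    if (NR == 1) {
        line = $0
        len = length(line)
        newlen = len
        while ( substr(line,newlen-1,1) == "|" )
        {
            newlen = newlen - 1
        }
        line = substr(line,1,newlen-1)
    }
    else {
        print line
        line = $0
    }
}
END{
    len = length(line)
    newlen = len
    while ( substr(line,newlen-1,1) == "|" ) {
        newlen = newlen - 1
    }
    line = substr(line,1,newlen-1)
    print line
}'

# Hypothetical sample input: four pipe-delimited rows with trailing pipes.
run_original() {
    printf '%s\n' 'id|name|||' 'a|b||' 'c|d||' 'e|f|||' | awk "$trim_script"
}

run_original
# Output:
#   id|name
#   a|b||
#   c|d||
#   e|f
```

With this sample, only the first and last rows lose their trailing pipes; the two middle rows pass through unchanged.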
Wow, whoever wrote this must have been paid by the line.
The block of code that occurs twice, from `len = length(line)` to `line = substr(line,1,newlen-1)`, is doing a string transformation that could be expressed more simply (and more clearly) as a regular expression replacement. It counts the `|` characters at the end of `line` and strips them. When the line ends with a character other than `|`, one character is stripped (this may be accidental). This could be performed simply as `gsub(/(\|+|.)$/, "", line)`, or as `gsub(/\|+$/, "", line)` if the behavior with no final `|` doesn't matter.
As for the overall structure, there are three parts in the code: what's done for the first line (`if (NR == 1) {…}`), what's done for other lines (`else {…}`), and what's done after the last line (`END {…}`). On the first line, the variable `line` is set to `$0`, transformed. On subsequent lines, the saved `line` is printed, then `line` is set to the current line. Finally, the last line is printed, transformed. This print-previous-then-save-current pattern is a common trick to act differently on the last line: when you read a line, you can't know whether it's the last one, so you save it, print the previous line, and move on; in the `END` block you do that different thing for the last line.
Here's how I'd write it. The data flow is similarly nontrivial (but hardly contrived either), but at least it's not drowned in a messy text transformation.
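A rewrite along these lines could look like the following sketch. It is not the answerer's own code. It assumes the simpler `\|+$` replacement is acceptable, so unlike the original it leaves a line alone when it does not end in `|`, and it prints nothing for empty input. The names `trim_first_and_last` and `strip` are hypothetical.

```shell
# Sketch: trim trailing pipes on the first and last lines only.
trim_first_and_last() {
    awk '
    # Hypothetical helper: return s without its run of trailing pipes.
    function strip(s) { sub(/\|+$/, "", s); return s }

    # First line: save it already transformed, print nothing yet.
    NR == 1 { line = strip($0); next }

    # Later lines: print the previously saved line, then save this one.
    { print line; line = $0 }

    # After the last line: the saved line is the last one, so transform it.
    END { if (NR > 0) print strip(line) }
    ' "$@"
}

printf '%s\n' 'id|name|||' 'a|b||' 'e|f|||' | trim_first_and_last
# Output:
#   id|name
#   a|b||
#   e|f
```

Because `strip` is idempotent, a one-line file, which is both first and last, is handled correctly even though the helper runs on it twice.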
I may be wrong, but at a quick glance it seems to filter out the | character in a file.
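One way to test that guess is to run the script's trimming loop on single rows. In the sketch below, `trim_loop` copies the loop from the question, and `trim_sub` is an assumed one-line equivalent based on tracing the loop: it tests the character before the last one, so it always drops the last character and then any pipes immediately before it. Both function names and the sample rows are made up for illustration.

```shell
# The loop from the question, applied to one row passed as an argument.
trim_loop() {
    printf '%s\n' "$1" | awk '{
        line = $0
        newlen = length(line)
        while (substr(line, newlen - 1, 1) == "|")
            newlen = newlen - 1
        print substr(line, 1, newlen - 1)
    }'
}

# Assumed equivalent: drop the last character plus any pipes just before it.
trim_sub() {
    printf '%s\n' "$1" | awk '{ sub(/\|*.$/, ""); print }'
}

for row in 'a|b|c|||' 'a|b|c' 'abc'; do
    printf '%s -> %s / %s\n' "$row" "$(trim_loop "$row")" "$(trim_sub "$row")"
done
# Output:
#   a|b|c||| -> a|b|c / a|b|c
#   a|b|c -> a|b / a|b
#   abc -> ab / ab
```

Pipes in the middle of a row survive, so the loop does not filter out every `|`; it only affects the tail of the row, and in the script only the first and last rows are passed through it.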