Backslash in gawk fields
I've just been forced to check all my output files with gawk, which I avoid as much as I can.
How does
gawk 'NF \!= 6' file
differ from
gawk 'NF != 6' file
that is, how does the backslash change the meaning of this expression?
Should it output lines whose number of fields differs from 6 and that end with a backslash?
I'm getting the following error on my files:
gawk: ^ backslash not last character on line
Anybody?
Comments (4)
If you use double quotes instead of single quotes, then ! is a special character in shells with history expansion and should be escaped with a backslash. Importantly, you are escaping the exclamation point so that your shell does not see it. Within double quotes, a csh-like shell converts \! to ! before passing the argument to gawk, so the backslash is gone by the time gawk is invoked. (bash suppresses the history expansion but leaves the backslash in place.)
With single quotes in a Bourne-like shell, though, the shell ignores ! characters, so there's no need to escape them with backslashes. In fact, as you found out, it is a syntax error to do so, since the backslash ends up being passed to gawk, which barfs on the unexpected \.
The line without the backslash works as expected. However, if you want to know, a backslash is usually used to escape special characters (they lose their special meaning and stand for themselves), and also to split long lines, so under a shell you could write the command across two lines, with a backslash at the end of the first, and it would have the same effect.
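A minimal sketch of that line-splitting use. It assumes two sample records invented for illustration, and it places the backslash inside the single quotes so that the awk program itself, not the shell, sees the continuation:

```shell
# Use gawk if it is installed, otherwise the system awk (a portability assumption).
AWK=$(command -v gawk || command -v awk)

# Two sample records: the first has 6 fields, the second has 3.
# The backslash-newline inside the quotes continues the awk expression.
out=$(printf 'a b c d e f\n1 2 3\n' | "$AWK" 'NF != \
6')

printf '%s\n' "$out"   # prints: 1 2 3
```

Here the backslash is the last character on its line, which is the one position outside strings and regexes where awk accepts it.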
Your example in particular is a little trickier. You put the string within single quotes, which makes the shell pass it to the program without modifying what you wrote. If you use your backslash expression, gawk finds a \ in a place where it has no meaning (in gawk it is only used to split long lines and to escape characters in strings). When a command is split over two lines with a backslash inside the quotes, gawk receives two lines joined by the backslash (conceptually one line).
If you're trying to match lines that don't have 6 fields and that do end in a backslash, this is one way to do that:
Gawk (and other AWKs) has some complex rules regarding backslash escaping. That's why there are four backslashes in the preceding command. (The dollar sign represents the end of the input line from the data file, as in any regex.)
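One form that needs four backslashes, offered as a sketch and not necessarily the command this answer had in mind, writes the pattern as a string constant (a dynamic regex) instead of a /.../ literal. The three sample lines are invented for illustration:

```shell
# Use gawk if it is installed, otherwise the system awk (a portability assumption).
AWK=$(command -v gawk || command -v awk)

# Only the first sample line has a field count other than 6 AND ends in a backslash.
# awk's string processing turns "\\\\$" into the regex \\$ (literal backslash, then end of line).
out=$(printf '%s\n' 'a b c\' 'a b c d e f\' 'a b c d' |
  "$AWK" 'NF != 6 && $0 ~ "\\\\$"')

printf '%s\n' "$out"   # prints: a b c\
```

The backslashes are halved twice: once when awk reads the string constant, and once more when the resulting string is interpreted as a regex.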
Whether you use double or single quotes, if you are using a Bourne-like shell, gawk will see the program exactly as it appears between the quotes. Even in double quotes, both Bourne and csh-like shells only consume \ before characters that might need escaping (like $ and, in the case of csh, !). Thus in csh this program would appear syntactically correct to gawk, though it still wouldn't do what you want.
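A small sketch of what a Bourne-like shell hands over, using printf to stand in for gawk's view of its first argument. It assumes the commands run as a script, where history expansion is off:

```shell
# Use gawk if it is installed, otherwise the system awk (a portability assumption).
AWK=$(command -v gawk || command -v awk)

# printf shows the argument exactly as the shell passes it on.
single=$(printf '%s' 'NF \!= 6')
double=$(printf '%s' "NF \!= 6")
printf '%s\n%s\n' "$single" "$double"   # both lines read: NF \!= 6

# The stray backslash reaches awk, which rejects the program.
if "$AWK" 'NF \!= 6' /dev/null 2>/dev/null; then
  status=accepted
else
  status=rejected
fi
printf '%s\n' "$status"   # prints: rejected
```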
A \ before ! has no meaning to gawk in this context, so it gives an error. To "output lines with number of fields different than 6 and ending with backslash", use:
gawk 'NF != 6 && /\\$/' file
That is: match lines that don't have 6 fields and that contain a \ immediately before the end of the line ($). The \ must be escaped with another backslash, because gawk too uses \ for escaping. Inside a regex or string, though, gawk absorbs every \ (except those escaped by another \); those that don't escape a special character are simply elided.
With no associated action, the default action (print the line) will be taken when this conditional statement is satisfied.
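To see the pattern and the default print action together, here is a sketch on three sample lines invented for illustration:

```shell
# Use gawk if it is installed, otherwise the system awk (a portability assumption).
AWK=$(command -v gawk || command -v awk)

# Line 1: 3 fields, ends in a backslash  -> printed.
# Line 2: 6 fields, ends in a backslash  -> skipped (NF == 6).
# Line 3: 4 fields, no trailing backslash -> skipped (regex does not match).
out=$(printf '%s\n' 'a b c\' 'a b c d e f\' 'a b c d' |
  "$AWK" 'NF != 6 && /\\$/')

printf '%s\n' "$out"   # prints: a b c\
```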