Can a field separator in awk consist of multiple characters?
Can I use a field separator consisting of multiple characters? Like I want to separate words which contain quotes and commas between them viz.
"School","College","City"
So here I want to set my FS to be ",". But I am getting funny results when I define my FS like that. Here's a snippet of my code.
awk -F\",\" '
{
    for (i = 1; i <= NF; i++)
    {
        if ($i ~ "[a-z0-9],[a-z0-9]")
            print $i
    }
}' OFS=\",\" $*
Yes, FS can consist of multiple characters. See the test below with your example:
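A minimal sketch of such a test, using the record from the question. The helper name split_fields and the numbered output format are illustrative choices, not part of the original answer.

```shell
# Sample record taken from the question.
line='"School","College","City"'

# -F'","' sets FS to the three characters "," (quote, comma, quote).
# An FS longer than one character is treated as a regex; none of these
# characters is special in a regex, so the match is literal.
split_fields() {
    printf '%s\n' "$1" | awk -F'","' '{ for (i = 1; i <= NF; i++) print i ": " $i }'
}

split_fields "$line"
```

The first and last fields keep their outer quote ("School and City"), because those quotes are not part of the separator.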
What's being talked around here is that the Field Separator isn't just limited to being multiple characters but can actually be a full-blown regex.
To wit:
This strips out the header and surrounding tags from an XML fragment.
Note that tags are well-formed, but different.
Now we apply the awk script to print out the middle field, using a regex as the field separator:
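As a sketch of what this can look like, assuming a small hypothetical XML fragment (the element names and values are invented here, not the answer's original data) and the regex <[^>]*> as the field separator:

```shell
# Hypothetical XML fragment; element names and values are invented for illustration.
xml_fragment() {
    cat <<'EOF'
<?xml version="1.0"?>
<root>
<name>Alice</name>
<city>Paris</city>
</root>
EOF
}

# FS is the regex <[^>]*>, which matches any single tag, so the text between
# an opening tag and its closing tag becomes field 2.
xml_fragment | awk -F'<[^>]*>' '{ print $2 }'
```

With this input, only the name and city lines have text between tags; the other three lines consist of a single tag each.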
The blank lines are from where a tag was the only thing on that line, so there is no $2 to print.
This is actually really powerful because it means that you can not only use fixed patterns with multiple characters but the full power of regular expressions as well in your field separator.
Try
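One thing to try, offered as a sketch rather than as this answer's original command: set FS and OFS inside a BEGIN block, where the quotes only need awk-level escaping. The helper name rejoin and the output separator " | " are assumptions made for illustration.

```shell
line='"School","College","City"'

# Inside an awk string, \" is a literal double quote, so FS becomes ",".
# Assigning $1 = $1 forces awk to rebuild the record using OFS.
rejoin() {
    printf '%s\n' "$1" |
        awk 'BEGIN { FS = "\",\""; OFS = " | " } { $1 = $1; print }'
}

rejoin "$line"
```

Setting both variables in BEGIN avoids the shell-quoting puzzle of -F\",\" and guarantees they are in effect before the first record is read.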
Yes, you can use multiple characters for the -F argument because that value can be a regular expression. For example you can do things like:
which will return friend.
Support for a regexp as the argument to -F holds for nawk and gawk (GNU awk); the original awk does not support it. On Solaris this distinction is important. On Linux it usually matters less, because awk is often a link to gawk (some distributions, such as Debian, default to mawk instead). I would therefore say it is best practice to invoke awk as gawk where it is installed, because then the behavior is the same across platforms.
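A sketch of the kind of command that returns friend, as described above. The input string and the separator ::: are assumptions chosen for illustration, and plain awk is used so the line runs where gawk is not installed; substitute gawk if you follow the advice above.

```shell
# Hypothetical input: three words joined by the multi-character separator ":::".
third_field() {
    printf '%s\n' "$1" | awk -F':::' '{ print $3 }'
}

third_field 'hello:::my:::friend'
```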
With GNU awk 4 you can easily parse even *CSV*s with embedded separators and quotes:
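One way to do this is gawk's FPAT variable, which describes what a field looks like instead of what separates fields. The sketch below assumes that approach; the pattern follows the form shown in the gawk manual, while the helper name csv_fields and the sample record are made up for illustration.

```shell
# FPAT is a gawk extension (GNU awk 4.0 and later). A field is either a run
# of non-comma characters or a double-quoted string that may contain commas.
csv_fields() {
    gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" }
          { for (i = 1; i <= NF; i++) print i ": " $i }'
}

# Run the demo only where gawk is installed.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' 'School,"College, East",City' | csv_fields
fi
```

This simple pattern does not handle empty fields or quotes escaped inside a quoted field, so treat it as a starting point rather than a complete CSV parser.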
To split on multiple characters with awk, and on exactly "," in particular, you can add \\ before the characters:
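The doubled backslash matters most when the separator contains regex metacharacters. The sketch below uses a hypothetical separator || (not the one from the question) to show the escaping; the helper name second_field is likewise an assumption.

```shell
# awk processes escape sequences in the -F value, so \\| reaches the regex
# engine as \| and matches a literal pipe. FS therefore matches exactly "||".
second_field() {
    printf '%s\n' "$1" | awk -F'\\|\\|' '{ print $2 }'
}

second_field 'a||b||c'
```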
https://es.stackoverflow.com/questions/422811/unix-awk-separaci%c3%b3n-de-campos-por-grupo-de-caracteres/423081#423081