Can a field separator in awk contain multiple characters?

Posted on 2024-12-18 06:05:14

Can I use a field separator consisting of multiple characters? Like I want to separate words which contain quotes and commas between them viz.

"School","College","City"

So here I want to set my FS to be ",". But I am getting funny results when I define my FS like that. Here's a snippet of my code.

awk -F\",\" '
{
for(i=1;i<=NF;i++)
  {
    if($i~"[a-z0-9],[a-z0-9]") 
    print $i
  }
}' OFS=\",\"  $* 

Comments (6)

深府石板幽径 2024-12-25 06:05:15

Yes, FS can be multiple characters. See the test below, using your example:

kent$  echo '"School","College","City"'|awk -F'","|^"|"$' '{for(i=1;i<=NF;i++){if($i)print $i}}'
School
College
City
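
A minimal sketch of why a plain "," separator looks odd on this input, assuming a POSIX-style awk (gawk, nawk, or mawk); the variable names are only for illustration. With FS set to just the three characters ",", the outer quotes stay attached to the first and last fields. Adding the anchored alternatives consumes them, which leaves empty fields at both ends (hence the if($i) filter above). How ^ behaves inside FS may differ between awk implementations, so treat the second command as something to try rather than a guarantee.

```shell
line='"School","College","City"'

# FS is only the literal sequence ","; the outer quotes remain in $1 and $NF.
plain=$(printf '%s\n' "$line" | awk -F'","' '{ print $1 "|" $NF }')
echo "$plain"    # prints "School|City"  (quotes still attached)

# FS also matches a leading or trailing quote, so $2..$4 hold the bare words.
anchored=$(printf '%s\n' "$line" | awk -F'","|^"|"$' '{ print $2 "|" $3 "|" $4 }')
echo "$anchored"
```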
离不开的别离 2024-12-25 06:05:15

The point being talked around here is that the field separator isn't just limited to multiple characters; it can actually be a full-blown regex.

To wit:
This strips out the header and surrounding tags from an XML fragment.
Note that the tags are well-formed, but different.

bash-3.2$ more xml_example 
<?xml version="1.0" encoding="UTF-8"?>
<urlset
xmlns="http://www.google.com/schemas/sitemap/0.84"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84
                  http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">
<url>
<loc>http://www.foo.com/about.html</loc>
<lastmod>2006-05-15T13:43:37Z</lastmod>
<priority>0.5000</priority>
</url>
<url>
<loc>http://www.foo.com/articles/articles.html</loc>
<lastmod>2006-06-20T23:03:36Z</lastmod>
<priority>0.5000</priority>
</url>

Now we apply the awk script to print out the middle field, using a regex as the field separator:

bash-3.2$ awk -F"<(/?)[a-z]+>" '{print $2}' <xml_example




http://www.foo.com/about.html
2006-05-15T13:43:37Z
0.5000


http://www.foo.com/articles/articles.html
2006-06-20T23:03:36Z
0.5000

bash-3.2$

The blank lines come from lines where a tag was the only thing on the line (or where no separator matched at all), so there is no $2 to print.
This is really powerful: your field separator can be not only a fixed multi-character pattern but any regular expression.
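
If the blank lines are unwanted, one option is to print $2 only when it is non-empty. This is a small sketch on a made-up four-line sample in the same shape as the file above, not the original poster's command; the regex is the same idea written without the capture group.

```shell
# Hypothetical sample: two tag-only lines and two lines with content.
sample='<url>
<loc>http://www.foo.com/about.html</loc>
<priority>0.5000</priority>
</url>'

# Split on any opening or closing tag; skip records whose second field is empty.
values=$(printf '%s\n' "$sample" | awk -F'</?[a-z]+>' '$2 != "" { print $2 }')
echo "$values"
```

The tag-only lines still split (the tag itself is the separator), but both resulting fields are empty, so the pattern filters them out.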

桃扇骨 2024-12-25 06:05:15

Try

awk 'BEGIN{FS="[|,:]"}{print $1}' yourFile
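
To see what that bracket expression does, here is a small sketch on an invented input line (the field values are placeholders). Inside [...] the | is an ordinary character, so any one of |, , or : ends a field.

```shell
# One record using all three separators.
summary=$(printf '%s\n' 'alpha|beta,gamma:delta' |
  awk 'BEGIN { FS = "[|,:]" } { print NF ": " $1 " " $2 " " $3 " " $4 }')
echo "$summary"    # prints: 4: alpha beta gamma delta
```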
感悟人生的甜 2024-12-25 06:05:15

Yes, you can use multiple characters for the -F argument because that value can be a regular expression. For example you can do things like:

echo "hello:::my:::friend" | gawk -F':::' '{print $3}'

which will return friend.

Regexp support in the -F argument holds for nawk and gawk (GNU awk); the original awk does not support it. On Solaris this distinction matters. On Linux it usually does not, because awk is typically a link to gawk or to another modern implementation (Debian and Ubuntu, for example, default to mawk). If you depend on gawk-specific behavior, I would say it is best practice to invoke it as gawk, provided gawk is installed on every platform you target.
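
For what it's worth, the separator can be set in three equivalent ways. The sketch below uses plain awk, on the assumption that it resolves to a modern implementation (nawk, gawk, or mawk) rather than the original Solaris /usr/bin/awk; the variable names are only for illustration.

```shell
input='hello:::my:::friend'

# Three ways to set the same multi-character separator.
via_flag=$(printf '%s\n' "$input" | awk -F':::' '{ print $3 }')
via_var=$(printf '%s\n' "$input" | awk -v FS=':::' '{ print $3 }')
via_begin=$(printf '%s\n' "$input" | awk 'BEGIN { FS = ":::" } { print $3 }')

echo "$via_flag $via_var $via_begin"    # prints: friend friend friend
```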

北陌 2024-12-25 06:05:15

With GNU awk 4 you can easily parse even *CSV*s with embedded separators and quotes:

% cat infile 
"School",College: "My College","City, I"

% awk '{    
  for (i = 0; ++i <= NF;)
    print i, substr($i, 1, 1) == "\042" ?
      substr($i, 2, length($i) - 2) : $i
  }' FPAT='([^,]+)|(\"[^\"]+\")' infile  
1 School
2 College: "My College"
3 City, I
情场扛把子 2024-12-25 06:05:15

To split on a multi-character separator with awk, and on exactly ",", you can add \\ before each character:

echo '"School","College","City"'|awk -F'\\"\\,\\"' '{for(i=1;i<=NF;i++){if($i)print $i}}'

https://es.stackoverflow.com/questions/422811/unix-awk-separaci%c3%b3n-de-campos-por-grupo-de-caracteres/423081#423081
