Using awk to ignore escaped delimiters (commas)?

Posted 2024-08-05 09:26:27

If I had a string with escaped commas like so:

a,b,{c\,d\,e},f,g

How might I use awk to parse that into the following items?

a
b
{c\,d\,e}
f
g

Answers (3)

执手闯天涯 2024-08-12 09:26:27
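{
   n = split($0, a, /,/)      # split on every comma, escaped or not
   split("", b)                # empty b left over from the previous line (portable idiom)
   j = 1                       # b[1] stays blank; real fields start at b[2]
   for (i = 1; i <= n; ++i) {
      if (match(b[j], /\\$/)) {
         # previous piece ends in "\": that comma was escaped, so re-join it
         b[j] = b[j] "," a[i]
      } else {
         b[++j] = a[i]         # otherwise start a new field
      }
   }
   for (k = 2; k <= j; ++k) {
      print b[k]
   }
}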
  1. Split into array a, using ',' as the delimiter
  2. Build array b from a, merging each item that ends in '\' with the item that follows it
  3. Print array b (note: printing starts at index 2 because b[1] is always blank)

This solution presumes (for now) that ',' is the only character ever escaped with '\': there is no need to handle any \\ in the input, nor odd combinations such as \\\,\\,\\\\,,\,.

握住你手 2024-08-12 09:26:27
{
  # hide escaped commas behind a placeholder so split() ignores them
  gsub("\\\\,", "!Q!")
  n = split($0, a, ",")
  for (i = 1; i <= n; ++i) {
    # put the escaped comma back before printing each field
    gsub("!Q!", "\\,", a[i])
    print a[i]
  }
}
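A variant of the same placeholder idea, sketched here as an assumption and not part of the original answer, swaps !Q! for awk's built-in SUBSEP variable (by default the non-printing character \034). That makes a collision with real input less likely, though not impossible, and it still treats every \, as an escaped comma. The shell function name split_placeholder is hypothetical.

```shell
# split_placeholder: hypothetical wrapper; prints one field per line for each input line
split_placeholder() {
    awk '{
        # hide escaped commas behind SUBSEP (default "\034") so split() ignores them
        gsub(/\\,/, SUBSEP)
        n = split($0, a, ",")
        for (i = 1; i <= n; ++i) {
            # restore the escaped comma, keeping the backslash as in the question
            gsub(SUBSEP, "\\,", a[i])
            print a[i]
        }
    }'
}

printf 'a,b,{c\\,d\\,e},f,g\n' | split_placeholder
```

Run on the question's input, this prints the same five items the question asks for, one per line.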
烟花肆意 2024-08-12 09:26:27

I don't think awk has any built-in support for something like this. Here's a solution that's not nearly as short as DigitalRoss's, but it has no risk of accidentally matching your made-up placeholder string (!Q!). Because it tests each piece with an if, you could also extend it to check whether a piece actually ends in \\, (as in \\,), which is an escaped backslash followed by a real separator, not an escaped comma. Note that this version drops the escaping backslash, so it prints {c,d,e} rather than {c\,d\,e}.

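BEGIN {
    FS = ","
}

{
    split("", fields)    # clear fields left over from the previous line
    curfield = 1
    for (i = 1; i <= NF; i++) {
        if (substr($i, length($i)) == "\\") {
            # piece ends in "\": drop the backslash and keep the comma
            fields[curfield] = fields[curfield] substr($i, 1, length($i) - 1) FS
        } else {
            # piece completes the current field; move on to the next one
            fields[curfield] = fields[curfield] $i
            curfield++
        }
    }
    nf = curfield - 1
    for (i = 1; i <= nf; i++) {
        printf("%d: %s   ", i, fields[i])
    }
    printf("\n")
}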
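The extension mentioned in this answer can be sketched by counting the run of backslashes at the end of each comma-separated piece: an odd count means the last backslash escapes the following comma, while an even count means the backslashes escape each other and the comma is a real separator. This is an illustrative sketch, not the answerer's code. It assumes '\' only ever escapes ',' or '\', it keeps the escapes in the output (matching the question's expected items), and the function name split_escaped is hypothetical.

```shell
# split_escaped: hypothetical wrapper; prints one field per line for each input line.
# Escapes are kept in the output, as in the question's expected result.
split_escaped() {
    awk '{
        n = split($0, parts, ",")
        nf = 0
        joining = 0
        for (i = 1; i <= n; i++) {
            if (joining)
                fields[nf] = fields[nf] "," parts[i]   # previous comma was escaped
            else
                fields[++nf] = parts[i]                # start a new field
            # count the backslashes at the end of this piece
            len = length(parts[i])
            bs = 0
            while (bs < len && substr(parts[i], len - bs, 1) == "\\")
                bs++
            # odd count: the comma after this piece is escaped; even: it separates
            joining = (bs % 2 == 1)
        }
        for (i = 1; i <= nf; i++)
            print fields[i]
    }'
}

printf 'a,b,{c\\,d\\,e},f,g\n' | split_escaped
printf 'x\\\\,y\n' | split_escaped
```

The first call prints the five items from the question. In the second, the input x\\,y splits into x\\ and y, because the two backslashes escape each other and leave the comma as a separator.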