(sed/awk) How to convert a field-delimited (e.g. csv) file to a txt with fixed-size, tab-delimited columns?

Posted 2024-10-10 11:14:33

I have something like a csv file in which the field delimiter is "@".

ID@Name@Surname@Age@Profession@Address

1254343123@John@Smith@24@Engineer@Washington
23@Alexander@Kristofferson-Brown@35@Economic Advisor@Kent
...

I want to convert it into something more human-readable as in:

1254343123    John        Smith                  24    Engineer            Washington
23            Alexander   Kristofferson-Brown    35    Economic Advisor    Kent

...where each column starts at a fixed character position.

I have tried something that takes advantage of the TAB-size value and adds several TABs to the field:

sed -e "{/@[^@]\{32,\}@/s/@\([^@]*\)@/\t\1\t/g};{/@[^@]\{24,31\}@/s/@\([^@]*\)@/\t\1\t/g};{/@[^@]\{16,23\}@/s/@\([^@]*\)@/\t\1\t\t/g};{/@[^@]\{8,15\}@/s/@\([^@]*\)@/\t\1\t\t/g};{/@[^@]\{2,7\}@/s/@\([^@]*\)@/\t\1\t\t\t/g}"

...which does not work in all cases.

Could somebody give me a hint as to how to proceed?

PS: My intention is to use mainly sed (a one-liner) or if sed just doesn't cut it, awk is OK.

Comments (4)

清旖 2024-10-17 11:14:33

BSD, Mac OS X and Linux have the column command for this:

column -t -s@

It produces spaces though, not tabs (and it should, because tabs are obsolete). Output:

1254343123  John       Smith                24  Engineer          Washington
23          Alexander  Kristofferson-Brown  35  Economic Advisor  Kent
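Where column is not installed, roughly the same layout can be produced in plain awk. The following is only a sketch under a few assumptions, not a reimplementation of column: the function name align_fields is made up for this example, the whole input is buffered in memory, and widths are measured with awk's length(), which may count bytes rather than characters depending on the awk implementation and locale.

```shell
#!/bin/sh
# align_fields: hypothetical helper name, not part of the original answer.
# Reads "@"-delimited lines on stdin and pads every column to the width of
# its longest value plus a two-space gutter.
align_fields() {
  awk -F'@' '
    {
      # Remember every field and track the widest value seen per column.
      for (i = 1; i <= NF; i++) {
        cell[NR, i] = $i
        if (length($i) > width[i]) width[i] = length($i)
      }
      nf[NR] = NF
    }
    END {
      for (r = 1; r <= NR; r++) {
        line = ""
        for (i = 1; i <= nf[r]; i++) {
          if (i < nf[r]) {
            # Left-justify and pad all fields except the last one.
            line = line sprintf("%-" (width[i] + 2) "s", cell[r, i])
          } else {
            # The last field is printed as-is to avoid trailing spaces.
            line = line cell[r, i]
          }
        }
        print line
      }
    }'
}

printf '%s\n' \
  '1254343123@John@Smith@24@Engineer@Washington' \
  '23@Alexander@Kristofferson-Brown@35@Economic Advisor@Kent' | align_fields
```

For the two sample records this should give the same two-space-gutter layout as the column output shown above. Because the widths are only known after the last line has been read, the output cannot start until the input ends.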
九厘米的零° 2024-10-17 11:14:33

awk -F@ '{for(i=1;i<=NF;i++){printf "%-20s", $i};printf "\n"}' input.csv

Input

$ cat input.csv
1254343123@John@Smith@24@Engineer@Washington
23@Alexander@Kristofferson-Brown@35@Economic Advisor@Kent

Output

$ awk -F@ '{for(i=1;i<=NF;i++){printf "%-20s", $i};printf "\n"}' input.csv
1254343123          John                Smith               24                  Engineer            Washington
23                  Alexander           Kristofferson-Brown 35                  Economic Advisor    Kent

To make the field width (20 in the code above) a shell variable that can be passed in, do something like this:

#!/bin/bash

fldwth=20

awk -v fw="$fldwth" -F@ '{for(i=1;i<=NF;i++){printf "%-*s", fw,$i};printf "\n"}' input.csv
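A single width for every column wastes space on short fields such as the age. If the goal is the exact layout from the question, where each column starts at its own position, one possible variation is to pass a list of widths instead. This is a sketch only: pad_fields and the widths variable are names invented here, and the widths 14 12 23 6 20 were read off the desired output in the question.

```shell
#!/bin/sh
# pad_fields: hypothetical helper, not part of the original answer.
# $1 is a space-separated list of column widths, one per leading column.
pad_fields() {
  awk -F'@' -v widths="$1" '
    BEGIN { n = split(widths, w, " ") }
    {
      line = ""
      for (i = 1; i <= NF; i++) {
        if (i < NF && i <= n) {
          # Pad to the width given for this column.
          line = line sprintf("%-" w[i] "s", $i)
        } else if (i < NF) {
          # No width was given for this column: fall back to one space.
          line = line $i " "
        } else {
          # Last field: printed without padding.
          line = line $i
        }
      }
      print line
    }'
}

printf '%s\n' \
  '1254343123@John@Smith@24@Engineer@Washington' \
  '23@Alexander@Kristofferson-Brown@35@Economic Advisor@Kent' |
  pad_fields '14 12 23 6 20'
```

The format string is built by concatenation ("%-" w[i] "s") rather than with the * width specifier, on the assumption that this is the more portable choice across awk implementations. Note that a value longer than its column width is not truncated; it simply pushes the rest of the line to the right.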
梦幻之岛 2024-10-17 11:14:33

My solution for converting the excel default csv (comma delimited, text enclosed in double quotes) is the following awk script:

#!/bin/nawk -f
# Q&D to transform csv (with commas embedded in quotes) to pipe (|)
# Usage: cma2pipe.awk <in.csv> > <out.csv>
# Note: Assumes that <in.csv> contains no ~ or |
{#MAIN
  s=$0;c=0;f=0;        #reset variables for a line
  while (c<length(s)){ #loop thru line
    c++;               #char counter
    a=substr(s,c,1);   #get current character
    if (a=="\"")f++;   #flag quote
    if (f%2==1&&a==",")#if inside pair of quotes, look for ","
      s= repl("~",c,s);#replace commas with ~
  }#end while c
  gsub(",","|",s);     #replace remaining , with |
  gsub("~",",",s);     #put commas back
  gsub("\"","",s);     #get rid of quotes
print s
}#end MAIN
function repl(r,n,t){  #replace single character in string
  s1=substr(t,1,n-1);  #get first part of string
  s2=substr(t,n+1);    #get last part of string
  return(s1 r s2);     #return changed string
}#end repl()
桃气十足 2024-10-17 11:14:33

awk -F@ '{print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6}' file.csv > readable.txt

Try this: it uses awk's field-separator option (-F) and prints the columns with a \t between each one.
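Listing $1 through $6 by hand ties the command to exactly six fields. A sketch that should handle any number of fields, assuming the same @ delimiter, sets the output separator to a tab and assigns $1 to itself so that awk rebuilds the record with it. The function name to_tsv is hypothetical.

```shell
#!/bin/sh
# to_tsv: hypothetical helper, not part of the original answer.
# Converts "@"-delimited lines on stdin to tab-delimited lines on stdout.
to_tsv() {
  # Assigning $1 to itself forces awk to rebuild $0 using OFS.
  awk 'BEGIN { FS = "@"; OFS = "\t" } { $1 = $1; print }'
}

printf '%s\n' \
  '1254343123@John@Smith@24@Engineer@Washington' \
  '23@Alexander@Kristofferson-Brown@35@Economic Advisor@Kent' | to_tsv
```

Keep in mind that tab-separated output only lines up on screen while all values in a column end within the same tab stop; with fields as uneven as John and Kristofferson-Brown, padding with spaces is the more reliable way to get aligned columns.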
