Linux: Comma Separated Cells to Rows, Preserve/Aggregate Column

Posted on 2024-11-08 22:09:32

There is a similar question here, but for Excel/VBA: "Excel Macro - Comma Separated Cells to Rows Preserve/Aggregate Column".
Because I have a big file (> 300 MB), that is not an option, so I am struggling to get it to work in bash.

Based on this data

 1   Cat1                 a,b,c
 2   Cat2                 d
 3   Cat3                 e
 4   Cat4                 f,g

I would like to convert it to:

 1   Cat1                 a
 1   Cat1                 b
 1   Cat1                 c
 2   Cat2                 d
 3   Cat3                 e
 4   Cat4                 f
 4   Cat4                 g

Comments (2)

清君侧 2024-11-15 22:09:32
# Sample data; the columns must be separated by real tab characters,
# so the file is written with printf rather than typed by hand.
printf '%s\t%s\t%s\n' 1 Cat1 a,b,c 2 Cat2 d 3 Cat3 e 4 Cat4 f,g > data

set -f                               # turn off globbing
IFS=,                                # prepare for comma-separated data
while IFS=$'\t' read -r C1 C2 C3; do # split columns at tabs
    for X in $C3; do                 # split C3 at commas (due to IFS)
        printf '%s\t%s\t%s\n' "$C1" "$C2" "$X"
    done
done < data
池木 2024-11-15 22:09:32

This looks like a job for awk or perl.

awk 'BEGIN { FS = OFS = "\t" }
     { n = split($3, a, ",")           # n = number of components
       for (i = 1; i <= n; i++) {      # index loop keeps the original order
           $3 = a[i]; print } }'

perl -F'\t' -alne 'foreach (split ",", $F[2]) {
                       $F[2] = $_; print join("\t", @F)
                   }'

Both programs are based on the same algorithm: split the third column at commas, and iterate over the components, printing the original line with each component in the third column in turn.
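
To try the awk variant end to end, here is a minimal sketch. It assumes the input is tab-separated; the function name split_col3 is made up for this example and is not part of any tool. The sample rows are generated with printf so that the separators are guaranteed to be real tabs.

```shell
# Hypothetical helper (name chosen for this sketch): reads tab-separated
# rows from stdin or from the files given as arguments and prints one
# output row per comma-separated component of column 3.
split_col3() {
    awk 'BEGIN { FS = OFS = "\t" }
         { n = split($3, a, ",")
           for (i = 1; i <= n; i++) { $3 = a[i]; print } }' "$@"
}

# Build the sample rows from the question and pipe them through.
printf '%s\t%s\t%s\n' 1 Cat1 a,b,c 2 Cat2 d 3 Cat3 e 4 Cat4 f,g | split_col3
```

Because awk processes the input one line at a time, this should cope with a file of several hundred megabytes without loading it into memory. Note that a row whose third column is empty produces no output with this approach, which may or may not be what you want.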
