Problem parsing a comma-separated CSV file

Posted on 2024-08-25 19:11:34

I am trying to extract the 4th column from a CSV file (comma-separated, skipping the first 2 header lines) using this command:

 awk 'NR <2 {next}{FS =","}{print $4}' filename.csv | more

However, it doesn't work because the first column can contain a comma, so the 4th comma-separated field is not really the 4th column. Below is an example of a row:

"sdfsdfsd, sfsdf", 454,fgdfg, I_want_this_column,sdfgdg,34546, 456465, etc

巴黎夜雨 2024-09-01 19:11:34

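Unless you have a specific reason to use awk, I would recommend a CSV parsing library. Many scripting languages have one built in (or at least readily available), and it will save you from these headaches.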
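
As a rough illustration of the library approach, here is a minimal sketch using Python's built-in csv module. The function name `extract_column` and the two placeholder header lines are made up for the example; the data row is the one from the question. The `strip()` call is only there because the sample row has spaces after some commas.

```python
import csv
import io


def extract_column(lines, column=4, header_lines=2):
    """Yield the given 1-based column from CSV input, skipping header rows."""
    reader = csv.reader(lines)
    for row_number, row in enumerate(reader, start=1):
        if row_number <= header_lines:
            continue
        if len(row) >= column:
            # The quoted first field stays in one piece, so index 3 is column 4.
            yield row[column - 1].strip()


# Hypothetical input: two header lines followed by the row from the question.
sample = io.StringIO(
    'header line 1\n'
    'header line 2\n'
    '"sdfsdfsd, sfsdf", 454,fgdfg, I_want_this_column,sdfgdg,34546, 456465, etc\n'
)
print(list(extract_column(sample)))  # ['I_want_this_column']
```

For a real file, passing `open("filename.csv", newline="")` instead of the `io.StringIO` object should work the same way.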

浊酒尽余欢 2024-09-01 19:11:34

If your first column is always quoted (\042 is the octal escape for a double quote):

 $ awk 'BEGIN{ FS="\042[ ]*," } { m=split($2,a,","); print a[3] } ' file
 I_want_this_column

If the column you want is always the second to last:

 $ awk -F"," '{print $(NF-1)}' file
 I_want_this_column

You can try this demo script to break each line into its columns:

awk 'BEGIN{ FS="," }
{
   for(i=1;i<=NF;i++){
      if(f==1){
        # inside a quoted field: glue this piece back on with its comma
        s=s","$i
        # a closing quote at the end of the piece finishes the field
        if($i ~ /\042[ ]*$/){
          a[++j]=s
          s="";f=0
        }
      }
      else if($i ~ /^[ ]*\042/ && $i !~ /^[ ]*\042.*\042[ ]*$/){
        # opening quote with no closing quote yet: start collecting
        s=$i
        f=1
      }
      else{
        # unquoted field, or a quoted field without an embedded comma
        a[++j]=$i
      }
   }
}
END{
  # print columns
  for(p=1;p<=j;p++){
     print "Field "p,": "a[p]
  }
} ' file

output

$ cat file
"sdfsdfsd, sfsdf", "454,fgdfg blah , words ", I_want_this_column,sdfgdg

$ ./shell.sh
Field 1 : "sdfsdfsd, sfsdf"
Field 2 :  "454,fgdfg blah , words "
Field 3 :  I_want_this_column
Field 4 : sdfgdg

撕心裂肺的伤痛 2024-09-01 19:11:34

You shouldn't use awk here. Use the Python csv module, Perl's Text::CSV or Text::CSV_XS modules, or some other real CSV parser.

Related question:
Parsing a CSV file using gawk
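
One detail worth knowing if you go the Python route: rows like the ones in this thread put a space between the comma and the next opening quote, and the csv module only treats a quote as a field delimiter when it is the first character of the field. The sketch below (sample line taken from the demo file in another answer) compares the default reader with `skipinitialspace=True`.

```python
import csv

line = '"sdfsdfsd, sfsdf", "454,fgdfg blah , words ", I_want_this_column,sdfgdg'

# Default settings: the space before the second opening quote makes that
# quote a literal character, so the field is split at its inner commas.
default_row = next(csv.reader([line]))

# skipinitialspace=True drops whitespace right after each delimiter,
# so the quoted field is recognised and kept in one piece.
trimmed_row = next(csv.reader([line], skipinitialspace=True))

print(len(default_row))  # 6
print(trimmed_row)
# ['sdfsdfsd, sfsdf', '454,fgdfg blah , words ', 'I_want_this_column', 'sdfgdg']
```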

独享拥抱 2024-09-01 19:11:34

If you can't avoid awk, this piece of code does the job you need:

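BEGIN { FS = "," }

{
        f = 0;   # 1 while inside a quoted field that contains commas
        j = 0;   # number of reassembled fields stored in a[]
        for (i = 1; i <= NF; ++i) {
                if (f) {
                        # continue the quoted field, restoring the comma
                        a[j] = a[j] "," $(i);
                        if ($(i) ~ "\"$") {
                                f = 0;
                        }
                }
                else {
                        ++j;
                        a[j] = $(i);
                        # opening quote with no closing quote in this piece
                        if (a[j] ~ "^\"[^\"]*$") {
                                f = 1;
                        }
                }
        }
        for (i = 1; i <= j; ++i) {
                # strip the surrounding quotes and unescape doubled quotes
                gsub("^\"", "", a[i]);
                gsub("\"$", "", a[i]);
                gsub("\"\"", "\"", a[i]);
                print i " = \"" a[i] "\"";
        }
}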
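
For anyone who wants to experiment with the same split-then-merge logic outside awk, here is a rough Python transcription. The function name `split_quoted` is made up for the example, and like the awk version it assumes quotes sit directly next to the commas (no padding spaces).

```python
def split_quoted(line):
    """Split on commas, then glue back the pieces of each quoted field."""
    fields = []
    in_quotes = False
    for piece in line.split(","):
        if in_quotes:
            # Continue the quoted field, restoring the comma that split it.
            fields[-1] += "," + piece
            if piece.endswith('"'):
                in_quotes = False
        else:
            fields.append(piece)
            # Opening quote with no other quote in this piece.
            if piece.startswith('"') and '"' not in piece[1:]:
                in_quotes = True
    cleaned = []
    for field in fields:
        # Strip the surrounding quotes and unescape doubled quotes.
        if field.startswith('"'):
            field = field[1:]
        if field.endswith('"'):
            field = field[:-1]
        cleaned.append(field.replace('""', '"'))
    return cleaned


print(split_quoted('"sdfsdfsd, sfsdf",454,fgdfg,I_want_this_column'))
# ['sdfsdfsd, sfsdf', '454', 'fgdfg', 'I_want_this_column']
```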

讽刺将军 2024-09-01 19:11:34

Working with CSV files that have quoted fields with commas inside can be difficult with the standard UNIX text tools.

I wrote a program called csvquote to make the data easy for them to handle. In your case, you could use it like this:

csvquote filename.csv | awk -F, 'NR > 2 {print $4}' | csvquote -u | more

or you could use cut and tail like this:

csvquote filename.csv | tail -n +3 | cut -d, -f4 | csvquote -u | more

The code and docs are here: https://github.com/dbro/csvquote
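
To make the mask-then-restore idea concrete, here is a small Python sketch. It is not csvquote's actual code: the function names and the choice of placeholder byte are assumptions made for illustration, and it only handles commas, not newlines inside quoted fields.

```python
# Assumption: a character that never occurs in the data (ASCII unit separator).
PLACEHOLDER = "\x1f"


def mask_commas(text):
    """Replace commas inside double-quoted fields with PLACEHOLDER."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            # An escaped quote ("") toggles twice, leaving the state unchanged.
            in_quotes = not in_quotes
        out.append(PLACEHOLDER if ch == "," and in_quotes else ch)
    return "".join(out)


def unmask_commas(text):
    """Put the original commas back after the naive tools have run."""
    return text.replace(PLACEHOLDER, ",")


line = '"sdfsdfsd, sfsdf",454,fgdfg,I_want_this_column,sdfgdg'
fields = mask_commas(line).split(",")  # a plain split is now safe
print(unmask_commas(fields[0]))  # "sdfsdfsd, sfsdf"
print(fields[3])                 # I_want_this_column
```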
