Parsing a CSV file with awk when fields contain commas

Posted 2024-10-06 14:35:12

I have to use awk to print 4 different columns from a CSV file. The problem is that the values are strings in $x,xxx.xx format, so they contain commas themselves. When I run the usual awk command

awk -F, '{print $1}' testfile.csv

my output ends up looking like

307.00
$132.34
30.23

What am I doing wrong?

"$141,818.88","$52,831,578.53","$52,788,069.53"

This is roughly the input. The file I have to parse has 90,000 rows and about 40 columns. This is how the input is laid out, or at least the parts of it that I have to deal with. Sorry if I made you think this wasn't what I was talking about.

If the input is

"$307.00","$132.34","$30.23"

I want the output to be

$307.00
$132.34
$30.23

Comments (4)

遗忘曾经 2024-10-13 14:35:12

Oddly enough, I had to tackle this problem some time ago and I kept the code around to do it. You almost had it, but you need to get a bit tricky with your field separator(s).

awk -F'","|^"|"$' '{print $2}' testfile.csv

Input

# cat testfile.csv
"$141,818.88","$52,831,578.53","$52,788,069.53"
"$2,558.20","$482,619.11","$9,687,142.69"
"$786.48","$8,568,159.41","$159,180,818.00"

Output

# awk -F'","|^"|"$' '{print $2}' testfile.csv
$141,818.88
$2,558.20
$786.48

You'll note that the "first" field is actually $2 because of the field separator ^". Small price to pay for a short 1-liner if you ask me.
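Since the question asks for several columns, here is a minimal sketch of how the same separator could be used to print more than one of them. The function name print_cols_1_and_3 and the choice of columns are only illustrative assumptions; the one thing to keep in mind is that logical column N ends up in awk field N+1, because the leading quote produces an empty $1.

```shell
# Hypothetical helper: print logical columns 1 and 3, tab-separated.
# The separator alternatives are: "," between fields, a leading quote,
# and a trailing quote. The leading quote makes $1 empty, so logical
# column N is awk field N+1.
print_cols_1_and_3() {
  awk -F'","|^"|"$' -v OFS='\t' '{ print $2, $4 }' "$@"
}

# Usage with one of the sample rows from above, read from stdin:
printf '%s\n' '"$141,818.88","$52,831,578.53","$52,788,069.53"' |
  print_cols_1_and_3
```

For four columns, extend the print list in the same way, for example $2, $5, $8, $11 for logical columns 1, 4, 7 and 10.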

凌乱心跳 2024-10-13 14:35:12

I think what you're saying is that you want to split the input into CSV fields while not getting tripped up by the commas inside the double quotes. If so...

First, use "," as the field separator, like this:

awk -F'","' '{print $1}'

But then you'll still end up with a stray double-quote at the beginning of $1 (and at the end of the last field). Handle that by stripping quotes out with gsub, like this:

awk -F'","' '{x=$1; gsub("\"","",x); print x}'

Result:

echo '"abc,def","ghi,xyz"' | awk -F'","' '{x=$1; gsub("\"","",x); print x}'

abc,def
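
If more than one field is needed, it may be simpler to strip the two stray quotes once per record instead of calling gsub on each field. The following is a rough sketch under that assumption; first_and_last is a made-up name, and it assumes every field in the file is double-quoted.

```shell
# Hypothetical helper: split on "," and remove the opening quote of the
# first field and the closing quote of the last field, so that all
# fields are clean and keep their natural numbering ($1 is column 1).
first_and_last() {
  awk -F'","' -v OFS=' | ' '{
    sub(/^"/, "", $1)    # opening quote of the record
    sub(/"$/, "", $NF)   # closing quote of the record
    print $1, $NF
  }' "$@"
}

echo '"abc,def","ghi,xyz"' | first_and_last
# prints: abc,def | ghi,xyz
```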

把梦留给海 2024-10-13 14:35:12

In order to let awk handle quoted fields that contain the field separator, you can use a small script I wrote called csvquote. It temporarily replaces the offending commas with nonprinting characters, and then you restore them at the end of your pipeline. Like this:

csvquote testfile.csv | awk -F, '{print $1}' | csvquote -u

This would also work with any other UNIX text processing program like cut:

csvquote testfile.csv | cut -d, -f1 | csvquote -u

You can get the csvquote code here: https://github.com/dbro/csvquote

寄居人 2024-10-13 14:35:12

The data file:

$ cat data.txt
"$307.00","$132.34","$30.23"

The AWK script:

$ cat csv.awk
BEGIN { RS = "," }
{ gsub("\"", "", $1);
  print $1 }

The execution:

$ awk -f csv.awk data.txt
$307.00
$132.34
$30.23
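
Setting RS = "," splits at every comma, so amounts with thousands separators such as "$141,818.88" from the question's larger sample would be cut in two. A possible variant that still prints one value per line, but splits on the quote-comma-quote sequence instead, might look like the sketch below. The name one_per_line is hypothetical, and the sketch assumes every field is double-quoted.

```shell
# Hypothetical helper: print every field of every row on its own line,
# keeping commas that appear inside the quoted values.
one_per_line() {
  awk -F'","' '{
    sub(/^"/, "", $1)    # opening quote of the record
    sub(/"$/, "", $NF)   # closing quote of the record
    for (i = 1; i <= NF; i++) print $i
  }' "$@"
}

printf '%s\n' '"$141,818.88","$52,831,578.53","$52,788,069.53"' | one_per_line
# prints:
# $141,818.88
# $52,831,578.53
# $52,788,069.53
```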