Using awk to parse a CSV file whose fields contain commas
I have to use awk to print out 4 different columns in a CSV file. The problem is that the strings are in a $x,xxx.xx format. When I run the regular awk command
awk -F, '{print $1}' testfile.csv
my output ends up looking like
307.00
$132.34
30.23
What am I doing wrong?
"$141,818.88","$52,831,578.53","$52,788,069.53"
This is roughly the input. The file I have to parse is 90,000 rows and about 40 columns.
This is how the input is laid out, or at least the parts of it that I have to deal with. Sorry if I made you think this wasn't what I was talking about.
If the input is "$307.00","$132.34","$30.23"
I want the output to be
$307.00
$132.34
$30.23
4 Answers
Oddly enough I had to tackle this problem some time ago and I kept the code around to do it. You almost had it, but you need to get a bit tricky with your field separator(s):

awk -F'^"|","|"$' '{print $2}' testfile.csv

You'll note that the "first" field is actually $2: the ^" alternative in the field separator consumes the leading quote, so $1 is empty. Small price to pay for a short one-liner if you ask me.

Input:
"$141,818.88","$52,831,578.53","$52,788,069.53"

Output:
$141,818.88
I think what you're saying is that you want to split the input into CSV fields while not getting tripped up by the commas inside the double quotes. If so...

First, use "," as the field separator. But then you'll still end up with a stray double-quote at the beginning of $1 (and at the end of the last field). Handle that by stripping the quotes out with gsub before printing; both steps are sketched below. For the sample line above, the result is:

$141,818.88
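A minimal sketch of those two steps, assuming the same testfile.csv as in the question (the exact gsub pattern is an illustration):

# Step 1: use "," (quote-comma-quote) as the separator so the embedded commas survive
awk -F'","' '{print $1}' testfile.csv
# -> "$141,818.88      (note the stray leading quote)

# Step 2: strip the outer quotes from the record with gsub, then print
awk -F'","' '{gsub(/^"|"$/, ""); print $1}' testfile.csv
# -> $141,818.88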
In order to let awk handle quoted fields that contain the field separator, you can use a small script I wrote called csvquote. It temporarily replaces the offending commas with nonprinting characters, and then you restore them at the end of your pipeline. This also works with any other UNIX text-processing program, like cut; both pipelines are sketched below.
You can get the csvquote code here: https://github.com/dbro/csvquote
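For example (a sketch; csvquote -u restores the characters it substituted, and the field numbers are just examples):

# Shield the commas inside quoted fields, run the usual awk, then restore them
csvquote testfile.csv | awk -F, '{print $1}' | csvquote -u

# The same pattern works with cut
csvquote testfile.csv | cut -d, -f1 | csvquote -u

For the sample line, both print "$141,818.88" (the surrounding double quotes stay, since csvquote only hides the embedded commas).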
The data file:
The AWK script:
The execution:
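A minimal sketch of how those three pieces could fit together, assuming GNU awk (FPAT is a gawk-only feature); the file names testfile.csv and parse.awk are illustrative:

# The data file, testfile.csv:
printf '"%s","%s","%s"\n' '$141,818.88' '$52,831,578.53' '$52,788,069.53' > testfile.csv

# The AWK script, parse.awk: FPAT makes each double-quoted value a single field
cat > parse.awk <<'EOF'
BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" }
{
    gsub(/"/, "", $1)   # drop the surrounding quotes before printing
    gsub(/"/, "", $3)
    print $1, $3
}
EOF

# The execution:
gawk -f parse.awk testfile.csv
# -> $141,818.88 $52,788,069.53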