Problem parsing a comma-separated CSV file
I am trying to extract the 4th column from a CSV file (comma-separated, skipping the first 2 header lines) using this command:
awk 'NR <2 {next}{FS =","}{print $4}' filename.csv | more
However, it doesn't work because the first column can contain commas, so the 4th column is not really the 4th. Below is an example of a row:
"sdfsdfsd, sfsdf", 454,fgdfg, I_want_this_column,sdfgdg,34546, 456465, etc
Comments (5)
Unless you have specific reasons for using awk, I would recommend using a CSV parsing library. Many scripting languages have one built in (or at least available), and they'll save you from these headaches.
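As a rough illustration of the library route, here is a minimal sketch using Python's standard csv module. The function name extract_column, the SAMPLE text, and its two header lines are made up for the example; only the data row comes from the question. It assumes the wanted column is at index 3 (the 4th column) and that two header rows should be skipped.

```python
import csv
import io
from itertools import islice

# Hypothetical input: two invented header lines, then the row from the question.
SAMPLE = '''title line
col_a,col_b,col_c,col_d
"sdfsdfsd, sfsdf", 454,fgdfg, I_want_this_column,sdfgdg,34546, 456465, etc
'''


def extract_column(lines, index=3, header_rows=2):
    """Yield one column from CSV text, skipping the leading header rows."""
    # skipinitialspace=True drops the space that follows some delimiters
    # in the sample row, so values come back without leading blanks.
    reader = csv.reader(lines, skipinitialspace=True)
    for row in islice(reader, header_rows, None):
        if len(row) > index:  # ignore blank or short rows
            yield row[index]


print(list(extract_column(io.StringIO(SAMPLE))))  # ['I_want_this_column']
```

To read a real file instead of the sample string, pass an open file object, for example open("filename.csv", newline=""), in place of io.StringIO(SAMPLE).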
If your first column always has quotes,
if the column you want is always the second to last,
you can try this demo script to break down the columns
output
You shouldn't use awk here. Use the Python csv module, the Perl Text::CSV or Text::CSV_XS modules, or another real CSV parser.
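To see why a real parser matters, this small sketch compares a plain split on commas with Python's csv module on the row from the question. The variable names are illustrative only, and skipinitialspace=True is an assumption made because the sample row has spaces after some commas.

```python
import csv

# The example row from the question.
ROW = '"sdfsdfsd, sfsdf", 454,fgdfg, I_want_this_column,sdfgdg,34546, 456465, etc'

# Splitting on every comma also splits inside the quoted first field,
# so every later field shifts one position to the right.
naive = ROW.split(",")

# csv.reader treats the quoted text as a single field.
parsed = next(csv.reader([ROW], skipinitialspace=True))

print(naive[3])   # fgdfg
print(parsed[3])  # I_want_this_column
```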
Related question: parse csv file using gawk
If you can't avoid awk, this piece of code does the job you need:
Working with CSV files that have quoted fields with commas inside can be difficult with the standard UNIX text tools.
I wrote a program called csvquote to make the data easy for them to handle. In your case, you could use it like this:
or you could use cut and tail like this:
The code and docs are here: https://github.com/dbro/csvquote