Parsing a CSV file to extract some of the values, but not all
Good day,
I have a local CSV file called DailyValues.csv, with values that change daily.
I need to extract the value field of category2 and category4.
Then combine, sort and remove duplicates (if any) from the extracted values.
Then save it to a new local file NewValues.txt.
Here is an example of the DailyValues.csv file:
category,date,value
category1,2010-05-18,value01
category1,2010-05-18,value02
category1,2010-05-18,value03
category1,2010-05-18,value04
category1,2010-05-18,value05
category1,2010-05-18,value06
category1,2010-05-18,value07
category2,2010-05-18,value08
category2,2010-05-18,value09
category2,2010-05-18,value10
category2,2010-05-18,value11
category2,2010-05-18,value12
category2,2010-05-18,value13
category2,2010-05-18,value14
category2,2010-05-18,value30
category3,2010-05-18,value16
category3,2010-05-18,value17
category3,2010-05-18,value18
category3,2010-05-18,value19
category3,2010-05-18,value20
category3,2010-05-18,value21
category3,2010-05-18,value22
category3,2010-05-18,value23
category3,2010-05-18,value24
category4,2010-05-18,value25
category4,2010-05-18,value26
category4,2010-05-18,value10
category4,2010-05-18,value28
category4,2010-05-18,value11
category4,2010-05-18,value30
category2,2010-05-18,value31
category2,2010-05-18,value32
category2,2010-05-18,value33
category2,2010-05-18,value34
category2,2010-05-18,value35
category2,2010-05-18,value07
I've found some helpful parsing examples at http://www.php.net/manual/en/function.fgetcsv.php and managed to extract all the values of the value column, but I don't know how to restrict it to only the values of category2 and category4, then sort them and remove duplicates.
The solution needs to be in PHP, Perl or shell script.
Any help would be much appreciated.
Thank you in advance.
Comments (1)
Here's a shell script solution.
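A minimal sketch of such a pipeline, wrapped in a function so it can be reused. It assumes the file holds plain comma-separated fields with no quoting or embedded commas; the function name extract_values and the exact regular expression are illustrative assumptions, not necessarily the commands originally posted. grep -E is the POSIX spelling of egrep.

```shell
#!/bin/sh
# Sketch: print the unique, sorted values of the third column for
# category2 and category4 rows only.
# Assumption: simple CSV, no quoted fields or embedded commas.
# - grep -E (egrep) is needed for the (a|b) alternation; the trailing
#   comma keeps names such as category20 from matching.
# - cut -d, -f3 keeps only the third comma-separated column (value).
# - sort -u sorts the values and drops duplicates.
extract_values() {
    grep -E '^(category2|category4),' "$1" | cut -d, -f3 | sort -u
}

# Usage (file names taken from the question):
#   extract_values DailyValues.csv > NewValues.txt
```

With the sample file from the question this should leave 17 unique values in NewValues.txt, from value07 to value35.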
I used the cut command just to show that you can extract only certain columns: the -f switch for cut chooses which columns you want to extract. The -u switch for sort makes the output unique.
Edit:
It's important that you use egrep and not grep, since grep by default uses a somewhat restricted regular expression set, while egrep (equivalent to grep -E) supports extended regular expressions such as alternation.
Edit (for people who only have grep available):
It produces quite an overhead but still works...
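For the grep-only case, one possible sketch runs plain grep once per category and concatenates the results, so the file is read twice, which is presumably the overhead mentioned above. The function name extract_values_basic and the exact commands are assumptions for illustration, and the same simple-CSV assumption applies (no quoted fields or embedded commas).

```shell
#!/bin/sh
# Sketch for systems where only basic grep is available.
# Each grep call scans the whole file, so the input is read twice.
# Assumption: simple CSV, no quoted fields or embedded commas.
extract_values_basic() {
    {
        grep '^category2,' "$1"
        grep '^category4,' "$1"
    } | cut -d, -f3 | sort -u
}

# Usage (file names taken from the question):
#   extract_values_basic DailyValues.csv > NewValues.txt
```

If the local grep accepts several -e options, as POSIX grep does, grep -e '^category2,' -e '^category4,' should give the same rows in a single pass.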