Parse a CSV file and extract some values, but not all

Posted 2024-09-02 05:27:28

Good day,

I have a local CSV file called DailyValues.csv whose values change daily.
I need to extract the value field of the category2 and category4 rows,
then combine the extracted values, sort them, and remove duplicates (if any),
then save the result to a new local file, NewValues.txt.

Here is an example of the DailyValues.csv file:

category,date,value  
category1,2010-05-18,value01  
category1,2010-05-18,value02  
category1,2010-05-18,value03  
category1,2010-05-18,value04  
category1,2010-05-18,value05  
category1,2010-05-18,value06  
category1,2010-05-18,value07  
category2,2010-05-18,value08  
category2,2010-05-18,value09  
category2,2010-05-18,value10  
category2,2010-05-18,value11  
category2,2010-05-18,value12  
category2,2010-05-18,value13  
category2,2010-05-18,value14  
category2,2010-05-18,value30  
category3,2010-05-18,value16  
category3,2010-05-18,value17  
category3,2010-05-18,value18  
category3,2010-05-18,value19  
category3,2010-05-18,value20  
category3,2010-05-18,value21  
category3,2010-05-18,value22  
category3,2010-05-18,value23  
category3,2010-05-18,value24  
category4,2010-05-18,value25  
category4,2010-05-18,value26  
category4,2010-05-18,value10  
category4,2010-05-18,value28  
category4,2010-05-18,value11  
category4,2010-05-18,value30  
category2,2010-05-18,value31  
category2,2010-05-18,value32  
category2,2010-05-18,value33  
category2,2010-05-18,value34  
category2,2010-05-18,value35  
category2,2010-05-18,value07

I've found some helpful parsing examples at http://www.php.net/manual/en/function.fgetcsv.php and managed to extract every value in the value column, but I don't know how to restrict the extraction to category2 and category4, or how to then sort the result and remove duplicates.

The solution needs to be in PHP, Perl, or a shell script.

Any help would be much appreciated.
Thank you in advance.


梦魇绽荼蘼 2024-09-09 05:27:28

Here's a shell script solution.

egrep 'category4|category2' input.file | cut -d"," -f3 | sort -u > output.file

I use the cut command to keep only the value column: the -d switch sets the field delimiter, and the -f switch chooses which fields to extract (here the third). Extracting the category as well (-f1,3) would stop a value that appears under both categories from being recognized as a duplicate.

The -u switch for sort makes the output unique.
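
To see the three stages working together, here is a minimal sketch run on a few rows from the question. It keeps only the value column (-f3) so that a value listed under both categories collapses into one line. The scratch directory, the use of mktemp -d, and the anchored pattern '^category(2|4),' are assumptions added for the illustration; they are not part of the original answer.

```shell
#!/bin/sh
# Work in a scratch directory so a real DailyValues.csv is never overwritten.
workdir=$(mktemp -d)
input="$workdir/DailyValues.csv"
output="$workdir/NewValues.txt"

# A few rows taken from the sample in the question.
cat > "$input" <<'EOF'
category,date,value
category1,2010-05-18,value07
category2,2010-05-18,value10
category2,2010-05-18,value30
category3,2010-05-18,value16
category4,2010-05-18,value10
category4,2010-05-18,value25
category4,2010-05-18,value30
category2,2010-05-18,value07
EOF

# 1. keep only category2/category4 rows (pattern anchored to the first field)
# 2. keep only the third field (the value)
# 3. sort and drop duplicates
grep -E '^category(2|4),' "$input" | cut -d, -f3 | sort -u > "$output"

cat "$output"
```

On these rows the result file holds value07, value10, value25 and value30, one per line. Anchoring the pattern with ^ and a trailing comma is optional, but it avoids matching a row such as category20 or a row that merely contains the text category2 in another field.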

Edit:
It's important to use egrep and not plain grep here: by default grep uses basic regular expressions, in which | is a literal character, whereas egrep (equivalent to grep -E) uses extended regular expressions, in which | means alternation.
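
The difference can be checked directly. The sketch below counts matching lines for the same pattern in basic and extended mode, and also shows the portable alternative of passing several -e patterns to plain grep. The three sample lines and the variable names are made up for the illustration.

```shell
#!/bin/sh
# Three sample lines; command substitution strips the final newline,
# so printf '%s\n' adds it back when the data is piped to grep.
sample=$(printf 'category1,a\ncategory2,b\ncategory4,c\n')

# Basic regex: | is a literal character, and no line contains
# the literal text "category4|category2", so nothing matches.
bre_count=$(printf '%s\n' "$sample" | grep -c 'category4|category2')

# Extended regex: | means "or", so two lines match.
ere_count=$(printf '%s\n' "$sample" | grep -c -E 'category4|category2')

# Plain grep with several -e patterns also matches either one.
multi_count=$(printf '%s\n' "$sample" | grep -c -e 'category2' -e 'category4')

echo "basic: $bre_count  extended: $ere_count  multiple -e: $multi_count"
```

The basic-mode count is 0 and the other two counts are 2, which is why the single-pattern form needs egrep or grep -E.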

Edit (for people who only have grep available):

grep 'category2' input.file > temp.file && grep 'category4' input.file >> temp.file && cut -d"," -f3 temp.file | sort -u > output.file && rm temp.file

This has more overhead, because it reads the input twice and goes through a temporary file, but it still works.
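
If awk is available (an assumption; the question only lists PHP, Perl and shell), the temporary file and the second pass can be avoided. This is a sketch rather than a drop-in replacement: it compares the first field exactly, so a hypothetical category20 row is not picked up, and it trims trailing spaces or a carriage return from the value, since the sample listing appears to carry trailing spaces. The sample rows and the scratch directory are illustrative only.

```shell
#!/bin/sh
# Scratch directory so no real file is touched.
workdir=$(mktemp -d)
input="$workdir/DailyValues.csv"
output="$workdir/NewValues.txt"

# Sample rows; one value has trailing spaces, one category is a near miss.
printf '%s\n' \
  'category,date,value' \
  'category2,2010-05-18,value10  ' \
  'category20,2010-05-18,value99' \
  'category4,2010-05-18,value10' \
  'category4,2010-05-18,value26' \
  'category3,2010-05-18,value16' > "$input"

# Single pass: exact match on field 1, print the cleaned field 3.
awk -F, '
  $1 == "category2" || $1 == "category4" {
    sub(/[ \t\r]+$/, "", $3)   # trim trailing blanks and CR
    print $3
  }
' "$input" | sort -u > "$output"

cat "$output"
```

Here the output is value10 and value26: the two spellings of value10 collapse into one after trimming, and the category20 row is ignored.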
