Extract optional field values from lines
I have text in the form of separate lines, where each line has a CSV-like format:
SOME BUNCH OF TEXT, FIELD_A: 12, FIELD_B: 0.2321, FIELD_C: 12:10:08 2011/07/22, FIELD_D: 656
The order of the fields is always the same, but some fields may be absent. There can also be other fields between the fields of interest; for example, compared to the line above, I could also get the following:
SOME BUNCH OF TEXT, FIELD_A: 12, NOT_INTERESTED: 235, FIELD_B: 0.2321, FIELD_C: 12:10:08 2011/07/22, FIELD_D: 656, FIELDS
As the result of processing this text, I want a clean CSV file with my fields listed one after another:
12,0.2321,12:10:08 2011/07/22,656
If a field is absent, I would like to simply leave its value empty (for example, when FIELD_B is absent):
12,,12:10:08 2011/07/22,656
How can I do this using commands like sed, perl, or awk?
I tried extracting a single field with
perl -pe 's/^.*?(FIELD_A: (.*?),)?.*?$/\2/'
but it failed: the regex simply ignores my field even when it is present.
You can use awk with an associative array. Loop over the fields and split them on ":". Then store each key-value pair in the associative array. Finally, print out the fields you want.
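A sketch of that associative-array approach follows. It is an illustrative reconstruction rather than the answerer's original code; the function name extract_fields is hypothetical, and it assumes that ", " separates fields and that the first ": " separates a key from its value. Splitting on every ":" would cut the FIELD_C value 12:10:08 2011/07/22 apart, which is why index() is used to find only the first occurrence.

```shell
# Sketch of the associative-array approach. Assumptions: fields are
# separated by ", " and a key is separated from its value by the FIRST
# ": " (FIELD_C values such as 12:10:08 contain bare colons).
extract_fields() {
  awk -F', ' '
  {
    split("", kv)                  # portable way to empty the array per line
    for (i = 1; i <= NF; i++) {
      p = index($i, ": ")          # position of the first ": " in this field
      if (p > 0)                   # skip tokens with no key, e.g. FIELDS
        kv[substr($i, 1, p - 1)] = substr($i, p + 2)
    }
    # Keys that were never stored print as empty strings, which leaves
    # an empty CSV column for an absent field.
    print kv["FIELD_A"] "," kv["FIELD_B"] "," kv["FIELD_C"] "," kv["FIELD_D"]
  }' "$@"
}
```

For example, extract_fields input.txt > fields.csv writes one CSV row per input line; input.txt and fields.csv are placeholder names.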
How about this way (assuming the field names are known):