How to add header text next to its adjacent content in an unformatted dataset, side by side and separated by a delimiter, with sed/awk/python
I have a long list of unformatted data, say data.txt, where each set starts with a header and ends with a blank line, like:
TypeA/Price:20$
alexmob
moblexto
unkntom

TypeB/Price:25$
moblexto2
unkntom0
alexmob3
poptop9
tyloret

TypeC/Price:30$
rtyuoper0
kunlohpe6
mobryhox
Now I want to put each set's header next to every one of its content lines, separated by a comma, like:
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$
moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$
rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$
so that whenever I grep for a keyword, the matching content comes out together with its header, like:
$grep mob data.txt
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
moblexto2,TypeB/Price:25$
alexmob3,TypeB/Price:25$
mobryhox,TypeC/Price:30$
I am new to both bash scripting and Python and only recently started learning them, so I would really appreciate any simple bash script (using sed/awk) or Python script.
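Once every item sits on one line with its header, the search step in the question is an ordinary substring match. Here is a minimal Python sketch of that step. grep_lines is a hypothetical helper that behaves like grep -F (a fixed-string search), which gives the same result as grep for a plain keyword such as mob:

```python
def grep_lines(lines, keyword):
    """Keep only the lines containing keyword, like `grep -F keyword`
    (plain substring match, no regular expressions)."""
    return [line for line in lines if keyword in line]

# The formatted lines from the question's expected output.
formatted = [
    "alexmob,TypeA/Price:20$",
    "moblexto,TypeA/Price:20$",
    "unkntom,TypeA/Price:20$",
    "moblexto2,TypeB/Price:25$",
    "unkntom0,TypeB/Price:25$",
    "alexmob3,TypeB/Price:25$",
    "poptop9,TypeB/Price:25$",
    "tyloret,TypeB/Price:25$",
    "rtyuoper0,TypeC/Price:30$",
    "kunlohpe6,TypeC/Price:30$",
    "mobryhox,TypeC/Price:30$",
]

# Print every formatted line that mentions "mob", header included.
for line in grep_lines(formatted, "mob"):
    print(line)
```

This prints the same five lines as the grep example in the question.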
Comments (3)
Using sed: match lines containing Type, hold each one in memory (the hold space) and delete it. For lines that contain alphabetic characters, append the contents of the hold space with G. Finally, substitute a comma for the newline that G inserts.
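To make the hold-space mechanism concrete, here is a minimal Python sketch that emulates the sed steps described above, one line at a time. The answer's actual sed command is not shown, so this is only an illustration. It assumes the header is stored with h (copy, replacing the previous header) and that "alphabetic characters" means a letter class like /[[:alpha:]]/, approximated here by ASCII letters. The function name sed_like is hypothetical. G appends a newline followed by the hold space, which is why a substitution is needed to turn that newline into a comma.

```python
import re

def sed_like(lines):
    """Emulate (assumed) sed logic: on /Type/ copy the line to the hold
    space and delete it; on lines with a letter, append "\\n" + hold space
    (G) and replace that newline with a comma (s/\\n/,/)."""
    hold = ""                                # sed's hold space starts empty
    out = []
    for line in lines:
        if "Type" in line:                   # /Type/ matches anywhere in the line
            hold = line                      # h: copy pattern space to hold space
            continue                         # d: delete, nothing is printed
        if re.search(r"[A-Za-z]", line):     # stand-in for /[[:alpha:]]/
            line = line + "\n" + hold        # G: append newline + hold space
            line = line.replace("\n", ",", 1)  # s/\n/,/: first newline only
        out.append(line)                     # auto-print the pattern space
    return out

sample = [
    "TypeA/Price:20$", "alexmob", "moblexto", "unkntom", "",
    "TypeB/Price:25$", "moblexto2", "unkntom0", "alexmob3", "poptop9", "tyloret", "",
    "TypeC/Price:30$", "rtyuoper0", "kunlohpe6", "mobryhox",
]
print("\n".join(sed_like(sample)))
```

Blank lines contain no letters, so they pass through unchanged. A content line made only of digits would also be printed without a header.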
I would use GNU AWK for this task in the following way. Let file.txt content be
then
output
Explanation: if a line starts with (^) Type, set header to that line ($0) and go to the next line. For every other line, print the line ($0) concatenated with ; and header if it contains at least one character (/./); otherwise print the line ($0) as is. (Tested in GNU Awk 5.0.1.)
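The awk logic explained above can also be sketched in Python as a generator over a file object. This is convenient if you want to write the result to a new file and then grep it. The sketch assumes headers start with Type, as in the sample. The function name awk_like and its sep parameter are hypothetical. sep defaults to a comma, as the question asks, although the explanation above mentions ;.

```python
import io

def awk_like(stream, sep=","):
    """Yield output lines following the described awk rules:
    /^Type/ {header = $0; next}  then  print /./ ? $0 sep header : $0"""
    header = ""                       # awk variables start out empty
    for raw in stream:
        line = raw.rstrip("\n")       # $0 does not include the newline
        if line.startswith("Type"):   # /^Type/
            header = line             # header = $0
            continue                  # next
        if line:                      # /./ : at least one character
            yield line + sep + header
        else:
            yield line                # empty lines are printed as is

sample = io.StringIO(
    "TypeA/Price:20$\nalexmob\nmoblexto\nunkntom\n\n"
    "TypeB/Price:25$\nmoblexto2\nunkntom0\nalexmob3\npoptop9\ntyloret\n\n"
    "TypeC/Price:30$\nrtyuoper0\nkunlohpe6\nmobryhox\n"
)
for out_line in awk_like(sample):
    print(out_line)
```

To process the real file, pass open("data.txt") instead of the StringIO and write the yielded lines to a second file, for example a hypothetical formatted.txt. That way the original data is kept, and you can grep the new file.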
Using any awk in any shell on every Unix box regardless of which characters are in your data:
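You can avoid depending on the data's characters entirely (no Type prefix, no letter class) by relying only on the layout the question describes: sets separated by blank lines, with the first line of each set as its header. This is the same idea as awk's paragraph mode (RS set to the empty string). The answer's command is not shown, so the Python below is only a sketch of that idea. The function name by_blocks is hypothetical.

```python
import re

def by_blocks(text, sep=","):
    """Split text into blank-line-separated blocks; the first line of each
    block is its header and every other line gets sep + header appended."""
    out = []
    # One or more blank (or whitespace-only) lines separate the blocks.
    for block in re.split(r"\n\s*\n", text.strip("\n")):
        lines = block.split("\n")
        header, items = lines[0], lines[1:]
        out.extend(item + sep + header for item in items)
    return out

text = (
    "TypeA/Price:20$\nalexmob\nmoblexto\nunkntom\n\n"
    "TypeB/Price:25$\nmoblexto2\nunkntom0\nalexmob3\npoptop9\ntyloret\n\n"
    "TypeC/Price:30$\nrtyuoper0\nkunlohpe6\nmobryhox\n"
)
print("\n".join(by_blocks(text)))
```

Unlike the line-by-line approaches, this drops the blank separator lines, which matches the expected output in the question. It also works when headers do not start with Type.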