How to put each set's header next to its content lines in an unformatted data set, separated by a delimiter, using sed/awk/python

Posted on 2025-01-19 15:10:59

I have a long list of unformatted data, say data.txt, where each set starts with a header and ends with a blank line, like:

TypeA/Price:20$
alexmob
moblexto
unkntom

TypeB/Price:25$
moblexto2
unkntom0
alexmob3
poptop9
tyloret

TypeC/Price:30$
rtyuoper0
kunlohpe6
mobryhox

Now I want to put the header of each set next to each of its content lines, separated by a comma. Like:

alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$

moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$

rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$

That way, whenever I grep for a keyword, the matching content comes out together with its header. Like:

$ grep mob data.txt
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
moblexto2,TypeB/Price:25$
alexmob3,TypeB/Price:25$
mobryhox,TypeC/Price:30$

I am new to bash scripting as well as Python and only recently started learning them, so I would really appreciate any simple bash script (using sed/awk) or Python script.

Comments (3)

知足的幸福 2025-01-26 15:10:59

Using sed

$ sed '/Type/{h;d;};/[a-z]/{G;s/\n/,/}' input_file
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$

moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$

rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$

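Match lines containing Type, copy them into the hold space (h), and delete them from the output (d).

For lines containing a lowercase letter, append the contents of the hold space to the pattern space (G), then substitute the newline that G inserted with a comma. Blank lines match neither pattern and are printed unchanged.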
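Since the question also asks about Python, the same hold-and-append idea can be sketched as follows. This is only an illustration, not part of the original answer: the function name attach_headers and the delimiter parameter are made up for the example, and it assumes (as the sed command does) that headers are exactly the lines containing "Type".

```python
import re

def attach_headers(lines, delimiter=","):
    """Append the most recently seen header to every content line."""
    held = ""  # plays the role of sed's hold space
    out = []
    for line in lines:
        if "Type" in line:
            # Like /Type/{h;d;}: remember the header and emit nothing.
            held = line
            continue
        if re.search(r"[a-z]", line):
            # Like /[a-z]/{G;s/\n/,/}: content line, then delimiter, then header.
            line = f"{line}{delimiter}{held}"
        # Blank lines fall through unchanged, as they do in sed.
        out.append(line)
    return out

sample = """TypeA/Price:20$
alexmob
moblexto

TypeB/Price:25$
moblexto2
""".splitlines()

for row in attach_headers(sample):
    print(row)
```

To process a real file, the lines could be read with open("data.txt").read().splitlines() and passed to the same function.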

明月松间行 2025-01-26 15:10:59

I would use GNU AWK for this task in the following way. Let the content of file.txt be

TypeA/Price:20$
alexmob
moblexto
unkntom

TypeB/Price:25$
moblexto2
unkntom0
alexmob3
poptop9
tyloret

TypeC/Price:30$
rtyuoper0
kunlohpe6
mobryhox

then

awk '/^Type/{header=$0;next}{print /./?$0 ";" header:$0}' file.txt

output

alexmob;TypeA/Price:20$
moblexto;TypeA/Price:20$
unkntom;TypeA/Price:20$

moblexto2;TypeB/Price:25$
unkntom0;TypeB/Price:25$
alexmob3;TypeB/Price:25$
poptop9;TypeB/Price:25$
tyloret;TypeB/Price:25$

rtyuoper0;TypeC/Price:30$
kunlohpe6;TypeC/Price:30$
mobryhox;TypeC/Price:30$

Explanation: If a line starts with (^) Type, set header to that line ($0) and go to the next line. For every other line, if it contains at least one character (/./), print the line ($0) concatenated with ; and header; otherwise print the line ($0) as is.

(tested in GNU Awk 5.0.1)

扭转时空 2025-01-26 15:10:59

Using any awk in any shell on every Unix box regardless of which characters are in your data:

$ awk -v RS= -F'\n' -v OFS=',' '{for (i=2;i<=NF;i++) print $i, $1; print ""}' file
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$

moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$

rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$
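
For a Python take on the same paragraph-mode idea, here is a rough sketch. It is not from the original answer: the name records_to_rows and the delimiter parameter are hypothetical, and it assumes that records are separated by truly empty lines and that the first line of each record is its header, which is how the awk command above treats the input with RS= and -F'\n'.

```python
import re

def records_to_rows(text, delimiter=","):
    """Treat each blank-line-separated block as one record whose first line is the header."""
    rows = []
    # Splitting on runs of empty lines approximates awk's RS= paragraph mode.
    for block in re.split(r"\n{2,}", text.strip("\n")):
        header, *items = block.split("\n")
        rows.extend(f"{item}{delimiter}{header}" for item in items)
        rows.append("")  # blank separator after each record, like print ""
    return rows

sample = "TypeA/Price:20$\nalexmob\nunkntom\n\nTypeC/Price:30$\nmobryhox\n"

# Rough equivalent of running grep mob on the converted output.
for row in records_to_rows(sample):
    if "mob" in row:
        print(row)
```

Because the header is identified by its position rather than by the text "Type", this variant does not depend on what the header lines look like.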