How to add each header next to its adjacent content in an unformatted data set, side by side and separated by a delimiter, using sed/awk/python

Posted on 2025-01-19 15:10:59

I have a long list of unformatted data, say data.txt, where each set starts with a header and ends with a blank line, like:

TypeA/Price:20$
alexmob
moblexto
unkntom

TypeB/Price:25$
moblexto2
unkntom0
alexmob3
poptop9
tyloret

TypeC/Price:30$
rtyuoper0
kunlohpe6
mobryhox

Now I want to put the header of each set beside each of its content lines, separated by a comma, like:

alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$

moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$

rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$

so that whenever I grep for a keyword, the matching content lines come back together with their header, like:

$ grep mob data.txt
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
moblexto2,TypeB/Price:25$
alexmob3,TypeB/Price:25$
mobryhox,TypeC/Price:30$

I am a newbie at bash scripting as well as Python and only recently started learning them, so I would really appreciate any simple bash script (using sed/awk) or Python script.

知足的幸福 2025-01-26 15:10:59

Using sed

$ sed '/Type/{h;d;};/[a-z]/{G;s/\n/,/}' input_file
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$

moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$

rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$

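Lines containing Type are copied into the hold space (h) and then deleted (d), so the header itself is not printed. Note that /Type/ matches anywhere in the line; use /^Type/ to match only at the start.

For lines containing a lowercase letter, G appends a newline followed by the contents of the hold space (the saved header) to the pattern space. Finally, s/\n/,/ replaces that newline with a comma.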
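Since the question also asks about Python, the same line-by-line idea can be sketched there: a plain variable plays the role of sed's hold space. This is only a minimal sketch; the function name `attach_headers`, the `sep` parameter, and the rule that a header is any line starting with `Type` are assumptions made for illustration, not part of the answer above.

```python
def attach_headers(lines, sep=","):
    """Yield each content line joined to the most recent header line."""
    header = ""  # plays the role of sed's hold space
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("Type"):   # assumption: headers start with "Type"
            header = line             # remember the header, print nothing
        elif line:
            yield f"{line}{sep}{header}"
        else:
            yield ""                  # keep the blank separator lines


if __name__ == "__main__":
    sample = [
        "TypeA/Price:20$", "alexmob", "moblexto", "",
        "TypeB/Price:25$", "moblexto2",
    ]
    for row in attach_headers(sample):
        print(row)
```

To process a real file, the function can be handed an open file object instead of the sample list, for example `attach_headers(open("data.txt"))`, since iterating a file yields its lines.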

明月松间行 2025-01-26 15:10:59

I would use GNU AWK for this task in the following way. Let the content of file.txt be

TypeA/Price:20$
alexmob
moblexto
unkntom

TypeB/Price:25$
moblexto2
unkntom0
alexmob3
poptop9
tyloret

TypeC/Price:30$
rtyuoper0
kunlohpe6
mobryhox

then

awk '/^Type/{header=$0;next}{print /./?$0 ";" header:$0}' file.txt

output

alexmob;TypeA/Price:20$
moblexto;TypeA/Price:20$
unkntom;TypeA/Price:20$

moblexto2;TypeB/Price:25$
unkntom0;TypeB/Price:25$
alexmob3;TypeB/Price:25$
poptop9;TypeB/Price:25$
tyloret;TypeB/Price:25$

rtyuoper0;TypeC/Price:30$
kunlohpe6;TypeC/Price:30$
mobryhox;TypeC/Price:30$

Explanation: if a line starts with Type (/^Type/), set header to that line ($0) and go to the next line. For every other line, if it contains at least one character (/./), print the line ($0) concatenated with ; and header; otherwise print the empty line as is. To get the comma-separated output asked for in the question, replace ";" with ",".

(tested in GNU Awk 5.0.1)

扭转时空 2025-01-26 15:10:59

Using any awk in any shell on every Unix box regardless of which characters are in your data:

$ awk -v RS= -F'\n' -v OFS=',' '{for (i=2;i<=NF;i++) print $i, $1; print ""}' file
alexmob,TypeA/Price:20$
moblexto,TypeA/Price:20$
unkntom,TypeA/Price:20$

moblexto2,TypeB/Price:25$
unkntom0,TypeB/Price:25$
alexmob3,TypeB/Price:25$
poptop9,TypeB/Price:25$
tyloret,TypeB/Price:25$

rtyuoper0,TypeC/Price:30$
kunlohpe6,TypeC/Price:30$
mobryhox,TypeC/Price:30$
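
The command above relies on awk's paragraph mode: with RS set to the empty string, each blank-line-separated block becomes one record, and -F'\n' makes every line of the block a field, so $1 is the header. A rough Python counterpart, assuming the whole file fits in memory, might look like the sketch below; `join_blocks` and its `sep` parameter are hypothetical names chosen for illustration.

```python
import re


def join_blocks(text, sep=","):
    """Treat each blank-line-separated block as one record, as awk does with RS=""."""
    blocks = re.split(r"\n{2,}", text.strip("\n"))
    out = []
    for block in blocks:
        # The first line of the block is the header, the rest are content lines.
        header, *items = block.split("\n")
        out.append("\n".join(f"{item}{sep}{header}" for item in items))
    # Put a single blank line back between the converted blocks.
    return "\n\n".join(out)


if __name__ == "__main__":
    sample = "TypeA/Price:20$\nalexmob\nmoblexto\n\nTypeB/Price:25$\nmoblexto2\n"
    print(join_blocks(sample))
```

Unlike the line-by-line approach, this one does not need to recognise headers by their text: whatever comes first in a block is taken as the header.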
