Reformat a large text file into one-line strings (via BASH)

Posted on 2024-08-09 00:02:47

File1:

hello
- dictionary definitions:
hi
hello
hallo
greetings
salutations
no more hello for you
-
world
- dictionary definitions:
universe
everything
the globe
the biggest tree
planet
cess pool of organic life
-

I need to format this (for a huge list of words) into a term to definition format (one line per term). How can one achieve this? None of the words are the same, only the structure seen above is. The resultant file would look something like this:

hello    - dictionary definitions:    hi    hello    hallo    greetings    salutations    no more hello for you    -
world    - dictionary definitions:    universe    everything    the globe    the biggest tree    planet    cess pool of organic life    -

Awk/Sed/Grep/Cat are the usual contenders.

Comments (6)

围归者 2024-08-16 00:02:47

And who says only Perl can do it elegantly? :)

$ gawk -vRS="-\n" '{gsub(/\n/," ")}1' file
hello - dictionary definitions: hi hello hallo greetings salutations no more hello for you
world - dictionary definitions: universe everything the globe the biggest tree planet cess pool of organic life

Or:

# gawk 'BEGIN{RS="-\n";FS="\n";OFS=" "}{$1=$1}1'  file
hello - dictionary definitions: hi hello hallo greetings salutations no more hello for you
world - dictionary definitions: universe everything the globe the biggest tree planet cess pool of organic life
孤独患者 2024-08-16 00:02:47
awk 'BEGIN {FS="\n"; RS="-\n"}{for(i=1;i<=NF;i++) printf("%s   ",$i); if($1)print"-";}' dict.txt

outputs:

hello   - dictionary definitions:   hi   hello   hallo   greetings   salutations   no more hello for you   -
world   - dictionary definitions:   universe   everything   the globe   the biggest tree   planet   cess pool of organic life   -
独行侠 2024-08-16 00:02:47

A perl one-liner:

perl -pe 'chomp;s/^-$/\n/;print " "' File1

gives

 hello - dictionary definitions: hi hello hallo greetings salutations no more hello for you
 world - dictionary definitions: universe everything the globe the biggest tree planet cess pool of organic life 

This is 'something like' your required output.

她如夕阳 2024-08-16 00:02:47

I'm not sure which scripting language you will be using, so here is pseudocode:

for each line
 if line is "-"
  create new line
 else
  append separator to previous line
  append line to previous line
 end if
end for loop
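
The pseudocode above could be sketched in shell with plain awk roughly as follows. This is only a sketch under a few assumptions: `join_entries` is a hypothetical helper name, the separator is four spaces, and the closing `-` is kept on each line to match the output asked for in the question (the pseudocode itself drops it). It does not rely on a multi-character `RS`.

```shell
# join_entries: hypothetical helper name, not from the original thread.
# Reads entries on stdin; each entry ends with a line containing only "-".
join_entries() {
  awk -v sep='    ' '
    {
      # Append the current line to the entry being built.
      entry = (entry == "" ? $0 : entry sep $0)
    }
    $0 == "-" {
      # Terminator line: print the finished entry and start a new one.
      print entry
      entry = ""
    }
    END {
      # Print a trailing entry that is missing its closing "-".
      if (entry != "") print entry
    }
  '
}
```

Usage would be `join_entries < File1 > File2`, where `File2` is an assumed output name.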
最美的太阳 2024-08-16 00:02:47

Try this one-liner. It works only on the condition that every word always has exactly 6 definition lines (9 lines per entry in total, which is why there are eight N commands):

sed 'N;N;N;N;N;N;N;N;s/\n/ /g' test_3
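
If the number of definition lines varies, a loop can replace the fixed run of `N` commands. The following is a sketch only, assuming every entry is closed by a line containing just `-`; `join_any_length` is a hypothetical helper name. It keeps reading lines into the pattern space until the pattern space ends with a newline followed by `-`, then joins everything with spaces.

```shell
# join_any_length: hypothetical helper name, not from the original thread.
join_any_length() {
  # :a          label marking the start of the loop
  # /\n-$/!{..} while the pattern space does not yet end in a lone "-" line,
  #             append the next input line (N) and jump back to :a
  # s/\n/ /g    once the entry is complete, turn its newlines into spaces
  sed -e ':a' -e '/\n-$/!{' -e 'N' -e 'ba' -e '}' -e 's/\n/ /g'
}
```

Usage would be `join_any_length < test_3`. An unterminated entry at the end of the input is not handled specially here.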
所谓喜欢 2024-08-16 00:02:47
sed -ne'1{x;d};/^-$/{g;s/\n/ /g;p;n;x;d};H'
awk -v'RS=\n-\n' '{gsub(/\n/," ")}1'