Merge multiple lines into one line

Posted 2024-08-25 04:57:43, 482 words, 7 views, 0 comments

I have an XML file whose contents look like this:

Input:
<abc a="1">
   <val>0.25</val>
</abc> 
<abc a="2">
    <val>0.25</val>
</abc> 
<abc a="3">
   <val>0.35</val>
</abc> 
 ...

Output:
<abc a="1"><val>0.25</val></abc> 
<abc a="2"><val>0.25</val></abc>
<abc a="3"><val>0.35</val></abc>

The file has around 200K lines in the Input format. How can I quickly convert it to the Output format?

Comments (11)

分分钟 2024-09-01 04:57:43

In vim you could do this with

:g/<abc/ .,/<\/abc/ join!

Normally :join inserts a space between the lines it joins, but the ! suppresses that (it also leaves any leading indentation in place).

In general I would recommend using a proper XML parsing library in a language like Python, Ruby or Perl for manipulating XML files (I recommend Python+ElementTree), but in this case it is simple enough to get away with using a regex solution.
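
As a rough illustration of the parser-based route, here is a minimal Python + ElementTree sketch. It assumes the whole file fits in memory and that the <abc> elements are siblings with no enclosing element, which is why it wraps them in a synthetic <root>. The function name flatten_records is made up for this example.

```python
import xml.etree.ElementTree as ET


def flatten_records(text: str) -> list[str]:
    """Return one compact string per top-level element found in `text`."""
    # Assumption: the input is a run of sibling elements without a single
    # root, so a synthetic <root> wrapper is added to make it parseable.
    root = ET.fromstring(f"<root>{text}</root>")
    lines = []
    for record in root:
        for node in record.iter():
            # Drop whitespace-only text and tails (indentation, newlines).
            if node.text is not None and not node.text.strip():
                node.text = None
            if node.tail is not None and not node.tail.strip():
                node.tail = None
        lines.append(ET.tostring(record, encoding="unicode"))
    return lines
```

Writing "\n".join(flatten_records(text)) to a new file would then give one record per line. Unlike the regex approach, this keeps working when a record spans more or fewer than three lines.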

糖果控 2024-09-01 04:57:43

In Vim:

  • position on first line
  • qq: start recording macro
  • gJgJ: joins next two lines without adding spaces
  • j: go down
  • q: stop recording
  • N@q: N = number of lines (actually around 1/3rd of all lines as they get condensed on the go)
看轻我的陪伴 2024-09-01 04:57:43
$ awk '
    /<abc/ && NR > 1 {print ""}
    {gsub(" +"," "); printf "%s",$0}
' file
<abc a="1"> <val>0.25</val></abc>
<abc a="2"> <val>0.25</val></abc>
<abc a="3"> <val>0.35</val></abc>
看透却不说透 2024-09-01 04:57:43

Bash:

while read -r s; do printf '%s' "$s"; read -r s; printf '%s' "$s"; read -r s; printf '%s\n' "$s"; done < file.xml
眼泪淡了忧伤 2024-09-01 04:57:43

You can record a macro. Start with the cursor at the beginning of the first line. Press 'qa' (records a macro into the a register). Then press Shift-V to begin line-wise visual mode. Then search for the closing tag with '/\/abc' and press Enter. Then press Shift-J to join the lines. Then move the cursor to the next tag, probably with 'j^', and press 'q' to stop recording. You can then rerun the recording with '@a', or give a count such as 10000@a. If the tags are different or not right after each other, you just need to change how you find the opening and closing tags, for example by using searches.
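
The same idea (join everything from an opening tag up to its closing tag, however many lines lie in between) can be sketched outside the editor. This is a hypothetical Python helper, join_records, which assumes each record ends on a line whose last non-blank text is the closing tag. It does no XML parsing at all.

```python
from typing import Iterable, Iterator


def join_records(lines: Iterable[str], close_tag: str = "</abc>") -> Iterator[str]:
    """Yield one joined string per record, ending at each closing tag."""
    parts: list[str] = []
    for line in lines:
        stripped = line.strip()  # drop indentation and the trailing newline
        if not stripped:
            continue  # ignore blank lines between records
        parts.append(stripped)
        if stripped.endswith(close_tag):
            yield "".join(parts)
            parts = []
    if parts:
        # Leftover lines without a closing tag are emitted as-is.
        yield "".join(parts)
```

Because it takes any iterable of lines, an open file object can be passed in directly, so a 200K-line file is processed one line at a time.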

嗳卜坏 2024-09-01 04:57:43
sed '/^<abc/{N;N;s/\n *//g;s/ *$//;}'

# remove each \n plus the indentation after it, then any trailing spaces
# Result

<abc a="1"><val>0.25</val></abc>
<abc a="2"><val>0.25</val></abc>
<abc a="3"><val>0.35</val></abc>
小鸟爱天空丶 2024-09-01 04:57:43

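An inelegant Perl one-liner that should do the trick, though not particularly quickly.

cat file | perl -e '
    $x=0;
    while(<>){
        # trim leading and trailing whitespace, including the newline
        s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/g;
        print;
        $x++;
        # after every third input line, end the output line
        if($x==3){
            print"\n";
            $x=0;
        }
    }' > output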
树深时见影 2024-09-01 04:57:43

You can do this:

perl -e '$i=1; while(<>){chomp;$s.=$_;if($i%3==0){$s=~s{>\s+<}{><};print "$s\n";$s="";}$i++;}' file
妳是的陽光 2024-09-01 04:57:43
sed '/<abc/,/<\/abc>/{:a;N;s/\n//g;s|<\/abc>|<\/abc>\n|g;H;ta}'  file
相思碎 2024-09-01 04:57:43
tr "\n" " "<myfile|sed 's|<\/abc>|<\/abc>\n|g;s/[ \t]*<abc/<abc/g;s/>[ \t]*</></g'
橪书 2024-09-01 04:57:43

This should work in ex mode:

:%s/\(^<abc.*>\)^M^\(.*\)^M^\(^<\/abc>\).*^M/\1\2\3^M/g

This leaves the extra spaces (or a tab) in front of the value, but you could remove them depending on what they are (\t or \ \ \ \ ).

What you are searching for here is (pattern1)[enter](pattern2)[enter](pattern3)[enter], and you are replacing it with (pattern1)(pattern2)(pattern3)[enter].

The ^M is entered with Ctrl+V Ctrl+M.
