Merge multiple lines into one line
I have a use case with an XML file whose input looks like this:
Input:
<abc a="1">
<val>0.25</val>
</abc>
<abc a="2">
<val>0.25</val>
</abc>
<abc a="3">
<val>0.35</val>
</abc>
...
Output:
<abc a="1"><val>0.25</val></abc>
<abc a="2"><val>0.25</val></abc>
<abc a="3"><val>0.35</val></abc>
I have around 200K lines in a file in the Input format. How can I quickly convert it into the Output format?
In vim you could do this with the :join! command.
Normally :join inserts a space between the lines it joins, but the ! suppresses that.
In general I would recommend using a proper XML parsing library in a language like Python, Ruby or Perl for manipulating XML files (I recommend Python+ElementTree), but this case is simple enough to get away with a regex solution.
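As a rough illustration of the Python+ElementTree route mentioned above, here is a minimal sketch. It assumes the whole file fits in memory and that the records are well-formed once wrapped in a single enclosing element; the `<root>` wrapper and the function name `compact_records` are hypothetical choices made for this example, not part of the original answer.

```python
import xml.etree.ElementTree as ET


def compact_records(fragment):
    """Return one compact line per top-level element in `fragment`."""
    # The input has many top-level <abc> elements, so wrap it in a
    # hypothetical <root> element to make it parseable as one document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    lines = []
    for record in root:
        for node in record.iter():
            # Drop whitespace-only text and tails (line breaks, indentation).
            if node.text is not None and not node.text.strip():
                node.text = None
            if node.tail is not None and not node.tail.strip():
                node.tail = None
        lines.append(ET.tostring(record, encoding="unicode"))
    return lines


sample = (
    '<abc a="1">\n<val>0.25</val>\n</abc>\n'
    '<abc a="2">\n<val>0.25</val>\n</abc>\n'
)
print("\n".join(compact_records(sample)))
```

Because the parser rebuilds each element, this approach does not depend on every record spanning exactly three lines, which is the main reason to prefer it over a line-based trick when the input is less regular.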
In Vim:
qq : start recording a macro into register q
gJgJ : join the next two lines without adding spaces
j : go down one line
q : stop recording
N@q : replay the macro N times, where N = number of lines (actually around 1/3rd of all lines, as they get condensed on the go)
Bash:
You can record a macro. Start with the cursor at the beginning of the first line. Press 'qa' (records a macro into the a register). Then press Shift-V to begin line-wise visual mode. Then search for the closing tag with '/\/abc' followed by Enter. Then press Shift-J to join the lines (Shift-J inserts a space at each join; gJ joins without adding spaces). Then move the cursor to the next tag, probably with 'j^', and press 'q' to stop recording. You can then rerun the recording with '@a', or give it a count such as 10000@a. If the tags are different or not right after each other, you just need to change how you find the opening and closing tags, for example by using searches.
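The same idea as the macro (collect lines from an opening tag until the closing tag, then join them) can be sketched outside the editor. The following is only an illustration, assuming each record ends on a line that finishes with `</abc>`; the function name `join_blocks` and its `close_tag` parameter are hypothetical.

```python
import io


def join_blocks(lines, close_tag="</abc>"):
    """Yield one joined line for each run of lines ending with `close_tag`."""
    parts = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines between records
        parts.append(stripped)
        if stripped.endswith(close_tag):
            yield "".join(parts)
            parts = []
    if parts:
        # Trailing lines that never reached a closing tag are still emitted.
        yield "".join(parts)


sample = io.StringIO(
    '<abc a="1">\n<val>0.25</val>\n</abc>\n'
    '<abc a="3">\n<val>0.35</val>\n</abc>\n'
)
for joined in join_blocks(sample):
    print(joined)
```

Since it reads one line at a time, the same function can be fed an open file object, so a 200K-line input never has to be held in memory at once.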
An inelegant Perl one-liner should do the trick, though not particularly quickly.
You can do this:
This should work in ex mode:
:%s/^\(<abc.*>\)\n\(.*\)\n\(<\/abc>\).*\n/\1\2\3\r/
There may be extra spaces (or a tab) in front of the value, but you could remove them depending on what they are (\t or \ \ \ \ ).
What you are searching for is (pattern1)[enter](pattern2)[enter](pattern3)[enter], and you are replacing it with (pattern1)(pattern2)(pattern3)[enter].
In Vim the line breaks in the pattern are written as \n and the one in the replacement as \r; a literal ^M in the replacement, typed with Ctrl+V Ctrl+M, also works.
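For comparison, the same three-group search and replace can be sketched with Python's `re` module. This is only an illustration, under the assumption that every record spans exactly three lines as in the question; the names `BLOCK` and `collapse` are hypothetical.

```python
import re

# One capture group per input line; leading spaces or tabs before the
# value line are matched outside the group so they are dropped.
BLOCK = re.compile(r"^(<abc.*>)\n[ \t]*(.*)\n(</abc>).*$", re.MULTILINE)


def collapse(text):
    """Rewrite each three-line record as a single line."""
    return BLOCK.sub(r"\1\2\3", text)


sample = (
    '<abc a="1">\n<val>0.25</val>\n</abc>\n'
    '<abc a="2">\n\t<val>0.25</val>\n</abc>\n'
)
print(collapse(sample), end="")
```

The trailing line break of each record is left outside the match, so it survives the substitution and no explicit newline is needed in the replacement.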