Using sed to replace a mix of escape sequences, control characters, and literals?
I have a weird mix of literal, visible and escaped control characters in a data dump that I need to clean (preferably with sed), for example ^A, ^B, \N (literally), and visible newlines. I need to clean the file such that the visible newlines remain intact, replace every ^A with a tab character, and strip every ^B\N^B\N (which follows every unix time value in the data e.g. 13068505731812510).
This is what the contents look like when viewed with less in a shell (there, the ^A and ^B characters have a dark background to denote control chars):
^A guid ^A unix-time ^B\N^B\N^A 4 ^A 192.168.21.136 ^A 7.0 ^A IE ^A 8 ^A guid ^A WinNT ^A ... (visible newline)
Or a literal example...
... ^A40C4595C-0B9D-46B7-8214-3D9CE2B5F057^A13071154505579551^B\N^B\N^A4^A192.168.21.136^A7.0^AIE^A8^AE6979203-F58B-4D20-9D66-7F5369BF9E32^AWinXP^A ...
So far the escape sequences I've been feeding sed have not been producing the expected output. Does anyone know the magic escapes needed to make all this happen in as few passes as possible? (There are many gigs of files, and time counts.) Thanks! Bonus points if I can convert the unix time digits into human-readable times in the same pass.
Comments (1)
Change ^A to tabs:
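The command that went with this step did not survive on this page, so here is a minimal sketch of one way to do it. It assumes that the ^A shown by less is the single byte 0x01 (SOH); the helper name a_to_tabs is hypothetical. The control character and the tab are generated with printf rather than typed, which avoids relying on sed-specific escapes such as \x01.

```shell
# SOH is the ^A control character (byte 0x01); TAB is a literal tab.
SOH=$(printf '\001')
TAB=$(printf '\t')

# a_to_tabs (hypothetical name) reads stdin and writes stdout,
# replacing every ^A on each line with a tab.
a_to_tabs() {
    sed "s/${SOH}/${TAB}/g"
}
```

It could be run as a_to_tabs < dump.txt > dump.tsv, where both file names are placeholders. Since this step is a one-byte-for-one-byte swap, tr '\001' '\t' should give the same result without any regular expression.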
Strip out ^B\N:
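The command for this step is also missing here, so the following is a sketch under stated assumptions: ^B is the single byte 0x02 (STX), and \N is a literal backslash followed by the letter N, as described in the question. The helper name clean_dump is hypothetical. Both substitutions are given to one sed invocation, so the file is read only once.

```shell
SOH=$(printf '\001')   # ^A
STX=$(printf '\002')   # ^B
TAB=$(printf '\t')

# clean_dump (hypothetical name) reads stdin and writes stdout.
# The first expression deletes every ^B followed by a literal backslash and N;
# the second turns every ^A into a tab. Real newlines are left alone
# because sed processes its input one line at a time.
clean_dump() {
    sed -e 's/'"${STX}"'\\N//g' -e "s/${SOH}/${TAB}/g"
}
```

It could be run as clean_dump < dump.txt > dump.tsv (placeholder file names). For many gigabytes of plain ASCII data, setting LC_ALL=C for the sed call may speed things up, though that is worth measuring on a sample first.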