How do I deal with a record separator that *may* be spread over multiple lines in sed or awk?
My log file is:
Wed Nov 12 blah blah blah blah cat1
Wed Nov 12 blah blah blah blah
Wed Nov 12 blah blah blah blah
Wed Nov 12 blah blah blah blah cat2
    more blah blah
    even more blah blah
Wed Nov 12 blah blah blah blah cat3
Wed Nov 12 blah blah blah blah cat4
I want to parse out the full multiline entries where cat is found on the first line. What's the best way to do this in sed and/or awk?
I.e., I want my parse to produce:
Wed Nov 12 blah blah blah blah cat1
Wed Nov 12 blah blah blah blah cat2
    more blah blah
    even more blah blah
Wed Nov 12 blah blah blah blah cat3
Wed Nov 12 blah blah blah blah cat4
If you say that every line starting with a space is a continuation line, it's easy with (g)awk (this is from memory, so it may contain some minor typos, and it has some additional line breaks for better readability):
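The snippet itself did not survive in this copy of the answer. A rough sketch of what such a (g)awk script could look like, assuming continuation lines start with a space; the filter_entries wrapper and the idea of passing the check in as an awk variable named whatever are illustrative choices, not the answerer's code:

```shell
# Hypothetical helper: reads a log on stdin, prints every entry whose
# FIRST line matches the regular expression given as $1.
# Usage (file name is an example): filter_entries 'cat[0-9]+$' < logfile
filter_entries() {
    awk -v whatever="$1" '
        # A line starting with a space continues the current entry.
        /^ / { entry = entry "\n" $0; next }

        # Any other line starts a new entry: emit the previous one if it
        # passed the check, then remember whether this one passes.
        { flush(); entry = $0; keep = ($0 ~ whatever) }

        # Emit the last entry as well.
        END { flush() }

        function flush() { if (keep) print entry }
    '
}
```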
where 'whatever' is your check for whether the output should happen (e.g. for the cat).
Assuming your log file does not contain the control characters '\01' and '\02', and that a continued line begins with precisely four spaces, the following might work:

Explanation: this replaces each newline with ASCII 1 (a control character that should never appear in a log file) and each sequence "newline-space-space-space-space" with ASCII 2 (another control character). It then turns each ASCII 1 back into a newline, so each multi-line entry now sits on a single line, with its internal line breaks replaced by ASCII 2. That output is grepped for cat, and the ASCII 2 characters are then turned back into the newline-space-space-space-space combination.
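The command this answer describes is missing from this copy. A minimal sketch of the pipeline the explanation walks through, assuming a POSIX shell with tr, sed, grep and awk available; the function name filter_cat_entries and the SOH/STX variables are made up for illustration, and this is not necessarily the command the answer originally gave:

```shell
# Literal control characters, so sed needs no non-portable escapes.
SOH=$(printf '\001')   # stands in for every newline
STX=$(printf '\002')   # stands in for "newline + four spaces"

# Hypothetical helper: reads a log on stdin, prints the entries that contain "cat".
# Usage (file name is an example): filter_cat_entries < logfile
filter_cat_entries() {
    tr '\n' '\001' |                                # 1. every newline becomes ASCII 1
        sed "s/${SOH}    /${STX}/g" |               # 2. ASCII 1 + four spaces becomes ASCII 2
        tr '\001' '\n' |                            # 3. remaining ASCII 1 back to newline
        grep 'cat' |                                # 4. keep the one-line entries mentioning cat
        awk '{ gsub(/\002/, "\n    "); print }'     # 5. ASCII 2 back to newline + four spaces
}
```

Note that step 4 matches cat anywhere in the joined entry, including its continuation lines; anchoring the grep pattern would be needed if only the first line should count.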
Something like this?
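The code for this answer is also missing here. One way it might have looked, built only from the two regexps the answer names; the keep flag and the keep_cat_entries wrapper are hypothetical names used for the sketch:

```shell
# Hypothetical helper: reads a log on stdin, prints starter lines ending in
# cat<digit> together with the continuation lines that follow them.
# Usage (file name is an example): keep_cat_entries < logfile
keep_cat_entries() {
    awk '
        # Continuation line: print it only if its starter line was kept.
        /^ /        { if (keep) print; next }

        # Starter line we want: print it and remember to keep what follows.
        /cat[0-9]$/ { keep = 1; print; next }

        # Any other starter line: drop it and its continuation lines.
                    { keep = 0 }
    '
}
```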
The /^ / regexp identifies continuation lines. The /cat[0-9]$/ regexp identifies the starter lines you want to keep.

Another approach would be to set RS to be something other than the normal \n. For example:
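The example that followed is missing from this copy. A sketch of the RS idea, under two loudly stated assumptions: the awk in use treats a multi-character RS as a regular expression (gawk and mawk do; POSIX leaves this unspecified), and, purely for this sample log, every entry starts with "Wed ". The filter_by_rs name is hypothetical:

```shell
# Hypothetical helper: reads a log on stdin, prints the entries whose first
# line ends in cat<digit>. Assumes every entry begins with "Wed ".
# Usage (file name is an example): filter_by_rs < logfile
filter_by_rs() {
    awk '
        # Split records where a newline is followed by the entry prefix,
        # so indented continuation lines stay inside their record.
        BEGIN { RS = "\nWed " }
        {
            # The separator swallowed the prefix of every record but the first.
            rec = (NR > 1 ? "Wed " : "") $0

            # The last record still carries the final newline of the file.
            sub(/\n$/, "", rec)

            # Test only the first line of the entry.
            split(rec, lines, "\n")
            if (lines[1] ~ /cat[0-9]$/) print rec
        }
    '
}
```

A real log would need a separator pattern that covers every possible line start (all weekday names, for instance), which is why the line-by-line approaches above may be easier to adapt.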