In sed or awk, how do I handle record separators that may span multiple lines?

Posted on 2024-07-10 06:09:54

My log file is:

 Wed Nov 12 blah blah blah blah cat1
 Wed Nov 12 blah blah blah blah
 Wed Nov 12 blah blah blah blah 
 Wed Nov 12 blah blah blah blah cat2
     more blah blah
     even more blah blah
 Wed Nov 12 blah blah blah blah cat3
 Wed Nov 12 blah blah blah blah cat4

I want to parse out the full multiline entries where cat is found on the first line. What's the best way to do this in sed and/or awk?

That is, I want the parse to produce:

 Wed Nov 12 blah blah blah blah cat1
 Wed Nov 12 blah blah blah blah cat2
     more blah blah
     even more blah blah
 Wed Nov 12 blah blah blah blah cat3
 Wed Nov 12 blah blah blah blah cat4

草莓味的萝莉 2024-07-17 06:09:54

If you treat every line that starts with a space as a continuation of the preceding line, this is easy with (g)awk (this is from memory, so it may contain minor typos; extra line breaks are added for readability):

awk 'BEGIN { multiline = 0 }
     # A line that does not start with a space begins a new entry.
     !/^ / { if (/cat/) { print; multiline = 1 }
             else       multiline = 0 }
     # A line that starts with a space continues the current entry.
     /^ /  { if (multiline == 1) print }
    ' yourfile

where /cat/ stands in for whatever check decides whether the entry should be printed (e.g. for the cat).
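
As a rough illustration, the sketch below runs a condensed variant of the same flag idea against a copy of the sample log. It assumes starter lines begin in column one and continuation lines begin with at least one space; the temporary file, the `keep` variable, and the `/cat/` test are illustrative stand-ins rather than part of the answer above.

```shell
# Hypothetical sample file: starter lines begin in column one,
# continuation lines are indented.
log=$(mktemp)
cat > "$log" <<'EOF'
Wed Nov 12 blah blah blah blah cat1
Wed Nov 12 blah blah blah blah
Wed Nov 12 blah blah blah blah
Wed Nov 12 blah blah blah blah cat2
    more blah blah
    even more blah blah
Wed Nov 12 blah blah blah blah cat3
Wed Nov 12 blah blah blah blah cat4
EOF

# keep is 1 while the current entry is one we want to print.
result=$(awk '
    !/^ / { keep = ($0 ~ /cat/) }   # starter line: decide whether to keep this entry
    keep                            # print every line of a kept entry
' "$log")

printf '%s\n' "$result"
rm -f "$log"
```

Because the flag is only updated on starter lines, a continuation line inherits the decision made for the entry it belongs to.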

冬天旳寂寞 2024-07-17 06:09:54

Assuming your log file does not contain the control characters '\01' and '\02', and that a continued line begins with precisely four spaces, the following might work:

c1=$(printf '\001')
c2=$(printf '\002')
# The \n in the last sed replacement requires GNU sed.
tr '\n' "$c1" < logfile | sed "s/$c1    /$c2/g" | tr "$c1" '\n' | grep cat | sed "s/$c2/\n    /g"

Explanation: this replaces each newline with ASCII 1 (a control character that should never appear in a log file) and each sequence "newline-space-space-space-space" with ASCII 2 (another control character). It then re-replaces ASCII 1 with newlines, so now each sequence of multiple lines is put into one line, with the old line breaks replaced by ASCII 2. This is grepped for cat, and then the ASCII 2's are re-replaced with the newline-space-space-space-space combination.
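
One caveat is that grep sees the whole joined entry, so an entry whose only "cat" sits on a continuation line is selected as well. The sketch below is one possible way to restrict the match to the first line: it splits each joined entry on ASCII 2 with awk and tests only the first field. The sample data, the temporary file, and the awk step are assumptions added for illustration, not part of the pipeline above.

```shell
c1=$(printf '\001')   # stands in for the original line breaks
c2=$(printf '\002')   # marks a joined continuation line

# Hypothetical sample: the last entry mentions cat only in a continuation line.
log=$(mktemp)
cat > "$log" <<'EOF'
Wed Nov 12 blah blah blah blah cat1
Wed Nov 12 blah blah blah blah
Wed Nov 12 blah blah blah blah cat2
    more blah blah
    even more blah blah
Wed Nov 12 blah blah blah blah
    a cat mentioned only on a continuation line
EOF

result=$(tr '\n' "$c1" < "$log" |      # join the whole file into one line
    sed "s/$c1    /$c2/g" |            # mark continuation breaks with ASCII 2
    tr "$c1" '\n' |                    # restore the breaks between entries
    awk -F "$c2" -v OFS='\n    ' '$1 ~ /cat/ { $1 = $1; print }')

printf '%s\n' "$result"
rm -f "$log"
```

Assigning `$1 = $1` makes awk rebuild the record with `OFS`, which turns each ASCII 2 back into a newline followed by four spaces.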

听你说爱我 2024-07-17 06:09:54

Something like this?

awk '
function print_part() { if (cat) print part }
/^  /       { part = part "\n" $0; next }
/cat[0-9]$/ { print_part(); part = $0; cat = 1; next }
            { print_part(); cat = 0 }
END         { print_part() }
' inputfile

The /^  / regexp (a line starting with two spaces) identifies continuation lines.

The /cat[0-9]$/ regexp identifies the starter lines you want to keep.

苦妄 2024-07-17 06:09:54

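Another approach is to set RS to something other than the default \n. The example below relies on GNU awk (gawk), which treats a multi-character RS as a regular expression and supports \s; it also hardcodes the prefix Wed that starts each entry: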

$ awk -v Pre=Wed 'BEGIN {RS = "\\n?\\s*" Pre} /cat.\n?/ {print Pre $0}' file.log
Wed Nov 12 blah blah blah blah cat1
Wed Nov 12 blah blah blah blah cat2
     more blah blah
     even more blah blah
Wed Nov 12 blah blah blah blah cat3
Wed Nov 12 blah blah blah blah cat4
