Perl regex to find a Java stack trace by keyword

Posted on 2024-12-10 17:50:45

I need to grep full stacktrace from logfile by keyword.

This code works, but it is too slow on big files (the bigger the file, the slower it gets).
I think the best approach is to improve the regex that finds the keyword, but I could not get it done.

#!/usr/bin/perl

use strict;
use warnings;

my $regexp;
my $stacktrace;
undef $/;

$regexp = shift;
$regexp = quotemeta($regexp);

while (<>) {
  while ( $_ =~ /(?<LEVEL>^[E|W|D|I])\s
                 (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                 (?<THREAD>.*?)\/
                 (?<CLASS>.*?)\s-\s
                 (?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx ) {
    $stacktrace = $&;
    if ( $+{MESSAGE} =~ /$regexp/ ) {
      print "$stacktrace";
    }
  }
}

Usage: ./grep_log4j.pl <pattern> <file>

Example: ./grep_log4j.pl Exception sample.log

I think the problem is in $stacktrace = $&; because if I remove this line and simply print all matching entries, the script runs fast.
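Whether $& is really the bottleneck depends on the Perl version. perlvar notes that before Perl 5.20, any use of $& anywhere in a program made every successful match copy the matched string, and that the copy-on-write system enabled by default in 5.20 removed that penalty. On older Perls, here is a minimal sketch of two documented ways to get the matched text without naming $& (match_with_p and match_with_offsets are hypothetical helper names):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: return the text matched by $re in $text, or undef,
# without ever mentioning $& (so older Perls do not pay the global penalty).

# Variant 1 (Perl 5.10+): /p makes the match available as ${^MATCH}.
sub match_with_p {
    my ($text, $re) = @_;
    return $text =~ /$re/p ? ${^MATCH} : undef;
}

# Variant 2 (Perl 5.6+): @- and @+ hold the start and end offsets of the match.
sub match_with_offsets {
    my ($text, $re) = @_;
    return $text =~ /$re/ ? substr($text, $-[0], $+[0] - $-[0]) : undef;
}

my $line = "E 111012 141606.000 thread/class - Failed handling mobile request\n";
my $ts   = qr/\d{6}\s\d{6}\.\d{3}/;
print match_with_p($line, $ts), "\n";        # 111012 141606.000
print match_with_offsets($line, $ts), "\n";  # 111012 141606.000
```

In the question's loop this would mean adding /p to the big match and writing $stacktrace = ${^MATCH}; in place of $stacktrace = $&;.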
A version of the script that prints all matches:

#!/usr/bin/perl

use strict;
use warnings;

undef $/;

while (<>) {
  while ( $_ =~ /(?<LEVEL>^[E|W|D|I])\s
                 (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                 (?<THREAD>.*?)\/
                 (?<CLASS>.*?)\s-\s
                 (?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx ) {
    print_result();
  }
}

sub print_result {
    print "LEVEL: $+{LEVEL}\n";
    print "TIMESTAMP: $+{TIMESTAMP}\n";
    print "THREAD: $+{THREAD}\n";
    print "CLASS: $+{CLASS}\n";
    print "MESSAGE: $+{MESSAGE}\n";
}

Usage: ./grep_log4j.pl <file>

Example: ./grep_log4j.pl sample.log

Log4j pattern: %-1p %d %t/%c{1} - %m%n

Example log file:

I 111012 141506.000 thread/class - Received message: something
E 111012 141606.000 thread/class - Failed handling mobile request
java.lang.NullPointerException
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
  at java.lang.Thread.run(Thread.java:619)
W 111012 141706.000 thread/class - Received message: something
E 111012 141806.000 thread/class - Failed with Exception
java.lang.NullPointerException
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
  at java.lang.Thread.run(Thread.java:619)
D 111012 141906.000 thread/class - Received message: something
S 111012 142006.000 thread/class - Received message: something
I 111012 142106.000 thread/class - Received message: something
I 111013 142206.000 thread/class - Metrics:0/1

You can find my regex on http://gskinner.com/RegExr/ by searching for the keyword log4j.

音盲 2024-12-17 17:50:45

You are using:

$/ = undef;

This makes perl read the entire file into memory.

I would process this file line-by-line like this (assuming the stack trace is associated with the message above the trace):

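# $regexp is the quotemeta'd keyword, set up as in the question's script
my $matched;
while (<>) {
  if (m/^(?<LEVEL>\S+) \s+ (?<TIMESTAMP>\d+ \s+ [\d.]+) \s+ (?<THREADCLASS>\S+) \s+ - \s+ (?<REST>.*)/x) {
    # A header line: save the captures, because the keyword match below resets %+
    my %captures = %+;
    $matched = ($+{REST} =~ $regexp);
    if ($matched) {
      print "LEVEL: $captures{LEVEL}\n";
      ...
    }
  } elsif ($matched) {
    # A stack-trace line that belongs to a matching message
    print;
  }
}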

Here is a general technique for parsing multi-line blocks.
The following loop reads STDIN one line at a time and feeds complete blocks of the log file to the subroutine process:

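my $first;        # header line of the current entry
my $stack = "";   # continuation lines (stack trace) of the current entry
while (<STDIN>) {
  if (m/^\S /) {
    # A new header ("E 111012 ..."): hand off the previous entry first
    process($first, $stack) if $first;
    $first = $_;
    $stack = "";
  } else {
    $stack .= $_;
  }
}
process($first, $stack) if $first;   # the last entry has no header after it

sub process {
  my ($first, $stack) = @_;
  # ... do whatever you want here ...
}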
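The body of process is left open above. Here is a minimal sketch of one possible version, assuming you want the question's behaviour (print the whole entry when it contains the keyword). for_each_entry and make_keyword_filter are hypothetical names: the loop is the same block-splitting logic, refactored to take a filehandle and a callback so it can also run on an in-memory string. Note that this version looks for the keyword anywhere in the entry, not only in the message part.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The block-splitting loop from above, taking a filehandle and a callback
# (hypothetical refactoring) so the same logic can run on any input.
sub for_each_entry {
    my ($fh, $process) = @_;
    my $first;
    my $stack = "";
    while (my $line = <$fh>) {
        if ($line =~ m/^\S /) {                      # header, e.g. "E 111012 ..."
            $process->($first, $stack) if defined $first;
            $first = $line;
            $stack = "";
        } else {
            $stack .= $line;                         # stack-trace line
        }
    }
    $process->($first, $stack) if defined $first;   # flush the last entry
}

# One possible process(): print the whole entry if the keyword occurs
# anywhere in it (plain substring test, like quotemeta in the question).
sub make_keyword_filter {
    my ($keyword, $out) = @_;
    return sub {
        my ($first, $stack) = @_;
        print {$out} $first, $stack if index("$first$stack", $keyword) >= 0;
    };
}

# Only read STDIN when a keyword was given on the command line.
if (@ARGV) {
    for_each_entry(\*STDIN, make_keyword_filter(shift @ARGV, \*STDOUT));
}
```

Usage, reading STDIN as the loop above does (script name hypothetical): ./grep_blocks.pl Exception < sample.log. Only one entry is held in memory at a time, so memory use no longer grows with the file size.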

書生途 2024-12-17 17:50:45

The problem is in misusing [] in your regexp.

[...] is for defining character classes

(...) is for grouping

All you need is to change [E|W|D|I] to [EWDI] everywhere and not use [] for grouping in MESSAGE.
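A small demonstration (not part of the answer) of what the character-class mistake does: inside [...] the | is just another literal member of the set, and in the question's lookahead the opening [[E|W|D|I] is a single class that also contains [, because the first ] closes it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Inside [...] every character is a literal member of the set, so "|" is
# not alternation: [E|W|D|I] is the set {E, |, W, D, I}.
my $with_pipes = qr/^[E|W|D|I]$/;
my $plain      = qr/^[EWDI]$/;
my $grouped    = qr/^(?:E|W|D|I)$/;   # alternation belongs in (...), not [...]

# The question's lookahead starts with [[E|W|D|I], one class that also
# contains "[" (the first "]" closes it).
my $bracketed  = qr/^[[E|W|D|I]$/;

for my $s ('E', '|', '[') {
    printf "%-3s pipes=%d plain=%d grouped=%d bracketed=%d\n", "'$s'",
        map { $s =~ $_ ? 1 : 0 } $with_pipes, $plain, $grouped, $bracketed;
}
```

Only the pipe-containing classes accept '|', and only the bracketed one accepts '['; [EWDI] and (?:E|W|D|I) both match exactly the four level letters.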

Here's the final code that works for me:

#!/usr/bin/perl

use strict;
use warnings;

undef $/;

while (<>) {
    while (
        $_ =~ /(?<LEVEL>^[EWDIS])\s
                 (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                 (?<THREAD>.*?)\/
                 (?<CLASS>.*?)\s-\s
                 (?<MESSAGE>.*?[\r\n](?=[EWDIS]\s\d{6}\s\d{6}\.\d{3}|$))/gmxs
      )
    {
        print_result();
    }
}

sub print_result {
    print "LEVEL: $+{LEVEL}\n";
    print "TIMESTAMP: $+{TIMESTAMP}\n";
    print "THREAD: $+{THREAD}\n";
    print "CLASS: $+{CLASS}\n";
    print "MESSAGE: $+{MESSAGE}\n";
}

Note that you missed the letter 'S' in the list of level flags.

This example may also contain errors, but it works in general.
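The code above prints every entry. Here is a sketch of how it might be combined with the keyword filter from the question, assuming the same log format: it keeps this answer's corrected regex, checks the keyword only against MESSAGE (as the question does), and uses /p with ${^MATCH} instead of $& (Perl 5.10+). grep_entries is a hypothetical helper name, and the keyword test uses index as a literal substring search in place of quotemeta plus a match. The pattern still needs each file to end with a newline; otherwise the last entry is not matched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every entry (header plus stack trace) whose
# MESSAGE contains $keyword, using the corrected regex from this answer.
sub grep_entries {
    my ($text, $keyword) = @_;
    my @hits;
    while (
        $text =~ /(?<LEVEL>^[EWDIS])\s
                  (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
                  (?<THREAD>.*?)\/
                  (?<CLASS>.*?)\s-\s
                  (?<MESSAGE>.*?[\r\n](?=[EWDIS]\s\d{6}\s\d{6}\.\d{3}|$))/gmxsp
      )
    {
        # Literal substring test on MESSAGE; ${^MATCH} is the whole entry (no $&)
        push @hits, ${^MATCH} if index($+{MESSAGE}, $keyword) >= 0;
    }
    return @hits;
}

# Only run as a script when given a keyword and at least one file.
if (@ARGV >= 2) {
    my $keyword = shift @ARGV;
    local $/;                             # slurp each file, as in the question
    while (my $text = <>) {
        print for grep_entries($text, $keyword);
    }
}
```

Example: ./grep_log4j.pl Exception sample.log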
