Perl regex to find a Java stack trace by keyword
I need to grep full stack traces from a log file by keyword.
This code works, but it is too slow on big files (the bigger the file, the slower it gets).
I think the best way to speed it up is to improve the regex that finds the keyword, but I could not get it done.
#!/usr/bin/perl
use strict;
use warnings;
my $regexp;
my $stacktrace;
undef $/;
$regexp = shift;
$regexp = quotemeta($regexp);
while (<>) {
while ( $_ =~ /(?<LEVEL>^[E|W|D|I])\s
(?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
(?<THREAD>.*?)\/
(?<CLASS>.*?)\s-\s
(?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx ) {
$stacktrace = $&;
if ( $+{MESSAGE} =~ /$regexp/ ) {
print "$stacktrace";
}
}
}
Usage: ./grep_log4j.pl <pattern> <file>
Example: ./grep_log4j.pl Exception sample.log
I think the problem is in $stacktrace = $&;
because if I remove this line and simply print all matches, the script runs fast.
Version of the script that prints all matches:
#!/usr/bin/perl
use strict;
use warnings;
undef $/;
while (<>) {
while ( $_ =~ /(?<LEVEL>^[E|W|D|I])\s
(?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
(?<THREAD>.*?)\/
(?<CLASS>.*?)\s-\s
(?<MESSAGE>.*?[\r|\n](?=^[[E|W|D|I]\s\d{6}\s\d{6}\.\d{3}]?))/gsmx ) {
print_result();
}
}
sub print_result {
print "LEVEL: $+{LEVEL}\n";
print "TIMESTAMP: $+{TIMESTAMP}\n";
print "THREAD: $+{THREAD}\n";
print "CLASS: $+{CLASS}\n";
print "MESSAGE: $+{MESSAGE}\n";
}
Usage: ./grep_log4j.pl <file>
Example: ./grep_log4j.pl sample.log
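For context, the suspicion about $& is plausible: on Perl releases before 5.20, mentioning $& anywhere in a program makes every successful match copy the string it was matched against, and here that string is a whole slurped log file. Perl 5.10 added the /p modifier with ${^MATCH}, which returns the matched text without that program-wide cost. The sketch below keeps the slurping loop but uses ${^MATCH}; the simplified entry regex and the helper name entries_with_keyword are hypothetical, and unlike the question's script it tests the keyword against the whole entry rather than only the message part.

```perl
#!/usr/bin/perl
# Sketch: slurp-and-match like the question, but without $&.
# Assumes Perl 5.10+ (for /p and ${^MATCH}) and the header layout of the
# sample log: level letter, "yymmdd hhmmss.mmm" timestamp, then the text.
use strict;
use warnings;

# Hypothetical helper: return every entry of $text that contains $keyword
# as a literal substring (like quotemeta in the question), anywhere in it.
sub entries_with_keyword {
    my ($text, $keyword) = @_;
    my @found;
    while ($text =~ /^[EWDIS]\s\d{6}\s\d{6}\.\d{3}\s        # entry header
                     .*?                                    # rest of the entry
                     (?=^[EWDIS]\s\d{6}\s\d{6}\.\d{3}\s|\z) # next header or end
                    /gmsxp) {
        my $entry = ${^MATCH};    # copy it before any other match runs
        push @found, $entry if index($entry, $keyword) >= 0;
    }
    return @found;
}

# Usage: ./grep_log4j.pl <pattern> <file>
if (@ARGV) {
    my $keyword = shift @ARGV;
    local $/;                     # slurp each input file, as in the question
    while (my $text = <>) {
        print entries_with_keyword($text, $keyword);
    }
}
```

Invocation stays the same as in the question, for example ./grep_log4j.pl Exception sample.log.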
Log4j pattern: %-1p %d %t/%c{1} - %m%n
Example log file:
I 111012 141506.000 thread/class - Received message: something
E 111012 141606.000 thread/class - Failed handling mobile request
java.lang.NullPointerException
at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
at java.lang.Thread.run(Thread.java:619)
W 111012 141706.000 thread/class - Received message: something
E 111012 141806.000 thread/class - Failed with Exception
java.lang.NullPointerException
at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
at java.lang.Thread.run(Thread.java:619)
D 111012 141906.000 thread/class - Received message: something
S 111012 142006.000 thread/class - Received message: something
I 111012 142106.000 thread/class - Received message: something
I 111013 142206.000 thread/class - Metrics:0/1
You can find my regex on http://gskinner.com/RegExr/ by searching for the log4j keyword.
Comments (2)
You are using:
undef $/;
This makes perl read the entire file into memory.
I would process this file line-by-line like this (assuming the stack trace is associated with the message above the trace):
Here is a general technique for parsing multi-line blocks.
The following loop reads STDIN one line at a time and feeds complete blocks of the log file to the subroutine process:
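A minimal sketch of that technique, assuming (as this answer does) that a stack trace belongs to the entry above it, and that every entry begins with a level letter and a yymmdd hhmmss.mmm timestamp as in the sample log. The names for_each_block and $HEADER are hypothetical, and process is passed in as a callback. Only the current entry is held in memory, not the whole file.

```perl
#!/usr/bin/perl
# Sketch: read the log one line at a time and hand each complete entry
# (header line plus any following stack-trace lines) to a callback.
# Assumes every entry starts with a header like
# "E 111012 141606.000 thread/class - ..." and that lines without such a
# header belong to the entry above them.
use strict;
use warnings;

# Hypothetical header pattern: level letter, date, time with milliseconds.
my $HEADER = qr/^[EWDIS]\s\d{6}\s\d{6}\.\d{3}\s/;

# Call $process->($block) once for each complete entry read from $fh.
sub for_each_block {
    my ($fh, $process) = @_;
    my $block = '';
    while (my $line = <$fh>) {
        if ($line =~ $HEADER && length $block) {
            $process->($block);    # a new header ends the previous entry
            $block = '';
        }
        $block .= $line;
    }
    $process->($block) if length $block;    # flush the last entry
}

# Usage: ./grep_log4j.pl <pattern> < file
if (@ARGV) {
    my $keyword = shift @ARGV;
    my $process = sub {
        my ($block) = @_;
        print $block if index($block, $keyword) >= 0;    # literal match
    };
    for_each_block(\*STDIN, $process);
}
```

For example: ./grep_log4j.pl Exception < sample.log. Because the keyword test runs on the whole block, a keyword that appears only in the stack trace lines still selects the entry.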
The problem is that you are misusing [] in your regexp: [...] defines a character class, while (...) is for grouping.
All you need is to change [E|W|D|I] to [EWDI] everywhere, and to stop using [] for grouping in MESSAGE.
Here's the final code that works for me:
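A sketch of the question's regex with those changes applied, assuming the sample log's header layout and Perl 5.10+ for named captures. It uses [EWDIS] (including the 'S' level noted below), and replaces the bracketed [...]? lookahead with a real alternation, (?=header|\z), so that the last entry in the file also matches; that alternation, the helper name grep_entries and the TEXT key are assumptions of this sketch. The whole entry is taken from @- and @+ instead of $&.

```perl
#!/usr/bin/perl
# Sketch of the question's regex with the fixes described in this answer.
# Assumes the sample log's header layout and Perl 5.10+ (named captures).
use strict;
use warnings;

# [EWDIS] is a plain character class: inside [...] a '|' is a literal
# character, not alternation. The lookahead holds a real alternation:
# stop at the next header, or at the end of the string so that the last
# entry in the file is matched too.
my $ENTRY = qr/
    ^(?<LEVEL>[EWDIS])\s
    (?<TIMESTAMP>\d{6}\s\d{6}\.\d{3})\s
    (?<THREAD>[^\/\n]*)\/
    (?<CLASS>[^\n]*?)\s-\s
    (?<MESSAGE>.*?)
    (?=^[EWDIS]\s\d{6}\s\d{6}\.\d{3}\s|\z)
/msx;

# Hypothetical helper: return the entries whose MESSAGE contains $keyword.
sub grep_entries {
    my ($text, $keyword) = @_;
    my @found;
    while ($text =~ /$ENTRY/g) {
        my %entry = %+;    # copy the named captures before the next match
        # Whole entry text via @- and @+, so $& is never mentioned.
        $entry{TEXT} = substr($text, $-[0], $+[0] - $-[0]);
        push @found, \%entry if index($entry{MESSAGE}, $keyword) >= 0;
    }
    return @found;
}

# Usage: ./grep_log4j.pl <pattern> <file>
if (@ARGV) {
    my $keyword = shift @ARGV;
    local $/;    # slurp each file, as in the question
    while (my $text = <>) {
        print $_->{TEXT} for grep_entries($text, $keyword);
    }
}
```

As in the question, the keyword is only looked for in MESSAGE, which with /s spans the message line and any stack trace lines under it.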
Note that you missed the 'S' letter in the list of level flags.
This example may also contain errors, but it works in general.