Unix shell script to parse logs using grep

Posted on 2024-10-02 07:00:49

Contents of events<xyz>.log:

<log>  
 <time>09:00:30</time>  
 <entry1>abcd</entry1>  
 <entry2>abcd</entry2>  
 <id>john</id>  
</log>  
<log>
 <time>09:00:35</time>  
 <entry1>abcd</entry1>  
 <entry2>abcd</entry2>  
 <id>steve</id>  
</log>  
<log>  
 <time>09:00:40</time>  
 <entry1>abcd</entry1>  
 <entry2>abcd</entry2>  
 <id>john</id>  
</log>  

I want to extract the entry1 and entry2 tags of all <log> entries with <id> 'john' into a file. I want to do this in a shell script that looks in all *.log files in a directory. The output should be similar to the following.

Contents of a.out:

<time>09:00:30</time>   
<entry1>abcd</entry1>  
<entry2>abcd</entry2>

<time>09:00:40</time>  
<entry1>abcd</entry1>  
<entry2>abcd</entry2>  

I am new to shell scripting, however I tried with some basic commands to at least look at the logs:

$ grep -B 3 -in '<id>john</id>' * > /tmp/a.out

The above command gives me output with the 3 lines above the id tag for john, as follows:

...   
events111.log-100- <time>09:00:40</time>  
events111.log-101- <entry1>abcd</entry1>  
events111.log-102- <entry2>abcd</entry2>  
events111.log-103- <id>john</id>  
....  
events112.log-200- <time>06:56:03</time>  
events112.log-201- <entry1>abcd</entry1>  
events112.log-202- <entry2>abcd</entry2>  
events112.log-203- <id>john</id>  

This is fine, but the problem is that -B 3 won't work every time: there could be more tags in between, so some parsing logic is needed to capture the text from <time> to </id>.

I would really appreciate some help around formulating a script for this.

Thanks!

Comments (4)

一袭水袖舞倾城 2024-10-09 07:00:49

Have you considered using an XML grepping tool such as XMLStarlet to pick out the pieces from these log files? It would be much cleaner.
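
A rough sketch of the XML-aware idea, written with Python's standard xml.etree.ElementTree module rather than XMLStarlet itself. It assumes each file is just a sequence of well-formed <log> elements with no single root and no XML declaration, so the text is wrapped in a synthetic <root> element before parsing. The function name extract_entries and the wanted tuple are illustrative and not part of the original post.

```python
import xml.etree.ElementTree as ET


def extract_entries(text, user="john", wanted=("time", "entry1", "entry2")):
    """Return the wanted child tags of every <log> whose <id> equals user."""
    # Assumption: the file holds many top-level <log> elements, so wrap
    # them in one synthetic root to get a well-formed document.
    root = ET.fromstring("<root>" + text + "</root>")
    blocks = []
    for log in root.iter("log"):
        if (log.findtext("id") or "").strip() != user:
            continue
        lines = []
        for child in log:
            if child.tag in wanted:
                value = (child.text or "").strip()
                lines.append("<{0}>{1}</{0}>".format(child.tag, value))
        blocks.append("\n".join(lines))
    # Matching entries are separated by a blank line, as in the desired a.out.
    return "\n\n".join(blocks)
```

Because the match is on the <id> element rather than on a line offset, extra tags between <time> and <id> do not affect the result. Calling print(extract_entries(open("events.log").read())) would print the matching entries for a single file.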

太阳男子 2024-10-09 07:00:49

A shell script is not really the right tool for this job. You really need a parser. Here's one in Python for a single file. You could throw a loop around this and process an entire directory of log files.

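#!/usr/bin/env python3
# Requires the third-party beautifulsoup4 package.
import sys
from bs4 import BeautifulSoup

# Parse the single log file named on the command line.
with open(sys.argv[1]) as f:
    soup = BeautifulSoup(f.read(), "html.parser")

for log in soup.find_all("log"):
    # Print entry1 and entry2 only for entries whose <id> is john.
    if log.find("id").get_text(strip=True) == "john":
        print(log.entry1)
        print(log.entry2)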
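
The "loop around this" could look roughly like the sketch below. To stay within the standard library it swaps BeautifulSoup for two regular expressions, which is only reasonable under the assumption that every tag opens and closes on one line, as in the sample log. The names extract_from_text and extract_directory, and the default user "john", are illustrative.

```python
import re
from pathlib import Path

# One <log>...</log> entry, possibly spanning several lines.
LOG_BLOCK = re.compile(r"<log>(.*?)</log>", re.DOTALL)
# A complete time, entry1 or entry2 tag on a single line.
WANTED = re.compile(r"<(time|entry1|entry2)>.*?</\1>")


def extract_from_text(text, user="john"):
    """Yield one string per <log> entry whose <id> is user."""
    for match in LOG_BLOCK.finditer(text):
        body = match.group(1)
        if "<id>{}</id>".format(user) in body:
            yield "\n".join(m.group(0) for m in WANTED.finditer(body))


def extract_directory(log_dir, out_path, user="john"):
    """Scan every *.log file in log_dir and write the matches to out_path."""
    blocks = []
    for path in sorted(Path(log_dir).glob("*.log")):
        blocks.extend(extract_from_text(path.read_text(), user))
    # Entries are separated by a blank line, as in the desired a.out.
    Path(out_path).write_text("\n\n".join(blocks) + ("\n" if blocks else ""))
    return len(blocks)
```

For example, extract_directory(".", "/tmp/a.out") would scan the current directory and return the number of matching entries written.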

网名女生简单气质 2024-10-09 07:00:49

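# Succeeds if the current line contains the given string.
has() { echo "$line" | grep -q -- "$1"; }
# Reads log lines from standard input.
while read -r line; do
 has /log && echo    # blank line after each </log>
 { has time || has entry1 || has entry2; } && echo "$line"
done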

prints

<time>09:00:30</time>
<entry1>abcd</entry1>
<entry2>abcd</entry2>

<log> <time>09:00:35</time>
<entry1>abcd</entry1>
<entry2>abcd</entry2>

<time>09:00:40</time>
<entry1>abcd</entry1>
<entry2>abcd</entry2>

You may or may not want to suppress that "<log>" in the "time" line.
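
Note that the loop above prints the three tags for every entry, including steve's. A sketch of the same line-by-line idea, here in Python, that buffers each entry and keeps it only when its <id> matches; it also drops a leading "<log>" from the time line. The function name filter_lines and the default user "john" are illustrative, and the sketch assumes each tag sits on a single line.

```python
import re

# A complete time, entry1 or entry2 tag on a single line.
TAG_LINE = re.compile(r"<(time|entry1|entry2)>.*?</\1>")


def filter_lines(lines, user="john"):
    """Buffer each <log> entry and keep it only if its <id> is user."""
    output, buffer, matched = [], [], False
    for line in lines:
        found = TAG_LINE.search(line)
        if found:
            # Keep only the tag itself, which drops a leading "<log>".
            buffer.append(found.group(0))
        if "<id>{}</id>".format(user) in line:
            matched = True
        if "</log>" in line:
            # End of an entry: emit the buffer if the id matched, then reset.
            if matched:
                output.append("\n".join(buffer))
            buffer, matched = [], False
    return "\n\n".join(output)
```

Any iterable of lines works as input, so print(filter_lines(sys.stdin)) after import sys would read the log from standard input just as the shell loop does.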

殊姿 2024-10-09 07:00:49

For others still looking for a shell script to find specific strings in log files, locally or remotely, I have written this one:

https://github.com/ijimako/logs_extractor

Cheers,
