Using regex to parse ClamAV logs in a Bash script for insertion into MySQL

Posted on 2024-11-19 17:39:05

Morning/Evening all,

I've got a problem: I'm writing a script for work that uses ClamAV to scan for malware and then places its results in MySQL, by taking the resulting ClamAV logs and using grep with awk to convert the right parts of the log into variables. While I've handled the summary fine, the syntax of the detection lines makes them slightly more difficult. I'm no expert at regex by any means and this is a bit of a learning experience, so there is probably a far better way of doing it than mine!

The lines I'm trying to parse look like this:

/net/nas/vol0/home/recep/SG4rt.exe: Worm.SomeFool.P FOUND
/net/nas/vol0/home/recep/SG4rt.exe: moved to '/srv/clamav/quarantine/SG4rt.exe'

As far as I've been able to establish, I need a positive lookbehind to match what comes before and after the colon, without actually matching the colon or the space after it, and I can't see a clear way of doing that in RegExr without it thinking I'm looking for two colons. To make matters worse, we sometimes get these too:

WARNING: Can't open file /net/nas/vol0/home/laser/samples/sample1.avi: Permission denied

The end result should be a MySQL query that inserts the path, the malware found and where the file was moved to or, if there was an error, the path and the error encountered. To build it I need each element in its own variable inside a while loop.

I've done the scan summary as follows:

Summary looks like:

----------- SCAN SUMMARY -----------
Known viruses: 329
Engine version: 0.97.1
Scanned directories: 17350
Scanned files: 50342
Infected files: 3
Total errors: 1
Data scanned: 15551.73 MB
Data read: 16382.67 MB (ratio 0.95:1)
Time: 3765.236 sec (62 m 45 s)

Parsing like this:

SCANNED_DIRS=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Scanned directories" | awk '{gsub("Scanned directories: ", "");print}')
SCANNED_FILES=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Scanned files" | awk '{gsub("Scanned files: ", "");print}')
INFECTED=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Infected files" | awk '{gsub("Infected files: ", "");print}')
DATA_SCANNED=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Data scanned" | awk '{gsub("Data scanned: ", "");print}')
DATA_READ=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Data read" | awk '{gsub("Data read: ", "");print}')
TIME_TAKEN=$(cat /srv/clamav/$IY-scan-$LOGTIME.log | grep "Time" | awk '{gsub("Time: ", "");print}')
END_TIME=$(date +%s)
mysql -u scanner_parser --password=removed sc_live -e "INSERT INTO bs.live.bs_jobstat VALUES (NULL, '$CURRTIME', '$PID', '$IY', '$SCANNED_DIRS', '$SCANNED_FILES', '$INFECTED', '$DATA_SCANNED', '$DATA_READ', '$TIME_TAKEN', '$END_TIME');"
rm -f /srv/clamav/$IY-scan-$LOGTIME.log

Some of those variables come from other parts of the script and can be ignored. The reason I'm doing this is to reduce logfile clutter and to have a simple web-based overview of the status of the system.

Any clues? Am I going about all this the wrong way? Thanks for help in advance, I do appreciate it!

拿命拼未来 2024-11-26 17:39:05

From what I can determine from the question, it seems like you are asking how to distinguish the lines you want from the logger lines that start with WARNING, ERROR, INFO.

You can do this without getting too fancy with lookahead or lookbehind. Just grep for lines beginning with

"/net/nas/vol0/home/recep/SG4rt.exe: "

then using awk you can extract the remainder of the line. Or you can gsub the prefix out like you are doing in the summary processing section.
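To make the "no lookbehind needed" point concrete, here is a minimal Python sketch that classifies each log line with ordinary capture groups. The three patterns are assumptions derived only from the sample lines in the question, and `parse_line` is a hypothetical helper name; real ClamAV logs may contain other line shapes that would need their own patterns.

```python
import re

# Assumed line shapes, taken from the three samples in the question.
FOUND_RE = re.compile(r"^(?P<path>/.*): (?P<signature>\S+) FOUND$")
MOVED_RE = re.compile(r"^(?P<path>/.*): moved to '(?P<dest>.*)'$")
ERROR_RE = re.compile(r"^WARNING: Can't open file (?P<path>/.*): (?P<error>.*)$")

PATTERNS = (("found", FOUND_RE), ("moved", MOVED_RE), ("error", ERROR_RE))


def parse_line(line):
    """Return (kind, fields) for a recognised line, or None for anything else."""
    line = line.rstrip("\n")
    for kind, pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            # groupdict() maps each named group to the text it captured.
            return kind, match.groupdict()
    return None
```

The capture groups pull out the text on either side of the ": " separator, so the colon is matched but never ends up in the extracted values. Anchoring the first two patterns on a leading "/" is what keeps the WARNING lines out of them.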

As far as the question about processing the summary goes, what strikes me most is that you are processing the entire file multiple times, each time pulling out one kind of line. For tasks like this, I would use Perl, Ruby, or Python and make one pass through the file, collecting the pieces of each line after the colon, storing them in regular programming language variables (not env variables), and forming the MySQL insert string using interpolation.
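A single pass over the summary could look roughly like the sketch below. It assumes the summary always follows the "SCAN SUMMARY" banner and that every entry has the "Key: value" shape shown in the question; `parse_summary` and `summary_params` are hypothetical helper names, and the field order simply mirrors the variables in the original script.

```python
# Summary keys in the same order as the shell variables in the question.
SUMMARY_FIELDS = (
    "Scanned directories",
    "Scanned files",
    "Infected files",
    "Data scanned",
    "Data read",
    "Time",
)


def parse_summary(lines):
    """Collect 'Key: value' pairs that follow the SCAN SUMMARY banner."""
    summary = {}
    in_summary = False
    for line in lines:
        line = line.strip()
        if "SCAN SUMMARY" in line:
            in_summary = True
            continue
        if in_summary and ": " in line:
            # Split on the first ': ' only, so 'ratio 0.95:1' stays intact.
            key, value = line.split(": ", 1)
            summary[key] = value
    return summary


def summary_params(summary):
    """Return the summary values as a tuple ready to pass to a query."""
    return tuple(summary[field] for field in SUMMARY_FIELDS)
```

Opening the log once and calling `parse_summary(f)` on the file object replaces the six separate grep and awk pipelines. One design note: since paths and error messages in the log can contain quote characters, it may be safer to hand a tuple like this to the database driver as query parameters (for example through a DB-API style `cursor.execute(sql, params)`) than to interpolate the values into the SQL string directly.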

Bash is great for some things but IMHO you are justified in using a more general scripting language (Perl, Python, Ruby come to mind).
