Is there any method for going through large log files?

Posted 2024-09-16 17:44:56

// Java programmers: when I say method, I mean a 'way to do things'...

Hello All,

I'm writing a log miner script to monitor various log files at my company. It's written in Perl, though I have access to Python and, if I REALLY need to, C (though my company doesn't like binary files). It needs to go through the last 24 hours of each log, take each log code and check whether we should ignore it or email the appropriate people (me). The script would run as a cron job on Solaris servers. Here is what I had in mind (this is only pseudo-ish... and badly written pseudocode):

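use strict;
use warnings;
use POSIX qw(strftime);

sub main {
    # Yesterday's date, in the format the log timestamps use (%Y-%m-%d assumed)
    my $yesterday = strftime('%Y-%m-%d', localtime(time - 24 * 60 * 60));
    system("grep '$yesterday' /path/to/log > /tmp/log");      # Get logs from previous day
    system(q{awk '{print $X}' /tmp/log > /tmp/log_codes});    # Get log code (X = code column)
    SubRoutine_to_Compare_Log_Codes('/tmp/log_codes');        # Still to be written
}

main();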
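Since Python is an option, here is a minimal sketch of the same grep / awk / compare pipeline done in a single pass with no temporary files. It assumes, hypothetically, that every line starts with an ISO date such as `2024-09-15` and that the log code is the whitespace-separated column `CODE_FIELD`; `IGNORE_CODES` and its contents are invented placeholders for the real ignore list.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, Iterator, Optional

# Hypothetical settings: adjust to the real log layout and code list.
CODE_FIELD = 2                               # 0-based column of the log code (the "$X" in the awk call)
IGNORE_CODES = frozenset({"I100", "I200"})   # codes that never need an email


def yesterday(today: Optional[date] = None) -> date:
    """Equivalent of Subtract_One_Day(Get_Current_Date())."""
    return (today or date.today()) - timedelta(days=1)


def codes_for_day(lines: Iterable[str], day: date, field: int = CODE_FIELD) -> Iterator[str]:
    """Yield the log code of every line stamped with `day`, one line at a time.

    Assumes each line starts with an ISO date such as 2024-09-15 (hypothetical format).
    """
    prefix = day.isoformat()
    for line in lines:
        if line.startswith(prefix):      # stands in for: grep $yesterday
            fields = line.split()
            if len(fields) > field:      # skip lines too short to hold a code
                yield fields[field]      # stands in for: awk '{print $X}'


def codes_to_report(codes: Iterable[str], ignore: frozenset = IGNORE_CODES) -> Counter:
    """Count every code that is not on the ignore list (the compare step)."""
    return Counter(code for code in codes if code not in ignore)


def build_report(alerts: Counter) -> str:
    """Plain-text email body; an empty string means there is nothing to send."""
    if not alerts:
        return ""
    lines = [f"{code}: {count} occurrence(s)" for code, count in alerts.most_common()]
    return "Log codes needing attention:\n" + "\n".join(lines)
```

Wiring it together would look roughly like `with open('/path/to/log', encoding='utf-8', errors='replace') as log: body = build_report(codes_to_report(codes_for_day(log, yesterday())))`, with a non-empty body handed to the standard library's `smtplib`. This still reads the whole file once, as `grep` does, but memory use is bounded by the number of distinct codes rather than by the file size.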

Another thought was to load the log file into memory and read it there... that is all fine and dandy except for two small problems:

These servers are production servers and serve a couple million customers.
The log files average 3.3GB (about two days of logs each).
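Because each file holds about two days and only the last 24 hours matter, one option is to binary-search the file by byte offset instead of scanning it from the top. The sketch below assumes, hypothetically, that every line begins with a sortable timestamp (e.g. `b"2024-09-15 10:00:01"`) and that lines are written in chronological order; if either assumption fails, the search result is meaningless.

```python
import os
from typing import BinaryIO


def _line_start_at_or_after(f: BinaryIO, pos: int) -> int:
    """Position `f` at the first line start >= `pos` and return that offset."""
    if pos == 0:
        f.seek(0)
    else:
        f.seek(pos - 1)
        f.readline()                     # runs to the newline at or after pos - 1
    return f.tell()


def first_line_at_or_after(f: BinaryIO, target: bytes) -> int:
    """Byte offset of the first line whose timestamp prefix is >= `target`.

    Assumes (hypothetically) sortable timestamps at the start of every line and
    chronological order. Only O(log n) short reads are made. On return, `f` is
    positioned at that offset, so iterating over `f` continues from there.
    Returns the file size if no line qualifies.
    """
    size = f.seek(0, os.SEEK_END)
    lo, hi = 0, size
    while lo < hi:
        mid = (lo + hi) // 2
        _line_start_at_or_after(f, mid)
        line = f.readline()
        if not line or line[: len(target)] >= target:
            hi = mid                     # answer is at or before mid's line
        else:
            lo = mid + 1                 # mid's line is still too early
    return _line_start_at_or_after(f, lo)
```

For a 3.3GB file that is roughly 32 probes of one or two lines each. Opening the log with `open(path, 'rb')`, calling `first_line_at_or_after(f, b'2024-09-15')` and then looping over `f` would touch only the second day's data.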

So not only would grep take a while to go through each file, but it would also use up CPU and memory that are needed elsewhere. And loading a 3.3GB file into memory is not the wisest idea (at least IMHO). Now, I had a crazy idea involving assembly code and memory locations, but I don't know SPARC assembly, so flush that idea.
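Since the script runs from cron, another way to cut the CPU and I/O cost is to remember how far the previous run got and read only the bytes appended since then. This is a minimal sketch under stated assumptions: the state-file location is a hypothetical choice, and "the file shrank" is used as a simplified rotation check.

```python
import os
from typing import Iterator, Tuple


def load_offset(state_path: str) -> int:
    """Byte offset saved by the previous run (0 on the first run)."""
    try:
        with open(state_path) as s:
            return int(s.read().strip() or 0)
    except FileNotFoundError:
        return 0


def save_offset(state_path: str, offset: int) -> None:
    """Record where this run stopped, for the next cron run."""
    with open(state_path, "w") as s:
        s.write(str(offset))


def read_since(log_path: str, offset: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset_after_line, line) for each complete line written after `offset`."""
    if os.path.getsize(log_path) < offset:   # file shrank: assume it was rotated
        offset = 0
    with open(log_path, "rb") as f:
        f.seek(offset)
        for line in f:
            if not line.endswith(b"\n"):     # still being written; pick it up next run
                return
            offset += len(line)
            yield offset, line
```

A run would then be `offset = load_offset(state)`, `for offset, line in read_since(log, offset): ...`, `save_offset(state, offset)`. Comparing `os.stat(log).st_ino` between runs would detect rotation more reliably than the size check, and starting the cron entry under `nice` lowers the job's CPU priority relative to the production workload.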

Anyone have any suggestions?

Thanks for reading this far =)
