Is there any method to go through large log files?
// Java programmers, when I say method, I mean a 'way to do things'...
Hello All,
I'm writing a log miner script to monitor various log files at my company. It's written in Perl, though I have access to Python and, if I REALLY need to, C (though my company doesn't like binary files). It needs to go through the last 24 hours of logs, take each log code, and check whether we should ignore it or email the appropriate people (me). The script would run as a cron job on Solaris servers. Here is what I had in mind (this is only pseudo-ish... and badly written pseudocode):
main()
{
    $today = Get_Current_Date();
    $yesterday = Subtract_One_Day($today);
    `grep $yesterday '/path/to/log' > /tmp/log`;    # Get logs from previous day
    `awk '{print $X}' /tmp/log > /tmp/log_codes`;   # Get log code (X = field number of the code)
    SubRoutine_to_Compare_Log_Codes('/tmp/log_codes');
}
Another thought was to load the log file into memory and read it there... that is all fine and dandy except for two small problems:
- These servers are production servers and serve a couple million customers...
- The log files average 3.3GB (about two days of logs).
So not only would grep take a while to go through each file, it would also use up CPU and memory that are needed elsewhere. And loading a 3.3GB file into memory is not the wisest of ideas (at least IMHO). I also had a crazy idea involving assembly code and memory locations, but I don't know SPARC assembly, sooo flush that idea.
Anyone have any suggestions?
Thanks for reading this far =)
Comments (1)
Possible solutions: 1) have the system start a new log file every midnight, so that you can mine the finite-size log file of the previous day at a reduced priority; and 2) modify the logging system so that it automatically extracts certain messages on the fly for further processing.
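To illustrate option 1, here is a minimal sketch in Python (which the question says is available) of mining a rotated log one line at a time at reduced priority, so the file is never loaded into memory. It is only a sketch under assumptions: the function names `scan_log` and `lower_priority`, the whitespace-separated line layout, the field index of the log code, and the ignore set are all hypothetical and would need to match the real log format.

```python
import os
from collections import Counter


def lower_priority(increment=10):
    """Raise this process's niceness so production work gets the CPU first (Unix only)."""
    return os.nice(increment)


def scan_log(path, code_field, ignore_codes):
    """Stream a log file line by line and count the log codes that are not ignored.

    Assumption: each line is whitespace-separated and the log code sits at
    index `code_field` (0-based). Lines that are too short are skipped.
    """
    counts = Counter()
    # Iterating over the file object reads one line at a time, so memory use
    # stays roughly constant regardless of the file size.
    with open(path, "r", errors="replace") as log:
        for line in log:
            fields = line.split()
            if len(fields) <= code_field:
                continue
            code = fields[code_field]
            if code not in ignore_codes:
                counts[code] += 1
    return counts
```

A cron job could call `lower_priority()` once at startup and then `scan_log("/path/to/yesterdays/log", code_field=2, ignore_codes={"I200"})`, emailing whatever codes come back in the result. The path, field index, and code values here are placeholders, not values taken from the question.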