Searching huge log files

Posted on 2024-09-29 15:39:22

Troubleshooting, analyzing & filtering log files is by far one of the most laborious daily jobs. My issue is searching through a log file, which could be well over 4 gigs in size. Simply loading the file takes up to 15 mins. I'm running a fairly fast processor with 8 gigs of memory. After the file loads, I literally only have the luxury of grep and/or Ctrl+F to scan through the file. This gets worse when I'm trying to look at files from multiple systems, each weighing over a gig. I have tried segregating the files based on time-stamps to make them smaller, but no joy really.

Is there a tool or even a process that I could use to make troubleshooting less time consuming (apart from the usual "just fix the bug first")?

Your comments are appreciated.

Comments (4)

万劫不复 2024-10-06 15:39:22

What are you loading it with? 4 gigs is a fairly large file, but that shouldn't take THAT long to load into memory.

For files that large, I would recommend using grep directly, and if grep isn't doing it for you, SED and AWK are your friends. If you want to do it in realtime, learn about using those tools in conjunction with pipes and tail -f.
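
For example, here is a minimal sketch of that kind of pipeline; the log path, the 'ERROR' pattern, and the excluded string are made-up placeholders rather than anything from the original question:

# follow the log live and keep only the interesting lines
tail -f /var/log/app.log | grep --line-buffered 'ERROR' | grep -v 'known harmless warning'
# count ERROR lines per day with awk, assuming the date is the first whitespace-separated field
awk '/ERROR/ {n[$1]++} END {for (d in n) print d, n[d]}' /var/log/app.log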

Yes, I know, SED is very intimidating at first. It's also ridiculously powerful. Learn it.
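
One concrete use that maps onto the time-stamp splitting mentioned in the question (again just a sketch; the file name and timestamps are invented): sed can print a single time window out of a huge file without opening it in an editor:

# print everything from the first line matching the start stamp up to the first line
# matching the end stamp; note the end pattern must actually occur in the file
sed -n '/2010-05-01 14:00/,/2010-05-01 15:00/p' huge.log > window.log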

If you're on Windows, you have my sympathy. May I recommend a Unix shell?

If you are afraid of the command line tools, consider learning Perl or Python. They're both quite good at sorting signal from noise in large files like this.

最冷一天 2024-10-06 15:39:22

Baretail is a good tool to have. Give it a try. I haven't used it on 4-gig files, but my log files are also quite big and it works just fine. http://www.baremetalsoft.com/baretail/index.php

Edit: I did not see that someone had already suggested Baretail.

悲欢浪云 2024-10-06 15:39:22

If you want to exclude lines you don't want to see, you can grep -v 'I dont wanna see this' yourlog.log > logWithExcludedLines.log. You can use a regex as well: grep -vE 'asdf|fdsa' yourlog.log > logWithNoASDForFDSA.log

This method works very well with Apache access logs: grep -v 'HTTP/1.1 200' access.log > no200s.log (or something like that, don't remember the exact string).
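
A small sketch of how such exclusions can be chained; the status-code pattern and the file names are illustrative guesses, since the exact string depends on your access-log format:

# drop successful (200) requests first, then strip other known noise
grep -v '" 200 ' access.log | grep -vE 'favicon\.ico|robots\.txt' > interesting.log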

晨曦÷微暖 2024-10-06 15:39:22

I am currently doing such things using the Unix command-line tools (f)grep, awk, cut, join, etc., which are also available for Windows with Cygwin or UnxUtils and so forth, and I also use some Scala scripts for things that are more complicated. You can write scripts to do searches that span log-file entries in several files. But I am also wondering if there is something better than that - maybe importing them into a database (both being SO questions)?

By the way: have your hard disk replaced by an SSD. These are way faster! Also, it pays for me to leave the logs gzip-compressed on the disk, since the disk is the bottleneck when searching them. If you are searching for, say, a regular expression in the log files and want 100 lines of context for each occurrence, you'd do:

zcat *.log.gz | grep -100 '{regexp}' > {outputfile}

and load the outputfile into your favourite textfile viewer. If you are searching for fixed strings, use fgrep (same as grep with the additional option -F) - that's much faster.
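
As a follow-up sketch (the search string and directory layout here are invented), the same approach works for a fixed-string search across compressed logs from several hosts:

# -F treats the pattern as a fixed string; -100 keeps 100 lines of context around each hit
zcat host1/*.log.gz host2/*.log.gz | grep -F -100 'OutOfMemoryError' > oom-context.log
# zgrep wraps the zcat-plus-grep step into a single command
zgrep -F 'OutOfMemoryError' host*/*.log.gz > oom-hits.log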
