Is there an open-source tool that can automatically find patterns in log files?

Posted on 2024-11-06 15:59:02

I've been working on a clustered system for many years, and decided it is time we had a tool that let us query the plain-text logfiles (among other things) easily. I downloaded all the logfiles to an old test machine, where they take about 20 GB compressed, but would take 550 GB uncompressed (partly due to many stack traces). We have different "topics" maintained by different people, and our log formats changed over the years. But let's just assume I could somehow turn it into a single consistent format across all topics.

My question is: Is there a free/open-source tool that I can just let loose on those files and that will automatically recognize recurring, similar log messages? As an example message:

User John Smith has logged in from IP aaa.bbb.ccc.ddd. Duration: zzz ms.

Given many instances of such a message, the tool would work out a pattern like:

User * has logged in from IP *. Duration: * ms.

Where * is a placeholder for varying data. Once we have those patterns (which would need to be updated regularly), we could match each new message against them and build useful statistics.
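The behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an existing tool: the function names (`derive_patterns`, `count_matches`), the `min_support` threshold, the `<unmatched>` bucket, and the grouping heuristic (messages with the same token count and the same first token are assumed to be instances of one pattern) are all hypothetical, and the sample IP addresses and durations are made up.

```python
import re
from collections import Counter, defaultdict

WILDCARD = "*"

def tokenize(message):
    # Split on whitespace and peel one trailing punctuation mark (. , : ;)
    # into its own token, so "10.0.0.1." becomes "10.0.0.1" and ".".
    return re.findall(r"\S+?(?=[.,:;]?(?:\s|$))", message)

def detokenize(tokens):
    # Re-attach the punctuation tokens that tokenize() peeled off.
    return re.sub(r" (?=[.,:;](?: |$))", "", " ".join(tokens))

def derive_patterns(messages, min_support=2):
    # Assumption: same token count + same first token => same kind of message.
    groups = defaultdict(list)
    for message in messages:
        tokens = tokenize(message)
        if tokens:
            groups[(len(tokens), tokens[0])].append(tokens)

    patterns = set()
    for token_lists in groups.values():
        if len(token_lists) < min_support:
            continue  # too few examples to generalize from
        merged = []
        for column in zip(*token_lists):
            token = column[0] if len(set(column)) == 1 else WILDCARD
            # Collapse runs of wildcards so one-word and two-word names
            # end up producing the same pattern.
            if token == WILDCARD and merged and merged[-1] == WILDCARD:
                continue
            merged.append(token)
        patterns.add(detokenize(merged))
    return sorted(patterns)

def compile_pattern(pattern):
    # Escape the literal text; each wildcard matches any non-empty run.
    parts = (re.escape(part) for part in pattern.split(WILDCARD))
    return re.compile("(.+?)".join(parts))

def count_matches(messages, patterns):
    # Count how many messages match each pattern (first match wins).
    compiled = [(pattern, compile_pattern(pattern)) for pattern in patterns]
    counts = Counter()
    for message in messages:
        for pattern, regex in compiled:
            if regex.fullmatch(message):
                counts[pattern] += 1
                break
        else:
            counts["<unmatched>"] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "User John Smith has logged in from IP 10.0.0.1. Duration: 12 ms.",
        "User Jane Doe has logged in from IP 10.0.0.2. Duration: 340 ms.",
        "User Alice has logged in from IP 10.0.0.3. Duration: 7 ms.",
        "User Bob has logged in from IP 10.0.0.4. Duration: 95 ms.",
    ]
    patterns = derive_patterns(sample)
    print(patterns)  # ['User * has logged in from IP *. Duration: * ms.']
    print(count_matches(sample + ["Disk full on node 3"], patterns))
```

The grouping heuristic is deliberately crude: two unrelated messages that happen to share a first token and a token count would be merged into an overly general pattern, and lines read from a file would need their trailing newline stripped before matching. On 550 GB of logs, a streaming approach with a smarter similarity measure would likely be needed.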

Ideally the tool would be written in Java, Python, or Perl, since those are what we use, and it would need to run in a mixed Windows/Linux environment.
