Is there an open source tool that automatically finds patterns in log files?
I've been working on a clustered system for many years, and decided it is time we had a tool that let us query the plain-text logfiles (among other things) easily. I downloaded all the logfiles to an old test machine, where they take about 20 GB compressed, but would take 550 GB uncompressed (partly due to many stack traces). We have different "topics" maintained by different people, and our log formats changed over the years. But let's just assume I could somehow turn it into a single consistent format across all topics.
My question is: Is there some free/open source tool that I can just let loose on those files, and that will automatically recognize recurring similar log messages? As an example message:
User John Smith has logged in from IP aaa.bbb.ccc.ddd. Duration: zzz ms.
Given many instances of such a message, the tool would work out a pattern like:
User * has logged in from IP *. Duration: * ms.
Where * is a placeholder for varying data. Once we have those patterns (which would need to be updated regularly), we could match each new message to the patterns, and build useful statistics.
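To make the idea concrete, here is a minimal sketch in Python of the kind of processing I have in mind. It is not taken from any existing tool. The function names (`discover_patterns`, `count_matches`), the whitespace tokenization, and the 0.5 similarity threshold are all assumptions chosen for illustration. Messages with the same number of tokens are compared position by position, and positions that differ are replaced by `*`. A real tool would need something smarter, because a name like "John Smith" spans two tokens and would end up in a different pattern than a one-word name, and punctuation attached to a varying token (such as the period after the IP) is swallowed by the wildcard.

```python
import re
from collections import Counter, defaultdict

WILDCARD = "*"


def similarity(template, tokens):
    """Fraction of positions where the template token equals the message token."""
    same = sum(1 for t, tok in zip(template, tokens) if t == tok)
    return same / len(tokens)


def merge(template, tokens):
    """Keep tokens that agree; replace positions that differ with a wildcard."""
    return [t if t == tok else WILDCARD for t, tok in zip(template, tokens)]


def discover_patterns(messages, threshold=0.5):
    """Merge similar messages of equal token count into wildcard templates.

    `threshold` is an arbitrary illustrative value, not a tuned one.
    """
    by_length = defaultdict(list)  # token count -> list of templates
    for message in messages:
        tokens = message.split()
        if not tokens:
            continue
        templates = by_length[len(tokens)]
        for i, template in enumerate(templates):
            if similarity(template, tokens) >= threshold:
                templates[i] = merge(template, tokens)
                break
        else:
            # No existing template is close enough: start a new one.
            templates.append(tokens)
    return [" ".join(t) for group in by_length.values() for t in group]


def to_regex(pattern):
    """Turn a wildcard template into a regex; each * matches one token."""
    parts = [r"\S+" if tok == WILDCARD else re.escape(tok)
             for tok in pattern.split()]
    return re.compile(r"\s+".join(parts))


def count_matches(patterns, messages):
    """Count how many messages fall under each pattern."""
    compiled = [(p, to_regex(p)) for p in patterns]
    counts = Counter()
    for message in messages:
        for pattern, regex in compiled:
            if regex.fullmatch(message.strip()):
                counts[pattern] += 1
                break
        else:
            counts["<unmatched>"] += 1
    return counts


if __name__ == "__main__":
    history = [
        "User alice has logged in from IP 10.0.0.1. Duration: 12 ms.",
        "User bob has logged in from IP 10.0.0.2. Duration: 340 ms.",
        "Cache flushed after 3 s.",
        "Cache flushed after 7 s.",
    ]
    patterns = discover_patterns(history)
    for p in patterns:
        print(p)
    # User * has logged in from IP * Duration: * ms.
    # Cache flushed after * s.
    new = ["User carol has logged in from IP 10.0.0.9. Duration: 5 ms."]
    print(count_matches(patterns, new))
```

Both functions accept any iterable of strings, so the messages could be streamed line by line from the log files rather than loaded into memory at once.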
Ideally the tool would be Java, or Python or Perl, as we use those, and we are in a mixed Windows/Linux environment.
Comments (1)
This might also be an option: Grok, automatic log pattern discovery in Python