Regex, Grafana Loki, Promtail: Parsing a timestamp from logs using regex
I want to parse a timestamp from my logs so that Loki uses it as the timestamp of each entry.
I'm a total noob when it comes to regex.
The log file is from "endlessh", which is essentially a tarpit/honeypot for SSH attackers.
It looks like this:
2022-04-03 14:37:25.101991388 2022-04-03T12:37:25.101Z CLOSE host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26
2022-04-03 14:38:07.723962122 2022-04-03T12:38:07.723Z ACCEPT host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096
What I want to match, using regex, is the second timestamp on each line, since it's a UTC timestamp and should be parseable by Promtail.
I've tried different approaches, but just couldn't get it right at all.
So first of all, I need a regex that matches the timestamp I want.
But secondly, I somehow need to turn it into a regex that exposes the value in some form?
The docs offer this example:
.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)
AFAIK, those are named groups. Is that all it takes to expose the value so I can use it in the config?
It would be nice if someone could provide a solution for the regex, and an explanation of what it does :)
Comments (1)
You could, for example, create a specific pattern that matches the first part and captures the second part.
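The pattern from the original post did not survive on this page, so here is a minimal sketch of what such a specific pattern might look like, tried out in Python. The pattern itself, the name SPECIFIC_PATTERN and the helper extract_timestamp are assumptions made for illustration. Promtail uses Go's RE2 engine rather than Python's re module, but this sketch only uses syntax that, as far as I know, both engines accept, including the (?P<name>...) named group.

```python
import re

# Hypothetical pattern (not the one from the original answer):
# match the first, local timestamp literally, then capture the UTC one.
SPECIFIC_PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s+"                 # 2022-04-03 14:37:25.101991388
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)"     # 2022-04-03T12:37:25.101Z
)


def extract_timestamp(line):
    """Return the value of the 'timestamp' named group, or None if the line does not match."""
    match = SPECIFIC_PATTERN.match(line)
    return match.group("timestamp") if match else None


sample = (
    "2022-04-03 14:37:25.101991388 2022-04-03T12:37:25.101Z CLOSE "
    "host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26"
)
print(extract_timestamp(sample))
```

In Promtail, the same expression would presumably go into the expression field of a regex stage, and the named group timestamp would then be available as the source of a timestamp stage (with a format such as RFC3339Nano). Treat that wiring as an assumption to check against the Promtail docs.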
Or, if the format is always the same, use a very broad pattern: repeat an exact number of non-whitespace parts and capture the part that you want to keep.
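A rough sketch of that broad variant, again in Python and again with a pattern that is an assumption rather than the one originally posted: skip exactly two whitespace-separated parts (the local date and the local time), then capture the third part. The names BROAD_PATTERN and extract_timestamp_broad are made up for this example.

```python
import re

# Hypothetical broad pattern: (?:\S+\s+){2} skips two non-whitespace parts
# plus the whitespace after each, and (?P<timestamp>\S+) captures the third part.
BROAD_PATTERN = re.compile(r"^(?:\S+\s+){2}(?P<timestamp>\S+)")


def extract_timestamp_broad(line):
    """Return the third whitespace-separated part of the line, or None if there is none."""
    match = BROAD_PATTERN.match(line)
    return match.group("timestamp") if match else None


lines = [
    "2022-04-03 14:37:25.101991388 2022-04-03T12:37:25.101Z CLOSE "
    "host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26",
    "2022-04-03 14:38:07.723962122 2022-04-03T12:38:07.723Z ACCEPT "
    "host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096",
]
for line in lines:
    print(extract_timestamp_broad(line))
```

The trade-off is that this version does not check what the third part looks like, so any line with at least three parts will match. It is shorter, but only safe if every line in the file really has the same layout.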