Regex, Grafana Loki, Promtail: Parsing a timestamp from logs using regex

Posted on 2025-01-18 16:27:08

I want to parse a timestamp from my logs so that Loki uses it as the timestamp of the log entry.
I'm a total noob when it comes to regex.

The log file is from "endlessh", which is essentially a tarpit/honeypot for SSH attackers.

It looks like this:

2022-04-03 14:37:25.101991388  2022-04-03T12:37:25.101Z CLOSE host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26
2022-04-03 14:38:07.723962122  2022-04-03T12:38:07.723Z ACCEPT host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096

What I want to match with a regex is the second timestamp on each line, since it's a UTC timestamp and should be parseable by Promtail.

I've tried different approaches, but just couldn't get it right at all.

So first of all, I need a regex that matches the timestamp I want.
Second, I need to write it in a way that exposes the captured value in some form.
The docs offer this example:

.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)

As far as I know, those are named groups. Is that all it takes to expose the value so that I can use it in the config?

It would be nice if someone could provide a regex solution and an explanation of what it does :)

Comments (1)

屌丝范 2025-01-25 16:27:08

You could, for example, write a specific pattern that matches the first timestamp and captures the second one:

^\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d+\s+(?P<timestamp>\d{4}-\d{2}-\d{2}T\d\d:\d\d:\d\d\.\d+Z)\b
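To see what this pattern captures, here is a minimal Python sketch that runs it against the two sample lines from the question. Python's re module is used only as a stand-in for testing; Promtail itself evaluates the expression with Go's RE2 engine, and the helper name extract_timestamp is made up for this illustration.

```python
import re

# Specific pattern from the answer: match the leading local timestamp,
# then capture the UTC timestamp that follows in a group named "timestamp".
PATTERN = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d+\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d\d:\d\d:\d\d\.\d+Z)\b"
)

# Sample lines copied from the question.
LINES = [
    "2022-04-03 14:37:25.101991388  2022-04-03T12:37:25.101Z CLOSE "
    "host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26",
    "2022-04-03 14:38:07.723962122  2022-04-03T12:38:07.723Z ACCEPT "
    "host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096",
]


def extract_timestamp(line):
    """Return the captured UTC timestamp, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.group("timestamp") if match else None


for line in LINES:
    print(extract_timestamp(line))
```

The named group is what makes the value addressable by name ("timestamp") after the match, which is the same idea the docs example relies on.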

Or, if the format is always the same, use a very broad pattern: skip an exact number of whitespace-separated fields and capture only the field that you want to keep.

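^(?:\S+\s+){2}(?P<timestamp>\S+)

The named group is written as (?P<timestamp>...) here because Promtail evaluates the expression with Go's RE2 syntax, and the (?<timestamp>...) spelling is only accepted by more recent Go releases.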
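The trade-off of the broad pattern is that it captures whatever sits in the third field, timestamp or not. The Python sketch below illustrates this under a few assumptions: it uses the (?P<name>...) spelling, which Python's re module requires, the function name parse_utc is hypothetical, and the strptime format string is my guess at a layout matching the sample lines, not something taken from Promtail.

```python
import re
from datetime import datetime, timezone

# Broad pattern: skip two whitespace-separated fields, capture the third.
BROAD = re.compile(r"^(?:\S+\s+){2}(?P<timestamp>\S+)")


def parse_utc(line):
    """Capture the third field and parse it as a UTC timestamp.

    Returns an aware datetime, or None if the line has fewer than three
    fields or the third field is not in the expected layout.
    """
    match = BROAD.match(line)
    if match is None:
        return None
    fields = match.groupdict()  # e.g. {"timestamp": "2022-04-03T12:38:07.723Z"}
    try:
        parsed = datetime.strptime(fields["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        # The regex matched, but the captured text is not a timestamp.
        return None
    return parsed.replace(tzinfo=timezone.utc)


sample = (
    "2022-04-03 14:38:07.723962122  2022-04-03T12:38:07.723Z ACCEPT "
    "host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096"
)
print(parse_utc(sample))
print(parse_utc("one two three four"))
```

The second call shows why the broad pattern is only safe when every line really has this format: the regex happily captures "three", and it is the parsing step that rejects it.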
