Regex, Grafana Loki, Promtail: Parsing a timestamp from logs with a regular expression

Posted on 2025-01-18 16:27:08

I want to parse a timestamp from my logs so that Loki uses it as the entry's timestamp.
I'm a total noob when it comes to regex.

The log file is from "endlessh", which is essentially a tarpit/honeypot for SSH attackers.

It looks like this:

2022-04-03 14:37:25.101991388  2022-04-03T12:37:25.101Z CLOSE host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26
2022-04-03 14:38:07.723962122  2022-04-03T12:38:07.723Z ACCEPT host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096

What I want to match with a regex is the second timestamp on each line, since it's a UTC timestamp and should be parseable by Promtail.
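Why the second column is the better choice can be checked directly: it ends in `Z`, an explicit UTC designator, whereas the first column carries no zone information at all (in the samples it reads two hours ahead, presumably the host's local time). The sketch below parses the value with Python's `datetime.strptime` purely as an illustration; Promtail is written in Go and uses its own parser, and the variable names are hypothetical.

```python
from datetime import datetime, timedelta

# Second column of the first sample line (RFC 3339, "Z" = UTC).
utc_field = "2022-04-03T12:37:25.101Z"

# %z accepts a literal "Z" (Python 3.7+); %f accepts the three fractional digits.
parsed = datetime.strptime(utc_field, "%Y-%m-%dT%H:%M:%S.%f%z")

# An explicit zero offset is what makes this value unambiguous.
is_utc = parsed.utcoffset() == timedelta(0)

print(parsed)  # 2022-04-03 12:37:25.101000+00:00
print(is_utc)  # True
```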

I've tried different approaches, but just couldn't get it right at all.

So first of all, I need a regex that matches the timestamp I want.
But second, I somehow need to shape it into a regex that exposes the value in some way?
The docs offer this example:

.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)

AFAIK, those are named groups, and that is all it takes to expose the value so I can use it in the config?
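Named groups are indeed the mechanism: `(?P<name>...)` is a named capture group, and Promtail's `regex` stage stores each named group's match in the pipeline's extracted data under that name, where later stages can refer to it. The idea can be illustrated with Python's `re`, which accepts the same `(?P<name>...)` syntax as Go's regexp; the `host`/`port` expression below is a hypothetical example, not something from the question or the docs.

```python
import re

# First sample line from the question.
line = ("2022-04-03 14:37:25.101991388  2022-04-03T12:37:25.101Z CLOSE "
        "host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26")

# Hypothetical expression: each (?P<name>...) group is exposed under its name.
expr = re.compile(r"host=(?P<host>\S+) port=(?P<port>\d+)")

# search (not match) because the pattern is not anchored to the line start.
m = expr.search(line)
extracted = m.groupdict() if m else {}

print(extracted)  # {'host': '::ffff:218.92.0.192', 'port': '21590'}
```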

It would be nice if someone could provide a solution for the regex and an explanation of what it does :)

屌丝范 2025-01-25 16:27:08

You could, for example, create a specific pattern that matches the first part and captures the second part:

^\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d+\s+(?P<timestamp>\d{4}-\d{2}-\d{2}T\d\d:\d\d:\d\d\.\d+Z)\b
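Reading the specific pattern left to right: `^\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d+` matches the first (local) timestamp at the start of the line, `\s+` consumes the run of whitespace between the columns, the named group `(?P<timestamp>...)` captures the `YYYY-MM-DDTHH:MM:SS.fffZ` value, and `\b` requires a word boundary right after the `Z`. A quick check against both sample lines follows, using Python's `re` as a stand-in for Go's regexp (both accept this syntax); the helper name `extract_utc` is hypothetical.

```python
import re

# The specific pattern from the answer, unchanged (split only for readability).
SPECIFIC = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d+\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d\d:\d\d:\d\d\.\d+Z)\b"
)

SAMPLES = [
    "2022-04-03 14:37:25.101991388  2022-04-03T12:37:25.101Z CLOSE host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26",
    "2022-04-03 14:38:07.723962122  2022-04-03T12:38:07.723Z ACCEPT host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096",
]


def extract_utc(line):
    """Return the captured UTC timestamp, or None if the line does not fit."""
    m = SPECIFIC.match(line)
    return m.group("timestamp") if m else None


for s in SAMPLES:
    print(extract_utc(s))
# 2022-04-03T12:37:25.101Z
# 2022-04-03T12:38:07.723Z
```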


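Or, if the format is always the same, use a very broad pattern: repeat a non-whitespace part followed by whitespace an exact number of times, then capture the part you want to keep.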

^(?:\S+\s+){2}(?P<timestamp>\S+)
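The broad pattern skips exactly two whitespace-separated fields (`(?:\S+\s+){2}`, a non-capturing group repeated twice) and captures the third field, whatever it contains. It is shorter, but it also "succeeds" on lines whose third field is not a timestamp, so the failure would only surface later, when the value is parsed as a time. The sketch below compares the two patterns, written with the `(?P<name>...)` form that both Python's `re` and Go's regexp accept; the helper name `capture` and the malformed `odd` line are invented for illustration.

```python
import re

# Broad: skip two non-whitespace fields, capture the third one verbatim.
BROAD = re.compile(r"^(?:\S+\s+){2}(?P<timestamp>\S+)")

# Specific: only accept a UTC timestamp in that position.
SPECIFIC = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d+\s+"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d\d:\d\d:\d\d\.\d+Z)\b"
)


def capture(pattern, line):
    """Return the 'timestamp' group, or None when the pattern does not match."""
    m = pattern.match(line)
    return m.group("timestamp") if m else None


good = ("2022-04-03 14:38:07.723962122  2022-04-03T12:38:07.723Z ACCEPT "
        "host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096")
# Invented malformed line: the third field is not a timestamp.
odd = "2022-04-03 14:38:07.723962122  ACCEPT host=::ffff:218.92.0.192"

print(capture(BROAD, good), capture(SPECIFIC, good))
# 2022-04-03T12:38:07.723Z 2022-04-03T12:38:07.723Z
print(capture(BROAD, odd), capture(SPECIFIC, odd))
# ACCEPT None
```

In a Promtail pipeline, whichever expression is chosen would go in a `regex` stage's `expression`, followed by a `timestamp` stage with `source: timestamp` and a format such as `RFC3339`; the exact stage options are worth confirming in the Promtail docs for the version in use.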
